STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION

Stabilized Music Aptitude: Onset, Transition, and Relative Constancy in Upper Elementary Students

by
Roberta L. Yee
March 26, 2021

A dissertation submitted to the faculty of the Graduate School of the University at Buffalo, The State University of New York in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Learning and Instruction

Copyright by Roberta L. Yee 2021
All Rights Reserved

Acknowledgements

I am indebted to Dr. Maria Runfola, my advisor and committee chair, for her inspiration, guidance, and exacting editorial skills. I offer my sincere thanks to Dr. Elisabeth Etopio and Dr. Sunha Kim for their feedback, suggestions, and support as members of my dissertation committee.

Additional thanks to:
• My daughters, Eliza and Sylvie, for their patience and understanding during this long PhD process: I’ve been distracted for seven years, but am looking forward to reconnecting with you, hopefully on an overseas trip.
• My sister Denise, out-laws Dirk and Staci, and nieces Alyssa and Sophia for their love and encouragement: I am deeply appreciative.
• My brother Wendell for his expertise in designing a data collection spreadsheet: you stepped up before I knew I was in over my head and created a thing of beauty.
• My niece Julia for her many hours of data input: I am grateful for your accuracy and attention to detail.
• The students of the Halifax Area School District for providing a window into their musical thinking.
• My parents Bob and Jo Yee for their unconditional love, support, and belief in me: this is for you.
Abstract

The effects of chronological age and instruction on music aptitude, and the transition between the developmental and stabilized music aptitude stages, were examined to further establish the rationale for selecting the most appropriate music aptitude test for students in Grades 3, 4, and 5. Archived scores of the Intermediate Measures of Music Audiation (IMMA) were analyzed with paired t-tests, Wilcoxon Signed Rank tests, and repeated measures ANOVA. No effect of chronological age or instruction was found, and a period of transition could not be substantiated definitively. It was conjectured that tonal aptitude and rhythm aptitude stabilize independently of one another. However, because the type of instruction may have had a deleterious effect on the findings for effect of instruction and substantiation of a transition period, further research is recommended.

Keywords: stages of music aptitude, IMMA, chronological age, instruction, transition, developmental music aptitude, stabilized music aptitude

TABLE OF CONTENTS

Acknowledgements
Abstract
List of Tables
List of Figures
Chapter 1: Introduction
    Theoretical Framework
    Background of the Study
    Need and Significance of the Study
    Purpose of the Study
    Research Questions
    Scope and Delimitations
    Definition of Terms
Chapter 2: Literature Review
    Introduction
    Supplementary Features of Music Aptitude
    Stages of Music Aptitude
    Brief History of Music Aptitude Testing
    Recent Music Measures
    Critique of Previous Music Aptitude Measures
    Critique of Gordon’s Music Aptitude Measures
    Music Aptitude Measures Developed by Gordon
    Features of Stabilized Music Aptitude
Chapter 3: Methodology
    Research Questions and Research Hypotheses
    Participants
    Missing Values
    Instrument
    Procedure
Chapter 4: Presentation and Interpretation of Data
    Pattern Analysis of Missing Data
    Imputation of Missing Values
    Overview of Statistical Analyses
    Research Question 1
    Research Question 2
    Research Question 3
    Summary
Chapter 5: Discussion, Recommendation, and Conclusions
    Purpose of the Study
    Methodology
    Results
    Discussion
    Limitations of the Study
    Implications
    Recommendations
        Adaptations to the Current Study
        Extensions to the Current Study
    Conclusions
References

List of Tables

Table 1. Statistical tests by grade level and academic year
Table 2. Three-year longitudinal examination of IMMA scores
Table 3. Variable summary
Table 4. Descriptive statistics of complete and excluded case samples (pooled)
Table 5. Correlation results of complete and excluded case samples (pooled)
Table 6. Paired samples t-test results – complete case sample (pooled)
Table 7. Paired samples t-test results – excluded case sample (pooled)
Table 8. 3ST-4FT Descriptive statistics and correlation coefficient
Table 9. 3ST-4FT Paired t-test results
Table 10. 4ST-5FT Descriptive statistics and correlation coefficient
Table 11. 4ST-5FT Paired t-test results
Table 12. 3SR-4FR Descriptive statistics and correlation coefficient
Table 13. 3SR-4FR Paired t-test results
Table 14. 4SR-5FR Descriptive statistics and correlation coefficient
Table 15. 4SR-5FR Paired t-test results
Table 16. 3SC-4FC Descriptive statistics and correlation coefficient
Table 17. 3SC-4FC Paired t-test results
Table 18. 4SC-5FC Descriptive statistics and correlation coefficient
Table 19. 4SC-5FC Paired t-test results
Table 20. Wilcoxon Signed Rank test results (tonal)
Table 21. Wilcoxon Signed Rank test results (rhythm)
Table 22. Wilcoxon Signed Rank test results (composite)
Table 23. 2007-2008 Grade 3 Descriptive Statistics (pooled)
Table 24. 2007-2008 Grade 3 Correlation matrix (pooled)
Table 25. 2007-2008 Grade 3 Shapiro-Wilk Test of Normality results
Table 26. 2007-2008 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 27. 2007-2008 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 28. 2008-2009 Grade 3 Descriptive statistics (pooled)
Table 29. 2008-2009 Grade 3 Correlation matrix (pooled)
Table 30. 2008-2009 Grade 3 Shapiro-Wilk Test of Normality results
Table 31. 2008-2009 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 32. 2008-2009 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 33. 2009-2010 Grade 3 Descriptive statistics (pooled)
Table 34. 2009-2010 Grade 3 Correlation matrix (pooled)
Table 35. 2009-2010 Grade 3 Shapiro-Wilk Test of Normality results
Table 36. 2009-2010 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 37. 2009-2010 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 38. 2010-2011 Grade 3 Descriptive statistics (pooled)
Table 39. 2010-2011 Grade 3 Correlation matrix (pooled)
Table 40. 2010-2011 Grade 3 Shapiro-Wilk Test of Normality results
Table 41. 2010-2011 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 42. 2010-2011 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 43. 2011-2012 Grade 3 Descriptive statistics (pooled)
Table 44. 2011-2012 Grade 3 Correlation matrix (pooled)
Table 45. 2011-2012 Grade 3 Shapiro-Wilk Test of Normality results
Table 46. 2011-2012 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 47. 2011-2012 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 48. 2012-2013 Grade 3 Descriptive statistics (pooled)
Table 49. 2012-2013 Grade 3 Correlation matrix (pooled)
Table 50. 2012-2013 Grade 3 Shapiro-Wilk Test of Normality results
Table 51. 2012-2013 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 52. 2012-2013 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 53. 2013-2014 Grade 3 Descriptive statistics (pooled)
Table 54. 2013-2014 Grade 3 Correlation matrix (pooled)
Table 55. 2013-2014 Grade 3 Shapiro-Wilk Test of Normality results
Table 56. 2013-2014 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 57. 2013-2014 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 58. 2014-2015 Grade 3 Descriptive statistics (pooled)
Table 59. 2014-2015 Grade 3 Correlation matrix (pooled)
Table 60. 2014-2015 Grade 3 Shapiro-Wilk Test of Normality results
Table 61. 2014-2015 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 62. 2014-2015 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 63. 2015-2016 Grade 3 Descriptive statistics (pooled)
Table 64. 2015-2016 Grade 3 Correlation matrix (pooled)
Table 65. 2015-2016 Grade 3 Shapiro-Wilk Test of Normality results
Table 66. 2015-2016 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 67. 2015-2016 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 68. 2016-2017 Grade 3 Descriptive statistics (pooled)
Table 69. 2016-2017 Grade 3 Correlation matrix (pooled)
Table 70. 2016-2017 Grade 3 Shapiro-Wilk Test of Normality results
Table 71. 2016-2017 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 72. 2016-2017 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 73. 2017-2018 Grade 3 Descriptive statistics (pooled)
Table 74. 2017-2018 Grade 3 Correlation matrix (pooled)
Table 75. 2017-2018 Grade 3 Shapiro-Wilk Test of Normality results
Table 76. 2017-2018 Grade 3 Paired t-test results (pooled)
Table 77. 2018-2019 Grade 3 Descriptive statistics (pooled)
Table 78. 2018-2019 Grade 3 Correlation matrix (pooled)
Table 79. 2018-2019 Grade 3 Shapiro-Wilk Test of Normality results
Table 80. 2018-2019 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 81. 2018-2019 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 82. 2019-2020 Grade 3 Descriptive statistics (pooled)
Table 83. 2019-2020 Grade 3 Correlation matrix (pooled)
Table 84. 2019-2020 Grade 3 Shapiro-Wilk Test of Normality results
Table 85. 2019-2020 Grade 3 Wilcoxon Signed Rank test results (pooled)
Table 86. 2019-2020 Grade 3 Wilcoxon Signed Rank test statistics (pooled)
Table 87. Repeated Measures ANOVA combined results (tonal)
Table 88. Repeated Measures ANOVA combined results (rhythm)
Table 89. Repeated Measures ANOVA combined results (composite)
Table 90. Group A: Descriptive statistics pooled results (tonal)
Table 91. Group A: Mauchly’s Test of Sphericity results (tonal)
Table 92. Group A: Tests of Within-Subjects Effects results (tonal)
Table 93. Group A: Multivariate test results (tonal)
Table 94. Group A: Pairwise Comparisons pooled results (tonal)
Table 95. Group A: Descriptive Statistics pooled results (rhythm)
Table 96. Group A: Mauchly’s Test of Sphericity results (rhythm)
Table 97. Group A: Tests of Within-Subjects Effects results (rhythm)
Table 98. Group A: Multivariate test results (rhythm)
Table 99. Group A: Descriptive Statistics pooled results (composite)
Table 100. Group A: Mauchly’s Test of Sphericity results (composite)
Table 101. Group A: Tests of Within-Subjects Effects results (composite)
Table 102. Group A: Multivariate test results (composite)
Table 103. Group A: Pairwise Comparisons pooled results (composite)
Table 104. Group B: Descriptive Statistics pooled results (tonal)
Table 105. Group B: Mauchly’s Test of Sphericity results (tonal)
Table 106. Group B: Tests of Within-Subjects Effects results (tonal)
Table 107. Group B: Multivariate test results (tonal)
Table 108. Group B: Pairwise Comparisons pooled results (tonal)
Table 109. Group B: Descriptive Statistics pooled results (rhythm)
Table 110. Group B: Mauchly’s Test of Sphericity results (rhythm)
Table 111. Group B: Tests of Within-Subjects Effects results (rhythm)
Table 112. Group B: Multivariate test results (rhythm)
Table 113. Group B: Pairwise Comparison pooled results (rhythm)
Table 114. Group B: Descriptive Statistics pooled results (composite)
Table 115. Group B: Mauchly’s Test of Sphericity results (composite)
Table 116. Group B: Tests of Within-Subjects Effects results (composite)
Table 117. Group B: Multivariate test results (composite)
Table 118. Group B: Pairwise Comparisons pooled results (composite)
Table 119. Group C: Descriptive Statistics pooled results (tonal)
Table 120. Group C: Mauchly’s Test of Sphericity results (tonal)
Table 121. Group C: Tests of Within-Subjects Effects results (tonal)
Table 122. Group C: Multivariate test results (tonal)
Table 123. Group C: Pairwise Comparisons pooled results (tonal)
Table 124. Group C: Descriptive Statistics pooled results (rhythm)
Table 125. Group C: Mauchly’s Test of Sphericity results (rhythm)
Table 126. Group C: Tests of Within-Subjects Effects results (rhythm)
Table 127. Group C: Multivariate test results (rhythm)
Table 128. Group C: Descriptive Statistics pooled results (composite)
Table 129. Group C: Mauchly’s Test of Sphericity results (composite)
Table 130. Group C: Tests of Within-Subjects Effects results (composite)
Table 131. Group C: Multivariate test results (composite)
Table 132. Group C: Pairwise Comparisons pooled results (composite)
Table 133. Group D: Descriptive Statistics pooled results (tonal)
Table 134. Group D: Mauchly’s Test of Sphericity results (tonal)
Table 135. Group D: Tests of Within-Subjects Effects results (tonal)
Table 136. Group D: Multivariate test results (tonal)
Table 137. Group D: Pairwise Comparisons pooled results (tonal)
Table 138. Group D: Descriptive Statistics pooled results (rhythm)
Table 139. Group D: Mauchly’s Test of Sphericity results (rhythm)
Table 140. Group D: Tests of Within-Subjects Effects results (rhythm)
Table 141. Group D: Multivariate test results (rhythm)
Table 142. Group D: Descriptive Statistics pooled results (composite)
Table 143. Group D: Mauchly’s Test of Sphericity results (composite)
Table 144. Group D: Tests of Within-Subjects Effects results (composite)
Table 145. Group D: Multivariate test results (composite)
Table 146. Group D: Pairwise Comparisons pooled results (composite)
Table 147. Group E: Descriptive Statistics pooled results (tonal)
Table 148. Group E: Mauchly’s Test of Sphericity results (tonal)
Table 149. Group E: Tests of Within-Subjects Effects results (tonal)
Table 150. Group E: Multivariate test results (tonal)
Table 151. Group E: Descriptive Statistics pooled results (rhythm)
Table 152. Group E: Mauchly’s Test of Sphericity results (rhythm)
Table 153. Group E: Tests of Within-Subjects Effects results (rhythm)
Table 154. Group E: Multivariate test results (rhythm)
Table 155. Group E: Pairwise Comparison pooled results (rhythm)
Table 156. Group E: Descriptive Statistics pooled results (composite)
Table 157. Group E: Mauchly’s Test of Sphericity results (composite)
Table 158. Group E: Tests of Within-Subjects Effects results (composite)
Table 159. Group E: Multivariate test results (composite)
Table 160. Group E: Pairwise Comparisons pooled results (composite)
Table 161. Group F: Descriptive Statistics pooled results (tonal)
Table 162. Group F: Mauchly’s Test of Sphericity results (tonal)
Table 163. Group F: Tests of Within-Subjects Effects results (tonal)
Table 164. Group F: Multivariate test results (tonal)
Table 165. Group F: Descriptive Statistics pooled results (rhythm)
Table 166. Group F: Mauchly’s Test of Sphericity results (rhythm)
Table 167. Group F: Tests of Within-Subjects Effects results (rhythm)
Table 168. Group F: Multivariate test results (rhythm)
Table 169. Group F: Descriptive Statistics pooled results (composite)
Table 170. Group F: Mauchly’s Test of Sphericity results (composite)
Table 171. Group F: Tests of Within-Subjects Effects results (composite)
Table 172. Group F: Multivariate test results (composite)

List of Figures

Figure 1. PMMA/IMMA test answer sheet design
Figure 2. Research procedure
Figure 3. Overall summary of missing values
Figure 4. Missing value patterns
Figure 5. Missing value patterns bar graph
Figure 6. Paired t-test Spring-Fall results (tonal)
Figure 7. Paired t-test Spring-Fall results (rhythm)
Figure 8. Paired t-test Spring-Fall results (composite)
Figure 9. Wilcoxon Signed Rank test results (tonal)
Figure 10. Wilcoxon Signed Rank test results (rhythm)
Figure 11. Wilcoxon Signed Rank test results (composite)
Figure 12. Repeated measures ANOVA results (tonal)
Figure 13. Repeated measures ANOVA results (rhythm)
Figure 14. Repeated measures ANOVA results (composite)
Figure 15. Adaptations and Extensions to the current study

Chapter 1

Introduction

Adapting instruction to support individual differences is a hallmark of good teaching, yet planning and implementing differentiated instruction is often an afterthought for music educators. It is the general consensus among educators that individualized instruction benefits all students (Heathers, 1977). Similarly, individualized instruction in the music classroom is favorable, as it can enable students to achieve mastery if the task is appropriate.
Salvador (2011) asserted students of varying levels of music aptitude may benefit from differentiated instruction. Scores from a valid music aptitude test can help diagnose each student’s musical strengths and weaknesses (Gordon, 2006), thus providing the necessary data by which teachers can determine the appropriateness of tasks for each student.

A variety of music aptitude tests were developed in the early- to mid-1900s, differing in their content and intent. Test authors in the gestalt camp such as Wing and Drake (Gordon, 1987) believed each student possessed a global music aptitude, best expressed as a single composite test score combining all dimensions. Others, exemplified by Seashore, Kwalwasser, and Dykema, favored an atomistic approach to music aptitude (Gordon, 1987) in which separate aptitude scores from several subtests were reported. Differences in production of the recorded test prompts (natural sound production versus sound produced with musical instruments), the musical context of the test prompts (isolated pitches versus tonal patterns), the task required of the test taker (pitch discrimination versus counting of pitches), and the focus of the subtests to measure sensitivity to musical expression (preference versus non-preference) exemplified disparity in the function of available music aptitude tests. In addition, terms such as “ability”, “talent”, “achievement”, “intelligence”, and “aptitude” were used interchangeably and indiscriminately (Boyle, 1992), thus confounding the construct of music aptitude that test developers sought to measure.

Theoretical Framework

The theoretical framework of this study was based on the extensive research of Edwin E. Gordon on the construct and measurement of music aptitude, and data will be interpreted in this study through the lens of music aptitude.
From the results of numerous studies conducted by Gordon, his doctoral students, and other music educators, Gordon (1987) theorized an omnibus conceptualization of music aptitude as innate but not inherent, multidimensional, and developmental (influenced by the music environment) until approximately age 9 (p. 9). Gordon designed five music aptitude tests to measure developmental and stabilized music aptitude for a range of student ages; norms were reported for students from preschool through college. Gordon devised the music aptitude tests Audie and the Primary Measures of Music Audiation (PMMA) to measure developmental music aptitude and the Advanced Measures of Music Audiation (AMMA) and the Musical Aptitude Profile (MAP) to measure stabilized music aptitude. By reviewing test scores, teachers were able not only to diagnose students’ musical strengths and weaknesses but also to use that knowledge to individualize instruction. Moreover, Gordon (2006) asserted the Intermediate Measures of Music Audiation (IMMA) could be used to measure developmental music aptitude for students in Grades 1–3 and stabilized music aptitude for students in Grades 4–6. Norms were published for students in Grades K–3 (PMMA; Gordon, 1986c), Grades 1–6 (IMMA; Gordon, 1986c), Grades 4–12 (MAP; Gordon, 1995), and students in junior high, high school, and college (AMMA; Gordon, 1989b). The overlap of normed grade levels between music aptitude tests and IMMA’s ability to measure both developmental and stabilized music aptitude, depending on students’ chronological age, might result in uncertainty on the part of music teachers about selection of the music aptitude test most appropriate and efficacious for use with their students.

Background of the Study

Investigations Using Music Aptitude Measures

Research on the design and effectiveness of music aptitude testing has been conducted since the early 1900s.
Numerous researchers incorporated Gordon’s music aptitude measures in their studies designed to investigate varied topics such as improvisation (Amchin, 1995; Azzara, 1992; Bash, 1983; Briscuso, 1972; Ciorba, 2006; Della Pietra, 1997; Josuweit, 1991; Karas, 2005; Kołodziejski, 2019; Rowlyk, 2008; Stringham, 2010; Westervelt, 2001), composition (Auh, 1995; Crawford, 2016; Guderian, 2008; Henry, 1995; Menard, 2009; Smith, 2004; Stoltzfus, 2005), vocal music achievement (Conkling, 1994; Guerrini, 2002; Kimble, 1983; McDowell, 1974; Miceli, 1998; Pereira et al., 2017; Rutkowski, 2015), instrumental music achievement (Arms Gilbert, 1997; Baer, 1987; Belczyk, 1992; Bergonzi, 1991; Bernhard, 2003; Brokaw, 1983; Choi, 1996; Cribari, 2014; Dell, 2003; Edmund, 2009; Frierson-Campbell, 2001; Gouzouasis, 1990; Kendall, 1986; Klinedinst, 1989; Lee, 2007; Linklater, 1994; Liperote, 2004; Milford, 2002; O’Leary, 2010; Ruthsatz, 2000), music reading (Bluestine, 2007; Ciepluch, 1988; Jarvis, 1981; Karas, 2005; Kluth, 1986; McDonald, 2010; Milford, 2002; Palmer, 1974; Parks, 2005; Reifinger, 2018), and types of instruction for students of elementary through high school age (Carroll, 1983; Cary, 1981; Clark, 2005; Davis, 1981; Etzel, 1979; Froseth, 1968; Gamble, 1989; Green, 2003; Groeling, 1975; Grutzmacher, 1985; Hansen, 1991; Haston, 2004; Hasty, 1992; Morgan, 1995; Ortner, 1990; Pursell, 2005; Smith, 2006). Thus, empirical evidence exists to support Gordon’s conceptualization of music aptitude and audiation (Culp, 2017) and its use in research.

Measurement of Music Aptitude

Gordon characterized developmental music aptitude as subject to the influence of musical environment and prone to fluctuation until approximately age 9 (Gordon, 1986c, pp. 8–9).
In contrast, stabilized music aptitude was characterized as relatively immune to the effects of training and instruction, ceasing to demonstrate marked change after age 9 (Gordon, 1987, p. 9). The research of DeYarman (1972), Harrington (1969), Schleuter and DeYarman (1977), Seashore (1919), Stevens (1987), and Wing (1939/1961) focused on measurement of music aptitude for elementary-aged children from kindergarten through age 9, and sought the approximate age after which influence of training and instruction no longer affected student scores. The cessation of environmental influence thus indicated a shift from developmental music aptitude to stabilized music aptitude; this interpretation shaped the design of research investigating the construct and measurement of music aptitude. Reports of questionable test reliability aside (Geissel, 1985, pp. 1–2), it would be imprudent to draw conclusions based on the findings of these studies, as the nature and description of music aptitude measured in each (gestalt, atomistic, or omnibus) was dissimilar, as previously noted. However, two studies featured a unified approach to the construct of music aptitude in their examination of music aptitude testing. Geissel (1985) conducted an investigation of the comparative validities of PMMA, IMMA, and MAP with fourth grade students and concluded PMMA was too easy for fourth grade students, but IMMA and MAP were valid measures of stabilized music aptitude for students who possess high music aptitude (p. 32). Gordon (1986a) later published a factor analysis of the same three measures and concluded the existence of developmental and stabilized music aptitude from the results. 
Additional researchers concluded the attainment of the stabilized music aptitude stage from the findings of no significant difference from pre- to post-training music aptitude test scores; however, these studies generally had other research foci, and their conclusions about stabilized music aptitude onset were tangential (DeYarman, 1972; Gordon, 1989c; Schleuter & DeYarman, 1977). Thus, few researchers have focused their work directly on identification of the age and nature of onset of the stabilized music aptitude stage and the appropriateness of Gordon’s music aptitude measures for students in upper elementary grades. The identification of the age of onset of stabilized music aptitude is further complicated by the overlap of recommended grade levels in Gordon’s music aptitude tests. Geissel (1985) noted selection of MAP or IMMA test batteries for students aged 9 is obscured by overlapping norms for Grade 4 students (p. 3). Onset of stabilized music aptitude was commonly postulated as occurring at age 9 or 10 (Deutsch, 1982; Gordon, 1971, 2005; Mang, 2013; Phillips et al., 2002; Stevens, 1987) and had been characterized as “resistant to instruction” (DeYarman, 1975; Gordon, 1986a, 1989b; Haroutounian, 2002; Mang, 2013; Moore, 1987). Gordon (1986c) asserted the same test might be used to measure developmental music aptitude and recently stabilized music aptitude of students in Grade 4 if the design and content conformed to research specifications (p. 27), and concluded from a correlational study of IMMA and MAP that IMMA also might function as a test of stabilized music aptitude (Bolton, 1995). Walters (1991) concurred, but noted the superior diagnostic capabilities of MAP due to inclusion of sensitivity constructs, as compared to IMMA.
However, Geissel’s (1985) findings indicated IMMA and PMMA (tests of developmental music aptitude) had more in common with each other than either had with MAP (a test of stabilized music aptitude) (p. 31). Schleuter and DeYarman (1977) asserted from their findings that formal music instruction did not affect the music aptitude levels of students in kindergarten through fourth grade, and thus concluded music aptitude stabilized before age five or six. These results were similar to those found by DeYarman (1972) in a previous study. Nevertheless, there appeared to be little empirical evidence to demarcate the shift from developmental to stabilized music aptitude stages. The rationale for selection of music aptitude tests for upper elementary students was therefore obscured, as IMMA and MAP both served as tests of stabilized music aptitude for upper elementary students. IMMA was considered a more advanced and discriminating version of PMMA due to item difficulty of content (Walters, 1991). Bolton (1995) asserted IMMA was more suitable for older children who were likely to have been acculturated to music in many different forms and to have formed more sophisticated means of categorizing tonal and rhythm sounds (p. 31). PMMA or IMMA can be administered to students in Grades 1, 2, and 3 because both measure developmental music aptitude; Gordon (1986c) recommended administration of IMMA for classes in which at least half the students score above the 80th percentile on some or all of PMMA (p. 27). Phillips and Aitchison (1997) adhered to Gordon’s recommendation and administered PMMA to their sample of third grade students because Gordon’s published criterion for use of IMMA had not been met. A teacher therefore must estimate or calculate students’ PMMA test scores in order to determine if the administration of IMMA would be more appropriate, a Catch-22.
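The criterion attributed to Gordon (1986c) above can be read as a simple decision rule. The Python sketch below is illustrative only and is not drawn from the test manuals: the function name recommend_test, its default arguments, and the sample percentile ranks are hypothetical. It also assumes a single PMMA percentile rank per student, whereas the wording "some or all of PMMA" leaves open which subtests count.

```python
def recommend_test(pmma_percentiles, cutoff=80, share=0.5):
    """Suggest PMMA or IMMA for a class from PMMA percentile ranks.

    Assumption: one percentile rank per student. The rule mirrors the
    criterion described in the text: use IMMA when at least half of the
    class scores above the 80th percentile on PMMA.
    """
    if not pmma_percentiles:
        raise ValueError("at least one percentile rank is required")
    # Count students strictly above the cutoff percentile.
    above = sum(1 for rank in pmma_percentiles if rank > cutoff)
    return "IMMA" if above / len(pmma_percentiles) >= share else "PMMA"


# Hypothetical class data, invented for illustration.
class_a = [85, 90, 70, 60]   # two of four students above the 80th percentile
class_b = [85, 70, 60, 50]   # one of four students above the 80th percentile
print(recommend_test(class_a))
print(recommend_test(class_b))
```

The sketch makes the circularity visible: the rule cannot be evaluated until PMMA percentile ranks for the class already exist.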
Gordon (2001a) noted evaluation of developmental music aptitude change through comparison of PMMA and IMMA scores or percentile ranks was ill-advised, and instead proposed comparison of scores from administrations of the same test.

Transition Between Music Aptitude Stages

Gordon (1989b) later addressed a transitional stage of music aptitude, and promoted IMMA administration for students transitioning from the developmental to stabilized music aptitude stage (ages 6–9) or those who have attained the stabilized music aptitude stage (age 10–11). Moreover, Gordon (2006) speculated that middle school might serve as the period of a pronounced borderline between developmental and stabilized music aptitude stages, and MAP is more appropriate for students just entering the stabilized stage and AMMA for students who have gone beyond middle-ground and already settled into that stage (p. 234). In a related examination of developmental music aptitude, Gordon (1980b) found an uneven pattern of growth of tonal and rhythm developmental music aptitudes of Grade 7 students from a large city. This information did not correspond with what was previously understood about the consistency of stabilized music aptitude levels in culturally homogeneous students. Gordon recommended further analysis of the nature of the transition from developmental to stabilized music aptitude. Thus, an investigation of the transitional stage between developmental and stabilized music aptitude in the current study might reveal information helpful to teachers in determining the appropriate measure of music aptitude to use with their students.

Longitudinal Constancy of Music Aptitude

Constancy of stabilized music aptitude over time has been expressed as a function of relative standing. The effect of training on IMMA scores was an ancillary focus of Gordon’s (1989c) examination of predictive validity of the Instrument Timbre Preference Test (ITPT) and IMMA.
After one year of instruction, IMMA scores of the experimental and control groups were compared to their pre-instruction IMMA scores, and no marked difference was found. In his longitudinal predictive validity study of MAP, Gordon (2001c) examined the correlation coefficients of MAP scores of successive years to determine students’ relative standing and concluded that, in spite of an extended 3-year period of instrumental instruction, student MAP scores displayed only typical increases and students maintained their relative standing on the test in relation to published MAP norms. Similar results were revealed from Gordon’s 1970 study of music aptitude differences in beginning instrumental students: students maintained their relative standing on MAP after two years of instruction. Many studies have been conducted on music aptitude (Froseth, 1971; Fullen, 1993; Gordon, 1970, 1999; Gromko & Walters, 1999; Guerrini, 2002; Hornbach & Taggart, 2005; Jaffurs, 2000; Mota, 1997; Phillips et al., 2002; Rutkowski, 1986, 1996, 2015; Rutkowski & Miller, 2003a; Schleuter, 1978), academic achievement (Gordon, 2001c; Hufstader, 1974; Johnson, 2000; Klinedinst, 1991; McCarthy, 1974; Mitchum, 1969; Moore, 1987), and general intelligence (Gordon, 2006; Norton, 1980) as predictors of music achievement, as well as studies on music achievement, academic achievement, and general intelligence as predictors of music aptitude (Carson, 1998; Hobbs, 1985; Johnson, 2000; Kuhlman, 2005; Simmons, 1981; Webb, 1984; Young, 1971). Further insight into the effect of chronological age and instruction on music aptitude, as well as the shift between the developmental and stabilized music aptitude stages, may help establish the rationale for selection of the most appropriate and efficacious test for a particular group of students.
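The idea of relative standing discussed in this section, in which scores may rise from year to year while students keep their positions relative to one another, can be illustrated with a correlation between successive administrations of the same test. The following Python sketch is a hedged illustration under stated assumptions: the scores are invented, the function name pearson_r is hypothetical, and the choice of the Pearson coefficient is an assumption, as the studies cited above are described here only as having examined correlation coefficients.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Invented raw scores for five students (same student order in each list).
grade3 = [30, 34, 28, 37, 32]
grade4 = [32, 36, 30, 39, 34]    # every score rises, ordering is unchanged
shuffled = [39, 30, 36, 32, 34]  # same score values, ordering scrambled

print(pearson_r(grade3, grade4))    # high r: relative standing maintained
print(pearson_r(grade3, shuffled))  # low or negative r: standing not maintained
```

A coefficient near 1 across successive years is consistent with maintained relative standing even when every raw score increases, which is why mean gains alone do not contradict stabilization.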
Need and Significance of the Study

Gordon examined previous music aptitude tests by Seashore, Wing, and others, and concluded they differed in content and intent. Because there was no consensus on the definition of the construct of music aptitude or the terms used, the results of the tests created to measure music aptitude yielded differing interpretations. Thus, the need for this study is established by its unique contribution: an examination of the onset of and transition to the stabilized music aptitude stage based on a unified approach to music aptitude. Practical applications of the current study’s findings will inform music educators, music education researchers, and the field of music education in general. Teachers need detailed and objective data from which to identify students’ learning needs (Salvador, 2011). Music aptitude scores can fulfill that need: if the music aptitude measure selected is appropriate for the specific group of students, scores should represent each student’s current level of tonal and rhythm audiation and by extension, each student’s musical strengths and weaknesses. Music educators may then individualize and adapt instruction based on this data to maximize learning of students of all age levels and in all areas of concentration. The significance of this study is established through its contribution to the knowledge base of the nature and measurement of stabilized music aptitude. Not every research question is answered definitively; instead, additional questions may be posed. This would not exemplify a failed study, but rather an opportunity to extend current understandings by focusing future research questions. Regardless of the specific findings, it is hoped music education researchers and, by extension, the field of music education, may benefit from insight on when and how music aptitude stabilizes that may be gained from the current study.
Purpose of the Study

The purpose of this study was to investigate the onset of, transition to, and longitudinal constancy of stabilized music aptitude in upper elementary school students. Several published music aptitude tests were available for administration to children from preschool age through college and beyond. These tests had been researched thoroughly and found reliable and valid for their intended use (Gordon, 1984a, 1986a, 1986b, 1989b, 1990c, 2001c). However, there was overlap between the grade levels recommended for certain music aptitude tests, which could be perplexing for music educators in their selection of the appropriate test for their students. IMMA scores of a stable sample of upper elementary students were explored longitudinally in this study in order to identify the appropriate music aptitude test for students in the intermediate grades, based on the stage of music aptitude likely occurring.

Research Questions

The research questions used to guide this study are:

(1) At what grade level does chronological age cease to affect student music aptitude?
(2) At what grade level does instruction cease to affect student music aptitude?
(3) Is there evidence to substantiate the transition between the developmental music aptitude stage and stabilized music aptitude stage at approximately age 9/Grade 4?

Scope and Delimitations

The scope of this study was restricted to an examination of stabilized music aptitude: its progression from the developmental music aptitude stage, onset, and longitudinal constancy. Therefore, a delimitation of this study was the review of relevant research directly related to the construct and measurement of stabilized music aptitude. Students older than age 9 had been observed in the stabilized music aptitude stage (Gordon, 1989b, 1995); however, the existence of or length of transition period from the developmental to stabilized music aptitude stage was unclear.
Therefore, in this study, the music aptitude test scores of students in elementary school, specifically in Grades 3–5, were examined. Students in these grades had been taught by the researcher and thus were administered IMMA routinely as a means of tracking music aptitude levels for individualizing instruction; these students’ past scores comprised a convenience sample. This sample of students was taken from a district population with very little transiency: the great majority of students who graduated from this school district were elementary music students of the researcher (approximately 90%) and all IMMA scores from Grades 3–5 had been preserved, thus allowing a longitudinal view of Grade 3 IMMA scores over a 13-year period. In addition, students’ scores over the 3-year period from Grades 3–5 were examined for an additional longitudinal perspective. The focus of this study was limited to elementary students in Grades 3–5, as findings of extant literature had established the onset of stabilized music aptitude to have occurred around age 9; students of that age are often in Grade 4. Although the researcher was a full-time elementary general music teacher who administered PMMA and IMMA to all students in Grades 1–5, only IMMA scores of students in Grades 3–5 from the elementary population of a small rural public elementary school were considered in this study. Scores of students with autism who received support in a self-contained classroom were included in this study. However, due to limits of students’ disabilities which may have affected their ability to comprehend the directions, make timely decisions, and select and mark their answers, music aptitude tests were administered to these students individually, the interval between test prompts was extended, and paraprofessionals may have scribed student answers to more accurately document intended student responses.
The choice to frame this study within Gordon’s construct of music aptitude was an additional delimitation, made because evidence of IMMA’s use as a valid measure of both developmental music aptitude and stabilized music aptitude for students in the intermediate grades had been found in extant literature. It would be inappropriate to compare scores of different music aptitude tests, even when those tests were similar in design, such as PMMA and IMMA (Gordon, 1986c, pp. 66–67). Therefore, the use of scores of a single test (IMMA) to examine the onset of stabilized music aptitude and the transition between stages of music aptitude was advantageous.

Definition of Terms

Audiation: “the ability to hear and to give meaning to music when the sound is not physically present or may never have been physically present” (Gordon, 2005, p. 11, emphasis in original); the potential to achieve in music.

Compensatory growth: as a result of appropriate instruction, students’ musical needs are mitigated: students receive a higher PMMA or IMMA raw score in the same grade or a higher percentile rank in the next grade (Gordon, 1986c, p. 68).

Complementary instruction: as a result of appropriate instruction, students’ musical needs are met: students’ percentile rank remains essentially the same after re-administration of PMMA or IMMA (Gordon, 1986c, p. 76).

Developmental music aptitude: music aptitude that fluctuates due to influence of environmental factors. Gordon (1987, p. 9) contended children remain in the developmental music aptitude stage until approximately age 9.

Music achievement: “a measure of what a student has already learned in music” (Gordon, 1998, p. 2) and “based primarily in the brain” (Gordon, 2010, p. 211).

Music aptitude: “a measure of a student’s potential to learn music” (Gordon, 1998, p. 5), best understood through examination of scores from a valid test of music aptitude (Gordon, 1987, p. 2).
More than 20 different music aptitudes have been identified through factor analysis (Gordon, 1998, p. 11).

Stabilized music aptitude: music aptitude that is unaffected by musical environment, training, or practice, realized as maintenance of relative standing on music aptitude tests (Gordon, 1980b, p. 25); students progress to the stabilized music aptitude stage at approximately age 9. “Musical expression is indicative of stabilized music aptitude” (Gordon, 1980b, p. 26).

Chapter 2

Literature Review

Introduction

The purpose of this chapter was to review the literature relevant to the purpose and questions of this study. An integrative literature review (Cooper, 1989) was conducted to present and draw conclusions from the various relevant research studies examined (Moustakas, 1994). As such, the constructs of music achievement, audiation, and music aptitude were described, and the recent history of music aptitude testing, principal researchers, and features of the music aptitude measures designed by these researchers were presented.

Music Achievement

Music achievement, a measure of previous music learning (Gordon, 1998, p. 5), was typically assessed through rating scales of performance skills, fluency in reading and writing of music notation, and measurement of music theory and music history knowledge, and could be considered the skill level resulting from the aggregate of music aptitude level and accumulated music experiences (Taggart, 1989, p. 46). Therefore, music achievement was acquired, and could be augmented with continued access to a rich musical environment. Gordon (1989b) asserted students’ music achievement levels would never exceed their stabilized music aptitude levels, and observed the basis of true music achievement was the ability to generalize or infer (Gordon, 1991).
Although Gordon (2006) noted a superior association between music achievement and academic intelligence, Bixler (1968), Carson (1998), Gordon (1986b, 2006), Kuhlman (2005), and Swaminathan et al. (2017) concluded no significant relationship between academic ability or general intelligence and music aptitude. Thus, music achievement seemed to be “based primarily in the brain” (Gordon, 2010, p. 211).

Audiation

Music educators have struggled to agree on an appropriate term to describe how students comprehend music. Terms with dissimilar definitions such as “aural perception”, “aural imagery”, and “inner hearing” have been used by researchers and practitioners (e.g., Gromko & Russell, 2002; Gromko & Walters, 1999; Karma, 1994; Kopiez & Lee, 2006, 2008; Shuter-Dyson, 1999; Wöllner et al., 2003; Young, 1971) as labels, yet none seemed to adequately define what occurs when students learn music. Edwin Gordon (2015) coined the term “audiation” in 1975 (p. 9) to characterize “the ability to hear and to give meaning to music when the sound is not physically present or may never have been physically present” (p. 11, emphasis in original). Karma (1994) concurred, noting sound need not be physically present for musical thinking to occur. Gordon (1999) further described audiation, noting, “Music is the result of the need to communicate. Performance is how this communication takes place. Audiation is what is communicated” (p. 42). Audiation requires assimilation and comprehension, whereas musical imagery requires neither. Aural perception does not require comprehension and is a reaction to immediate sound events (Gordon, 2015). Memory and recognition are components of the audiation process, yet neither can stand alone; imitation is a product of audiation.
In addition, audiation requires musical context: the ability to sing a tune, perform a tune in a different key, tonality, or meter, play with alternate fingerings, play a variation, or move to melodic phrases is indicative of audiation. A temporal feature of audiation distinguishes it from aural perception and similar constructs, as assimilation of past or anticipated musical events occurs in audiation (Gordon, 1998, p. 12). Specifically, one audiates what has been heard previously to give meaning to what is being heard presently and to predict what will be heard in the future (Gordon, 1981). Audiation occurs in performance, improvisation, and composition in addition to imitation, due to this temporal component. Karma’s theory of auditory structuring also had a temporal factor and was discussed more thoroughly later in this chapter. Geake (1999) theorized audiational abilities were the result of three phases of information processing: successive synthesis, simultaneous synthesis, and executive synthesis. Geake conjectured how the task of listening to music might be sequenced:

A sequence may be recognized (simultaneous), and then a sequence of segments (successive) may be recoded as a musical phrase (simultaneous), and so on. The importance of this cyclic arrangement for the assessment of individual differences is that low ability on either of successive or simultaneous synthesis would cause a ‘bottleneck’ for coding to proceed to higher levels. Such an analysis may explain individual differences in audiation (p. 11).

Geake asserted executive synthesis was manifested as selective attention, an attribute which affected learning efficiency, and further speculated increasing levels of executive synthesis were required as one moved through the hierarchy of audiation stages.

Music Aptitude

One’s potential for audiation is known as music aptitude. Expressed another way, music aptitude is the potential to achieve in music.
Thus, “audiation is fundamental to music aptitude and consequently to music achievement” (Gordon, 2011, p. 9). As Walters (1991) summarized, the extent to which the ability of a person without instruction can hear, understand, and give meaning to specific sounds is, according to Gordon, music aptitude. Boyle and Radocy (1987) noted terms such as audiation, talent, aptitude, musicality, musical intelligence, and music ability reflected constructs used to differentiate between those who demonstrate differing levels of performance on musical tasks. There was no apparent consensus on the definition of these terms, which were often used to reflect functions (e.g., assessment of potential) rather than musical behaviors and were applied as labels based on informal criteria which appeared to lack a base of systematic observation. Although Gordon initially used the term “musical aptitude”, he began to use the collective noun “music aptitude” instead after the 1977 publication of “Revised Learning Sequence and Patterns in Music”; the term music aptitude was used in the current study as well. Gordon (2006) described the ability to generalize sound as a key feature of music aptitude, as exemplified through audiation. Although this ability was also an attribute of general intelligence (Culp, 2017), it was not bound by previous instruction or experience. Haroutounian (2002) noted that, to a music psychologist, inherent sound discrimination and perception are at the core of musical talent and music aptitude is a measure of potential musical talent (p. 19). Mang (2013) concluded music aptitude was predictive in its function as a measurement of potential to achieve in music. A discussion of additional features of music aptitude such as the influence of environment on innate music aptitude levels and the construct of music aptitude as a unified entity or a compilation of disparate dimensions follows.
In addition, associations of music aptitude with brain function, academic achievement, and general intelligence were considered.

Supplementary Features of Music Aptitude

Nature versus Nurture

Whether music aptitude was a function of nature or nurture had been the source of ongoing debate in professional circles (Gordon, 1998, p. 5). Those who promoted music aptitude as an innate and unchangeable trait (nature) disputed those who viewed music aptitude as fluctuating and influenced by environment, practice, or training (nurture). Proponents of the nature theory held that one is either born with music aptitude or not, a “questionable but tacit assumption” (Gordon, 1998, p. 6). Gordon aptly noted the case for the nature theory implied students with low music aptitude might not benefit from a music education, whereas music education for all was supported by the nurture theory.

Atomistic versus Gestalt

Whether music aptitude was composed of discrete abilities (atomistic) or was holistic in nature (gestalt) was another disputed premise (Grashel, 2008). Measures of atomistic music aptitude yielded multiple subtest scores, as each dimension of music aptitude was considered in isolation. Pure tones, isolated pitches, pitch counting, and non-preference features were typically included in such measures (Boyle, 1992). Proponents of the atomistic view, such as Seashore, tended to be Americans. English and European counterparts, such as James Mursell and Herbert D. Wing, designed gestalt tests which yielded one composite score representing the general factor approach to music aptitude (Degé et al., 2017); thus, music aptitude was viewed as one-dimensional. Gestalt music aptitude measures typically combined tonal and rhythmic dimensions within the same test, used musical instruments as the sound source and musical phrases as the content of test items, and might include preference measures (Gordon, 1986a).
Brain Function

Studies of brain function and music have yielded interesting discoveries in relation to music aptitude. Music training had been found to induce auditory plasticity due to aural skill development, as indicated by structural and functional differences found in musicians’ brains (Bugos et al., 2014). In addition, Zentner and Gingras (2019) noted a possible link between interindividual variation (genotypic level) and musical behaviors (phenotypic level), reinforced by varied levels of music aptitude in the general population. The extremes of range of music aptitude were discussed by Zentner and Gingras (2019), who labeled nonmusicians with high levels of music aptitude and undeveloped skills as musical sleepers and those with ample musical training and declining skills as sleeping musicians. Moore (1990) described Gordon’s reasoning that if preschool training could influence the development and level of general intelligence, as asserted by Montessori (1917), Piaget (1953), and Bruner (1960), perhaps preschool training could also influence musical intelligence, manifested as music aptitude. Gordon noted neurologists’ hypothesis of a possible association between myelination of great cerebral commissures and the development of the brain’s frontal lobes and stabilization of music aptitude, and related the ability to make judgments, draw conclusions, anticipate coming events, generalize, and make inferences associated with frontal lobe activity to the musical predictions necessary for audiation (Gordon, 2006, 2013). Thus, it appeared the patterns of typical brain maturation and childhood development mimicked the growth of developmental music aptitude, and brain activity was more reflective of stabilized music aptitude function.

Academic Achievement and General Intelligence

The relationship of music aptitude to academic achievement or IQ was subject to dispute.
Citing the findings of Gordon (1986b) and Sergeant and Thatcher (1974), Kuhlman (2005) asserted academic ability and music aptitude were not significantly related. Although academic achievement had been found to successfully predict student success in music, this was most often the case when student success was defined as reading music notation (Kuhlman, 2005), an undertaking perhaps more similar to school-related tasks than to the musical process of audiation. Numerous researchers (Allen, 1981; Bailey, 1975; Brown, 1969; Klinedinst, 1991; Mawbey, 1973; Pruitt, 1966) have asserted academic ability could be a positive factor in student retention (Kuhlman, 2005). However, a firm association between music aptitude and IQ has not been established. Carson (1998) noted individuals may exhibit high levels of music achievement despite cognitive and academic deficiencies, and cited Gardner’s (1983) observation that music aptitude was distinct from other types of intelligence, as evidenced by brain research in which cerebral lesions have destroyed musical abilities without affecting other forms of intelligence. Bixler (1968) concluded a minimal association between measures of music aptitude and those of memory and auditory abilities and relative independence of music aptitude from intelligence and academic achievement. Gordon (2006) concurred, asserting only a 5–10 percent relationship between music aptitude and intelligence test scores. Swaminathan et al. (2017) reported no association between music training and intelligence after controlling for music aptitude: it appeared students with higher levels of intelligence and music aptitude chose to participate in music training, rather than music training influencing intelligence.
Due to limited evidence of a definitive association between music aptitude, academic achievement, and general intelligence for elementary students, a more thorough review of the topic was not deemed relevant to the purpose of the current study.

Stages of Music Aptitude

Gordon (2001a) asserted two stages of music aptitude: the developmental music aptitude stage, in which constant fluctuation of a child’s music aptitude level occurred due to the influence of instruction and musical environment, and the stabilized music aptitude stage, which occurred after age 9 and was defined by its lifetime constancy, regardless of environment. It was evident no distinction between stages of music aptitude was made by early researchers, as they implied music aptitude was crystallized or stabilized at birth (Gordon, 1998, p. 18). Gordon (1981) asserted the stabilized music aptitude stage was preceded by a developmental stage in which music aptitude fluctuated in accordance with a child’s innate potential and the quality of the musical environment. Walters (1991) described the defining feature of developmental music aptitude as the instability of young children’s music aptitude scores attributed to the varying effect of musical influence, which resulted in constant adjustment of their relative music aptitude positions as nature and nurture interacted. The developmental music aptitude stage was noted for its volatility of test scores: the effect of instruction, training, and experience was manifested as the instability of scores of students in the developmental music aptitude stage. Nevertheless, Gordon (1986c) noted children’s ability to respond intuitively and to audiate immediate impressions was an indication of the level at which their music aptitude would stabilize (p. 9).
Thus, it was posited appropriate formal instruction during the early school years could positively affect the level of developmental music aptitude, and the younger children were, the more they would benefit from a quality music environment (Gordon, 2001a). Music instruction received during the developmental music aptitude stage created a lasting influence, as music aptitude stabilized around age 9 and would remain the same throughout life (Gordon, 2012, p. 46). Gordon (1981) found both developmental and stabilized music aptitude test scores were normally distributed. Nevertheless, the number and content of the dimensions (e.g., tonal, rhythm, expression) differed. Although measures for seven dimensions of stabilized music aptitude had been developed, only measures of tonal and rhythm dimensions of developmental music aptitude were found adequate. Gordon surmised this could be due to an absence of additional developmental aptitude dimensions or the lack of an appropriate music aptitude test for very young children. Nevertheless, Gordon (1995) attributed approximately 55 percent of the reason or reasons for student success in school music to MAP scores, noting the remaining 45 percent was associated with extra-musical factors (p. 8).

The need to establish musical context for accurate measurement was a defining attribute of stabilized music aptitude testing. Gordon (2005) noted the necessity to audiate content within a musical context, in order that sound would be perceived as musical. Critics such as Mursell (1937) contended that atomistic measures of music aptitude, and Seashore’s Measures of Musical Talent in particular, lacked musical material as test content and therefore measured acoustical rather than musical abilities (Haroutounian, 2002, p. 14).
Consequently, tonal test items presented within a rhythmic structure and rhythm test items within a tonal context were promoted by Gordon as offering the greatest validity. Tests of stabilized music aptitude such as AMMA and MAP included recorded performances of melodic test prompts, from which test takers must answer questions about either the tonal or rhythm aspects, while ignoring the other (Gordon, 2005).

Brief History of Music Aptitude Testing

Gordon (1987) contended an understanding of music aptitude could be gained through examination of the content of a valid music aptitude test (p. 2); such a test might also identify students most likely to benefit from instruction and diagnose musical strengths and weaknesses in order that instruction could be individualized appropriately (Gordon, 1990c). He further reasoned that when young students with minimal background in music instruction attained high scores, it was an indication a test measured music aptitude and not music achievement (Gordon, 2002). South Korean and American teachers in Reynolds and Hyun’s (2004) qualitative study of teachers’ understanding of music aptitude acknowledged they had attended to non-musical behaviors such as attitude, participation, compliance, and academic achievement in their assessment of their students’ music aptitude levels. In his study of diagnostic validity of MAP with a sample of fourth- and fifth-grade beginning instrumental students, Gordon (1998) noted teachers were not fully aware of student music aptitude levels, despite instructing the control groups for two years (p. 107). Correlations of these teachers’ estimates of student music aptitude and students’ MAP scores were moderate: .29 (tonal imagery and rhythm imagery), .34 (musical sensitivity), and .43 (composite) (Gordon, 1998, p. 107).
Hatfield (1967) reported a similar finding in his study of MAP’s diagnostic validity with instrumental music majors (Gordon, 1998, p. 107). On the other hand, Gordon (1970) found instrumental music achievement was greater when music teachers used their knowledge of student music aptitude scores to diagnose students’ strengths and weaknesses for instructional purposes than when teachers did not have and use music aptitude scores. In the process of developing his music aptitude measures, Gordon (2005) studied the design, content, and findings of all existing music aptitude tests, selected their most efficacious attributes, and adapted these components for inclusion in his own music aptitude tests. As the current study focused on the features of Gordon’s music aptitude measures, a description of his contribution to music aptitude testing preceded descriptions of the early music aptitude researchers and the music aptitude tests developed by Seashore, Drake, Wing, Kwalwasser and Dykema, Bentley, Gaston, and Karma.

Gordon

Gordon (2015) coined the term audiation to “explain how music is given meaning by persons of all ages” (p. 9). Geake (1999) cited numerous researchers such as Gordon (1989d), McPherson (1995), and Schleuter (1984) and acknowledged evidence of audiation as fundamental to musical ability and success in music. Gordon described music aptitude as the process by which we audiate, and posited two stages of music aptitude: developmental and stabilized. Gordon concluded music aptitude consists of several aspects that are interrelated but independent (Degé et al., 2017); thus, multifactorial (atomistic) and general factor (gestalt) approaches were combined in the aptitude tests he developed.
Gordon (2013) maintained a child’s aptitude level was established at birth; however, he disagreed that music aptitude was an inherited trait, noting lack of evidence to support the role of heredity in determining music aptitude (p. 12). To clarify, although music aptitude could be transmitted through the genes, it was not predictable based on ancestry (Gordon, 1998, p. 7). Unlike Seashore, Wing, and other researchers who were psychologists with an interest in music, Gordon (1998), a musician with an interest in the psychology of learning, concluded nature and nurture both contributed to music aptitude, noting the nurture theory indicated children would become musical as a result of having musical parents and the nature theory suggested musical parents could only bear musical children. His logical conclusion was that music aptitude must be a product of both nature and nurture (p. 7). Moore (1990) cited the work of other music psychologists (Deutsch, 1982; Gordon, 1971; Lundin, 1967; Shuter, 1968), who concurred with Gordon’s assertion that innate potential and musical exposure were both components of music aptitude. Gordon (1981, 2002) noted the ability to identify tonal difference and rhythm difference was associated with music aptitude, and asserted students who were able to identify difference typically exhibited higher music aptitude than those who could not: because most students could recognize sameness, its relationship to music aptitude was negated (Gordon, 2004). The primacy of sameness also had been noted by researchers in disciplines other than music (Gordon, 1981). Nevertheless, how children attend to sameness and difference in music was progressively dissimilar as students aged. Gordon (1981) noted children’s increasing attention to relation of sameness to difference as chronological age increased; any consideration of sameness and difference as discrete entities emphasized difference. 
The results of factor analyses of four years of PMMA scores (Gordon, 1981) allowed for the conclusion that changes occurred in audiation from age 5 to 8: test items were audiated differently as the subjects aged. Gordon (1981) conjectured music aptitude was multidimensional: more than thirty different music aptitudes were revealed in the pre-standardization research for the Musical Aptitude Profile, and Gordon (1998) contended more levels of music aptitude could be identified with appropriately designed tests (p. 11). However, Gordon asserted tonal and rhythm dimensions of music aptitude had the greatest bearing on music learning (Geake, 1999). Gordon distinguished stabilized music aptitude as resistant to music instruction, and asserted developmental music aptitude was subject to fluctuation due to environmental influences. Gordon appropriated the terms “acculturation”, “imitation”, and “assimilation” to describe the types of preparatory audiation in which young students absorbed and engaged with the musical environment with increasing consciousness. In Gordon’s (2012) view, students continued to expand their audiation abilities throughout the developmental music aptitude phase, “although the effect of environment on a child’s music aptitude decreases substantially with age” (Gordon, 1981) and gain scores decreased until approximately age 9 (Gordon, 1986a). Because developmental music aptitude scores might increase due to the influence of instruction, Gordon’s tests of developmental music aptitude had been denounced as tests of music achievement (Gordon, 2005). Gordon (2005) revealed the error in this perception: maintenance of relative position in score distributions was a feature of the stabilized music aptitude stage, but not of the developmental music aptitude stage.
In fact, the median correlation of scores on the same subtests administered years apart approached .80 for stabilized music aptitude tests, but only approximated .30 for developmental music aptitude tests. Gordon (1998) stated definitively that music aptitude crystallized or stabilized after approximately age 9 (p. 10). Once stabilized, one’s music aptitude level became resistant to environment, even when that environment was musically rich (Gordon, 2013, p. 14). Walters (1991) noted Gordon’s extensive contribution to the field of music aptitude measurement; Carson (1998) concurred, touting the superior technical documentation of Gordon’s music aptitude tests. Numerous pre-publication studies were conducted in order to minimize the effect of music achievement and maximize MAP’s focus on music aptitude (Gordon, 1965): Gordon (2002) surmised that when students with minimal background in music instruction attained high music aptitude scores, the music aptitude test truly measured music aptitude rather than music achievement. A low correlation between music aptitude scores and factors such as intelligence scores was considered evidence the test measured what it was designed to measure (Gordon, 1998, p. 88). In addition, a well-designed music aptitude test should report high reliability of each subtest, comparatively low intercorrelations with other subtests, and a high correlation with the battery’s total score (Gordon, 1998, p. 89). Scores from music aptitude tests have been significantly correlated with intonation (Gordon, 1970) and performance achievement (Brokaw, 1983; Froseth, 1971; Gordon, 1984a, 1986b, 1989d, 1990c, 2001c; Pereira et al., 2017; Schleuter, 1978).
Although Guerrini (2004) concluded singing achievement was affected by tonal music aptitude, many others (Atterbury & Silcox, 1993; Hornbach & Taggart, 2005; Mota, 1997; Phillips et al., 2002; Rutkowski, 1986, 1996; Rutkowski & Miller, 2003a), as cited by Hornbach and Taggart (2005), concluded a weak relationship between singing voice use and developmental tonal aptitude (Rutkowski, 1996). Additionally, the relationship between music aptitude and composition was not supported in the research (Henry, 2002). Thus, the general influence of music aptitude as a predictor of student success in music had not been thoroughly substantiated, despite Gordon’s (2005) conclusion that certain types of music aptitude (e.g., meter or tempo aptitude) were more robust than others for predicting success in school music. Gordon (1987) asserted an understanding of music aptitude could be gained through examination of the content of a valid music aptitude test (p. 2). Culp (2017) concurred, noting music aptitude was not reliably observed, but could be reliably measured. Two purposes of music aptitude testing were identification of students most likely to benefit from music instruction and identification of musical strengths and weaknesses for appropriate individualization of instruction (Gordon, 1990c). Scores from music aptitude tests were intended to provide teachers and parents with objective data, for use in individualizing instruction and guiding children to achieve their musical potential (Gordon, 2006). Gordon concluded from factor analytic results of his 1986a study that developmental and stabilized music aptitude were weakly associated, and the results of his 2002 examination of PMMA and non-preference MAP subtests supported the need for different types of music aptitude tests to measure developmental and stabilized music aptitude.
To that end, Gordon (2006) developed measures of two developmental music aptitudes (tonal and rhythm), two stabilized tonal aptitudes (melody and harmony), two stabilized rhythm aptitudes (meter and tempo), and three stabilized preference aptitudes (phrasing, balance, and style). In developing these music aptitude tests, Gordon (2005) thoroughly examined existing music aptitude tests, to be described in the following section of this chapter. He sought to determine the subjective and objective validities of the better-known existing tests, in order to use new knowledge and techniques to develop different types of new tests, from which the most feasible would be compared with existing tests. Gordon (1986c) noted the cyclical nature of music aptitude testing and instruction based on test results (particularly for those students in the developmental music aptitude stage, whose level of music aptitude fluctuated based on interaction with the musical environment) and asserted a comparison of scores of two administrations of PMMA or IMMA a semester or year apart would suggest the need for changes in instruction (p. 9). Therefore, Gordon encouraged the use of music aptitude test results to adapt instruction to suit each student’s individual learning needs, in addition to their typical use for identification of students with high music aptitude for recruitment into music programs (Gordon, 1995, p. 9).

Seashore

Carl Seashore, a psychologist with an interest in music (Gordon, 2005), believed music aptitude was inborn, inherited, and could be predicted accurately (Gordon, 1981). Seashore developed the first standardized recorded battery of aptitude tests designed for students age 9 and older (Gordon, 1986a) to identify and educate the musically talented (Gordon, 1981), and based its design on atomistic principles.
Thus, the Seashore Measures of Musical Talents (1919) consisted of subtests intended to measure discrete aptitudes; no composite score was calculated. Seashore’s belief that aptitude was multidimensional was exhibited in the multiple subtest scores yielded in his 1919 test battery: each score represented a unique music aptitude. The content of the Seashore Measures of Musical Talents included isolated tones produced by non-musical devices (e.g., tuning forks and beat-frequency oscillators) rather than musical instruments, and without musical characteristics; syntactical relations of pitches were avoided, as Seashore believed such a relationship would more likely measure music achievement than music aptitude (Gordon, 1998, p. 24). In addition, Seashore attempted to include series of pitches that would be culture-free (Gordon, 1998, p. 24). Seashore believed training or practice should not affect test scores (Gordon, 1986a), as he was of the opinion that nature was the source of music aptitude. Seashore claimed reliability coefficients in the high nineties if his aptitude test were administered under ideal laboratory conditions (Stevens, 1987, p. 19) and rejected attempts to use external criteria to validate his tests, as such criteria would fall outside of the atomistic construct he favored (Stevens, 1987, p. 22).

Drake

The Drake Musical Aptitude Tests were published in 1954 by psychologist Raleigh M. Drake. Norms were provided for students as young as age seven (Gordon, 1998, p. 50). Unlike Seashore’s test battery, melodic phrases performed on piano were used as the test stimuli. Respondents were tasked with listening to a test phrase and indicating the nature of its successive rendition. Changes to the test phrase could be related to pitch, rhythm, or key.
It was of interest that Drake provided two forms of each test, differing in difficulty, as well as norms for non-music and music students, defined as those having five or more years of musical training (Gordon, 1998, p. 42). Gordon (1998) noted the likelihood that Drake considered music achievement to be a factor of music aptitude, since test takers must have had some formal music instruction in order to be familiar with the concepts of modulation, notes, and time required for success on these tests (p. 41). In addition, Gordon observed Drake’s inclusion of tonal and rhythm responses in the same test might be sufficient for older students, but was less appropriate for young children in the developmental music aptitude stage.

Wing

Psychologist Wing developed the Standardized Tests of Musical Intelligence (1939, revised in 1946) to identify musically intelligent children entering secondary school (Wing, 1962) so they might take advantage of instrumental training. Wing’s test battery exemplified the gestalt theory of music aptitude, in which a general or omnibus factor of musical ability was sought; tonal and rhythm dimensions were included within the same test, test prompts were produced with musical instruments in musical contexts, and a composite score was yielded (Gordon, 1986a). Wing favored nurture as the source of music aptitude, and attempted to measure musical understanding (Haroutounian, 2002, p. 14); thus, Wing’s test battery was designed to measure musical acuity and sensitivity (Wing, 1962). The listener was asked to decide if rhythm, accent, intensity, or phrasing alterations to familiar melodies played on the piano were the same or different, and if different, which presentation was preferred (Gordon, 1998, p. 44). Wing believed music preference was a component of music aptitude (Gordon, 1998, p. 45); thus, four preference subtests were included in the battery.
Strong criterion validity with teachers’ ratings (.64–.90) and solid reliability (.91 for whole test) were reported, and norms for students aged 8 and older were included (Haroutounian, 2002, p. 292). Gordon (2005) critiqued the premise of Wing’s test battery, noting knowledge of number of pitches or similarity of chords without musical context was not relevant to the practice of music. Nevertheless, Gordon (1998) considered Wing’s music aptitude tests superior to Seashore’s, as Wing recognized Stage 3 audiation, in which objective or subjective tonality and meter were established, was integral to stabilized music aptitude (p. 49).

Kwalwasser-Dykema

Scores on the Kwalwasser-Dykema Music Tests, published in 1930, were intended to inform teachers of student ability in order that instruction could be individualized (Stevens, 1987, pp. 13–14). This test battery, authored by two music education professors, was based on the atomistic view. Students as young as first grade could be administered the 10 subtests, which included recorded performances of orchestral instruments (Stevens, 1987, p. 14). Combined norms were provided for students 8 years old through professional musicians aged 40 (Gordon, 1998, p. 35). Gordon (1998) noted Kwalwasser seemed to concur with Seashore on stabilization of music aptitude at age eight or nine or believed younger children incapable of understanding music aptitude test directions (p. 35). Test prompts included a melody played twice; the listener was tasked with deciding if the two renditions were the same or exhibited a change in pitch or rhythm. Two preference subtests measured “musical feeling and appreciation” (Stevens, 1987, p. 14): Melodic Taste, in which listeners indicated which of two endings was best, and Tonal Movement, which measured the ability to judge the tendency of the final tone to proceed to a point of rest (Gordon, 1998, p. 34).
Because the test battery contained sections that measured music achievement, content validity was limited (Stevens, 1987, p. 29). Reliability and intercorrelations among subtests were not reported by Kwalwasser (Gordon, 1998, p. 37).

Bentley

Although Bentley was a proponent of the gestalt theory of music aptitude, his Measures of Musical Abilities (1966) seemed to combine gestalt and atomistic principles without an obvious rationale (Gordon, 1998, p. 50). Bentley’s test battery consisted of four subtests, and was the first intended to measure music aptitude in children as young as age 7 (Haroutounian, 2002, pp. 14–15), with the goal of examining only those abilities that were basic and essential to the performance of music (Young, 1973). Students were tasked with identifying the relationship of two pitches as same, higher, or lower (pitch discrimination subtest), counting the number of pitches (tonal memory subtest), determining the number of pitches in each chord (chord analysis subtest), and identifying the relationship of two rhythm patterns as same or noting the number of the altered beat if changed (rhythmic memory subtest). Bentley seemed to believe a similar description of music aptitude was appropriate for younger and older children and only attempted to clarify Wing’s test directions (Gordon, 1998, p. 50). He reported significant correlations between teachers’ estimates of musical ability and student test scores and moderate-to-high reliability ratings of .84 for the total battery (Haroutounian, 2002, p. 293), as well as criterion-related validity findings of college level and professional performances, despite the test’s focus on very young children (Gordon, 1998, p. 50), but neglected to provide percentile rank norms or reliability or validity coefficients for the subtests.
Gaston

Gaston’s 1957 Test of Musicality was intended for administration to students aged 10 and up to measure musical ability and interest (Stevens, 1987, p. 17). This brief battery included three subtests, and reported split-half reliability coefficients of .88 for upper elementary and middle school students and .84–.90 for high school students (Stevens, 1987, p. 39). A significant relationship between teachers’ ratings and scores on the Test of Musicality also was reported (Haroutounian, 2002, p. 292).

Karma

In contrast to Gordon’s definition of music aptitude as the realization of audiation, Karma (1994), a Finnish psychologist, defined music aptitude as “the ability to hear or perceive sound structures” (p. 20) whose distinct characteristic was temporal; Karma (2007) claimed music aptitude was the ability to listen to music “musically” and efficiently. Karma’s (1994) music aptitude definition was purposefully culture-free and separate from emotions or personality traits; his construct of music aptitude delineated auditory structuring as separate from sensory capacities, and he claimed a sufficient ability to hear differences in pitch, length, intensity, and timbre was a necessary condition for perception of pattern structuring, but did not belong to the construct of music aptitude (Karma, 2007). The temporal feature of sound patterns formed the basis of Karma’s music aptitude measure and he considered the temporal aspect an essential property of music cognition. Karma’s measure of music aptitude was designed to avoid the effects of culture and training, even while presenting test items within a musical context. A sample test item might require the student to listen three times without pause to a brief pattern of sounds and form an image of one iteration of the pattern.
A fourth pattern was then played, after which the listener must determine if the final pattern was the same as or different from the image (Karma, 1994). Boyle (1992) noted Karma’s contention that structuring strategies were superior to traditional testing for assessing music aptitude (p. 250) and assertion that discrimination tasks in measurement of music aptitude and test validation based on correlations with achievement tests were inappropriate (p. 249). Carson (1998) noted Karma’s conceptualization of music aptitude as an indicator of broad and fundamental cognitive abilities was not musical per se. Karma conceded, concluding from the results of his 1994 replication study that successive synthesis, a non-musical process based on the time-order of stimuli, formed the fundamental basis of audiation (Geake, 1999). The results of Karma’s 1990 auditory structuring test were compared to results of a parallel visual version administered to students with significant hearing loss (Karma, 1994), from which Karma concluded the presence of sound was not a necessary condition for music perception. This was an important departure from the more conventional conceptualizations of music aptitude of previous researchers as described by Boyle (1992). Gordon asserted sound would not need to be physically present in order for audiation to occur. Karma (1994) concurred, but asserted the defining characteristic of music was not sound, but the thought processes triggered by sound. Karma claimed the findings of his 1994 study supported the assumption that his auditory structuring test for subjects with no hearing impairment and the visual parallel test for subjects with congenital hearing loss measured the same process of temporal structuring, which was theorized as a measurement of music aptitude. This rationale offered a construct of music aptitude markedly different from Gordon’s description of audiation.
Indeed, Karma (1984) seemed to imply the ability to conceive musical structure preceded what Gordon would define as audiation.

Recent Music Measures

The conversation surrounding music aptitude, music ability, and music perception has continued within the fields of music education and music psychology. Although some researchers seem to distinguish between music ability, music skills, and music competence and have developed measures accordingly (Law & Zentner, 2012; Müllensiefen et al., 2014; Wallentin et al., 2010; Wolf et al., 2018), advocates of Gordon’s work would label all as tests of music achievement. A lack of consensus on what should be measured (real-world skills or technical knowledge) and who should be administered measures (amateurs or professionals, musicians or nonmusicians) persists. Other researchers endorse the use of culturally responsive performance-based assessment to mitigate racial inequity (Hood, 1998). Although other tests of music perception have been published more recently, their target population was adults who would have achieved the stabilized music aptitude stage previously. The Musical Ear Test (2010) is a test of musical competence, designed to measure musical abilities of professional, amateur, and non-musicians (Wallentin et al., 2010). As such, it is a test of music achievement. In the Goldsmiths Musical Sophistication Index (Gold-MSI; Müllensiefen et al., 2014), the term “musical sophistication” was used to label a psychometric construct intended to include musical skills, expertise, achievements, and behaviors in college-age students or adults, without assumptions made of innate, inherent, or acquired attainment. Thus, Gold-MSI is also a test of music achievement. Another music achievement test, the Profile of Music Perception Skills (PROMS; Law & Zentner, 2012), measured musical perception to determine musical competence of college-age students or adults.
Advanced amateur and professional musicians were also the focus population of the Musical Ear Training Assessment (META; Wolf & Kopiez, 2018), which was designed to measure ear training skill, a form of music achievement. Because the focus of these music achievement tests differed from the music aptitude focus of the current study, an in-depth review of these tests was deemed extraneous.

Critique of Previous Music Aptitude Measures

Gordon’s criticism of previous music aptitude tests focused on content and validity. He found, as had Mursell (1937), that Seashore’s Measures of Musical Talent (1919), in its isolation of musical elements, was void of the musical context needed to describe music abilities (Gordon, 1969; Haroutounian, 2002). Gordon (2005) believed Seashore’s subtests measured auditory acuity rather than a music trait. Haroutounian (2002) described Gordon’s objections as the laboratory nature of the sound production, extreme attempts to isolate each attribute of music aptitude, and lack of musical context in the presentation of test items, which resulted in a test experience devoid of musical functioning (p. 14). Gordon (1989b) noted that a pitch discrimination test, which he considered Seashore’s test battery to be, could have negative validity only and thus could predict only whether students could not profit from instruction. Unlike the Seashore battery, the MAP battery excluded subtests relying on discrimination (pitch or time) or memory (tonal or rhythm) and contained preference subtests to measure musical sensitivity (Gordon, 1969). In his 1969 study of intercorrelations between MAP and the Seashore Measures of Musical Talents, Gordon noted a weak relationship between corresponding subtests of the two batteries and concluded the batteries assessed different abilities.
Gordon also had reservations about certain features of other music aptitude tests: he asserted the music aptitude tests of Wing and Bentley were, fundamentally, measures of music achievement (Gordon, 2002), and disputed the musical relevance of Wing’s (1939) subtests (Gordon, 2005). Geissel (1985) observed Wing’s use of the same music content for younger and older students, which might have affected the test reliability negatively when used with younger students (p. 2); these issues were noted for Bentley’s Measures of Musical Abilities as well (Geissel, 1985; Gordon, 2002). Boyle (1992) noted differences in the musical tasks and content measured in various music aptitude tests (p. 249). Discrimination, memory, and recall required different processes and were unequal tasks, yet had been measured as if equivalent in many aptitude tests. Music ability, music aptitude, and music achievement had also been examined and compared inaccurately as parallel constructs. Music sensitivity or preference as a construct seemed to be generally accepted (Boyle, 1992, p. 251); however, there was little agreement on how it should be measured (Bugos et al., 2014). In addition, the appropriateness of the musical tasks and content for predicting musical potential was in dispute (Boyle, 1992, p. 249). Karma dissented from the more commonly accepted definition of music aptitude as well as from the type of validation typically sought and codified in Gordon’s work. Karma (2007) was critical of the samples of music students in previous music aptitude testing which were unrepresentative of the general population, and noted poor predictive or ecological validity of music aptitude measures to predict real-world musical skills or behaviors. He asserted a music aptitude measure with predictive validity was a composite of several constructs and therefore unsuitable for testing music aptitude.
Karma (1982) opposed the practice of establishing music aptitude test validity through correlation of music aptitude test scores with an individual’s true music aptitude level, noting the true nature of music aptitude was unknown and correlations were often contaminated with variables that measured non-musical factors. Instead, Karma (2007) preferred the use of construct validity, and proposed reporting of validity as measures of relationship or difference rather than statistical significance (Karma, 1982). Gordon (1986c) reported content validity as correlations of PMMA and IMMA with MAP and the Iowa Tests of Music Literacy (ITML) and concurrent (criterion-related) validity as correlations with teachers’ ratings of student performance, in addition to longitudinal predictive validity (pp. 97–109). Karma (1984) reported moderately high correlation coefficients (between .60 and .76) of auditory structuring test scores and teachers’ estimates of student aptitude. He asserted high correlations of instrumental teachers’ estimates of student music aptitude with results of music achievement tests and self-reports were reasonable, but not with intelligence and amount of musical training. Test validity could be claimed if correlations were predictive of test results (Karma, 1982). However, Gordon (2012) adjudged music aptitude measures did not have reasonable correlations with teachers’ estimates of student music aptitude (p. 51) and, in fact, empirical evidence pointed to a weak relationship between teachers’ estimates of music aptitude and student music achievement (Reynolds & Hyun, 2004; Stamou et al., 2010; Taggart, 1989). Karma’s theory was far different from the generally accepted construct of music aptitude based on audiation, and his music aptitude measure was unconventional as well. Another detractor of Gordon’s work, Australian John G. 
Geake (1996), examined music-specific information processing involving perception, memorization, abstraction, and directed attention planning. Although Geake (1996) concurred with Gordon that abilities are developed as a result of appropriate experience, Geake (1999) attributed audiational abilities to information processing, as manifested in neuropsychologist Alexander Luria’s (1966, 1970, 1973) cyclic model of simultaneous, successive, and executive synthesis, in which musical perception closely parallels text-reading (Geake, 1996). Geake concluded information was abstracted as it was encoded for memory and planning within brain functioning. This frame of reference did not account for an innate and individual baseline of music aptitude upon which early music experience was imprinted, however, and assumed all cognitive thought was processed similarly. Proponents of Gordon’s work might argue audiation was a uniquely musical process. The relationship between information processing and music perception abilities in samples of typical and “musically gifted” students, who were nominated by teachers for demonstrating extreme musicality or were selected into advanced music training programs at an early age, was investigated in Geake’s 1999 study. Geake (1999) speculated the “mozart” (p. 33) students would perform better on the MAP than their non-gifted peers. From the findings of a principal component analysis, Geake (1999) concluded three components were present: Component 1 reflected successive synthesis, suitable for musical tasks requiring informational encoding such as sequences of rhythm patterns; Component 2 reflected executive synthesis, defined as “the formation of intentions and programs for behaviour” (p. 
11) and manifested in the generation of expectancies (Geake, 1996); and Component 3, which showed high loadings for simultaneous synthesis, marked by the ability to focus on “the complete auditory image produced by the set of components as a whole” (p. 32). Geake (1999) concluded MAP’s measurement of audiation was dependent on general attentional and successive information processing abilities. The musically gifted students demonstrated significantly higher information processing abilities than their non-gifted peers; Geake (1996) contended their ability to concentrate on music learning and performance tasks could be explained by executive synthesis, in which attention was employed for metacognition. In other words, Geake attributed the musically gifted students’ advanced performance abilities to their superior executive synthesis abilities. Nevertheless, the superior ability of the mozart subjects to concentrate on music learning and performance tasks also could be explained by superior ability to audiate (higher music aptitude). As the basis of their selection could hardly be viewed as objective, it was likely Gordon would have questioned the qualifications of the mozart students as musically gifted, thereby negating the premise of this study. Geake (1996) went on to assert that short-term memory played a pivotal role in the processing of musical information, serving as an underpinning ability. He challenged Gordon’s disavowal of the importance of short-term memory in MAP performance, arguing that individual differences in short-term memory might partially predict individual differences in MAP scores. 
Geake (1999) noted the musically gifted sample demonstrated significantly better short-term memories than their non-gifted peers, and reported strong correlations for scores on each MAP test (tonal, rhythm, and musical sensitivity) with non-music specific abilities such as successive synthesis, thus suggesting MAP might not provide the complete music aptitude assessment intended. Nevertheless, the equivalency of short-term memory and other information processing abilities and audiational abilities had not been established empirically. Gordon (1998) asserted rapid and successive exposure to many different musical phrases would force students to memorize, rather than audiate, and consequently, subtests would be related to music achievement rather than music aptitude. A key feature of Gordon’s music aptitude measures was the strictly limited amount of time in which the test-taker must select a “same”/“like”, “different”, or “in doubt” response; Geake’s studies lacked sufficient detail describing the speed at which students were tasked with processing information. The ability of readers to compare Geake’s theory of music aptitude with Gordon’s music aptitude construct was inhibited by this dearth of information. In addition, Geake (1996) noted teachers commonly related high music ability to other intellectual abilities in mathematics and language, and deemed superior ability of musicians labeled “gifted” or “talented” was due to ability to process musical information at a high level. As previously discussed, Gordon disputed the ability of teachers to identify musical giftedness impartially without the aid of a valid and objective music aptitude measure. Furthermore, it had not been established that an “ability to process musical information” was equivalent to the process of audiation. 
Geake (1999) also disputed Gordon’s consideration of musical sensitivity as a dimension of music aptitude, noting inclusion of musical sensitivity as a factor of music aptitude was inconsistent with Boyle’s taxonomy and his cited evidence. Nevertheless, Geake commended Gordon for reporting musical sensitivity scores independently of the composite MAP score, which resulted in a genuine profile of music aptitudes.

Critique of Gordon’s Music Aptitude Measures

The original participant sample on which IMMA norms were based included a relatively small group of Grade 1–4 students, many of whom had taken and scored high on PMMA. The students attended schools in 11 school systems in Pennsylvania and New York, of which one was described as a private academy and another an “inner city school” (Gordon, 1986c, p. 85). It was likely the sample was more homogeneous than heterogeneous, given the limited geographic area represented and the previous association of the school systems with Gordon, who had administered IMMA to most students in the sample (p. 85). Selection of the norms sample served two needs of IMMA use: to document the statistical properties of the test, and to provide an objective comparison as an alternative to local norms (p. 85). Lack of a standardized music curriculum among, and differences in instruction between, the school systems represented in the norms sample was typical at the national level (Gordon, 1986c, p. 86). Gordon had not defined or identified the need for informal guidance at the time IMMA was created (M. Runfola, personal communication, March 26, 2021). Thus, it was presumed all students in the norms sample were taught using traditional instruction. 
Due to the lack of diversity in the original norms samples and the type of instruction offered to those students, it seemed findings of Gordon’s music aptitude measures were generalized to samples not reflective of or relevant to current student samples in many areas of the United States. Since the sample of the current study was racially homogeneous, examination of the sample’s make-up was outside the parameters of the current study. However, creation of local norms is recommended for a more accurate comparison of findings of diverse populations. Gordon (1986c) suggested the development of local norms was an outgrowth of frequent test administrations and might be superior for comparing relative standing. Holahan and Thomson (1981) concurred, proposing construction of local norms as standard practice for all tests. Around the time of Gordon’s research, a change was occurring in political perspective from a perception of America as a “melting pot” of minority groups, responding to societal pressure to assimilate into the dominant culture, toward a view of America as a “salad bowl”. Thus multiculturalism, defined by the Stanford Encyclopedia of Philosophy as “an ideal in which members of minority groups can maintain their distinctive collective identities and practices” (Multiculturalism, 2016, para. 1), became the goal and practice of many Western societies. For example, Canada officially adopted a policy of multiculturalism in 1970 (Hess, 2015). Nevertheless, multiculturalism emphasized commonality while downplaying difference, thereby circumventing crucial discussion of the inequity of power inherent in systemic racism. 
Bradley (2007) noted the use of terms such as “culture”, “ethnicity”, and “nationality” and euphemisms such as “poverty problem”, “welfare”, “urban schools”, and “diversity” was a means of pointing out difference while avoiding direct discussion of race, and attributed this to the emphasis on commonality promoted by multiculturalism. Yosso (2005) concurred, noting “cultural difference” as an additional example of race coding. Thus, Gordon’s use of the terms “culturally disadvantaged” and “inner-city”, while problematic, was indicative of the time period in which his research was conducted. Despite this, future researchers are obligated to look critically at how music aptitude is conceived, defined, and measured. If the construct of music aptitude is truly universal, studies to confirm its accurate and equitable measurement in students from all backgrounds and educational settings must be conducted. Gerhardstein (2001), in his biography of Gordon, noted the impact of the University of Iowa on Gordon’s work. Gordon served as a student, professor, and researcher at the University of Iowa until becoming the Director of Music Education at the State University of New York at Buffalo in 1972 (p. 22); his tenure at the University of Iowa, “a national center for educational testing and measurement” (p. iv), was influential in his development of MAP. Gordon seemed to concur with the procedures used to construct norms for the Iowa Test of Basic Skills (ITBS), a well-established and widely used standardized measure of that time period, as evidenced by his adoption of the pupil profile chart introduced in the ITBS scoring procedure, “in itself a plotting device [italics in original], necessitating no recourse to tables of norms or overlay masks of any kind” (Peterson, 1983, p. 32). 
In addition, Gordon provided grade-equivalent scales and percentile norms similar to those published in ITBS in his music aptitude measures (Peterson, 1983, p. 136). The precedent of calculating separate norms for groups which exhibited different levels of performance was established by ITBS (Peterson, 1983, p. 230) and duplicated in Gordon’s (1995) aggregation of MAP scores for musically select students (p. 49). It was likely Gordon’s terminology in describing the racial backgrounds of the participants in his post-MAP research samples also was reflective of the influence of ITBS. State and national standards were reported by ITBS, in order for school administrators to have the option to choose the set of standards they believed most representative of their student population (Peterson, 1983, p. 231). An ITBS test author noted factors such as population features, educational resources, and cultural traditions to be considered in the creation of norms (Peterson, 1983, p. 230). The fundamental differences in definition and process of the construct of music aptitude, such as auditory structuring (Karma), information processing (Geake), and audiation (Gordon), prohibited direct comparison of music aptitude measures. It remained the task of music researchers and teachers to review extant literature, compare music aptitude constructs and data supporting each construct, and draw their own conclusions when selecting an appropriate music aptitude test for a particular group of students. As the current study focused on measures of stabilized music aptitude based on Gordon’s construct, descriptions of those measures were presented in order of initial publication date, and the features selected and adapted from previous music aptitude measures for inclusion in Gordon’s music aptitude test batteries were highlighted below. 
Music Aptitude Measures Developed by Gordon

MAP

The original 1965 publication of Gordon’s Musical Aptitude Profile (MAP) followed 8 years of extensive research in music aptitude, which included more than 5,800 student participants in seven pre-standardization studies (Runfola, 2016, pp. 360–361) and resulted in a comprehensive test of stabilized music aptitude for use with students in Grades 4–12. MAP exhibited standards of reliability and validity on par with those reported for academic and diagnostic achievement tests (Gordon, 2001c), the highest of any music aptitude test (Walters, 1991). Walters noted the import of Gordon’s (1967b) three-year longitudinal study in establishing the predictive validity of MAP and isolating the constructs of aptitude and achievement; the findings yielded impressive predictive validity coefficients (.61 in Year 1, .72 in Year 2, and .77 in Year 3) which served to promote the use of MAP as a predictor of instrumental music achievement (Geissel, 1985, p. 8). Although a strong correlation of music aptitude test scores and teacher ratings and judges’ evaluations of students’ performances was observed, it should not be inferred that music aptitude scores are the result or outcome of music achievement (Gordon, 2001c). Aptitude tests are marginally measures of achievement, although the effect of achievement should be minimized as much as possible. Gordon recommended administering the aptitude test prior to instruction, providing a prolonged training program, evaluating performance after training, and comparing aptitude and performance scores to establish predictive validity, as a longitudinal examination would provide clear evidence of the test’s effectiveness in predicting music achievement. MAP, described as an eclectic battery due to its use of preference and non-preference subtests, drew from both atomistic and gestalt theories (Gordon, 1986a). 
The four non-preference MAP subtests reflected views of those in the nature camp and the three preference subtests favored proponents of nurture (Walters, 1991). Gordon (1987) contended musical context was necessary for the most accurate measurement of stabilized music aptitude; therefore, MAP test content was presented through the use of pairs of original short melodic phrases in a musical context (Gordon, 1995, p. 13), which students labeled as “like”, “different”, or “in doubt”. Although raw scores might increase in subsequent years, relative standing remained constant (Gordon, 2001c). Walters (1991) asserted the stability of relative standing across all grade levels contributed to MAP’s power and validity as a measure of music aptitude. However, Zentner and Gingras (2019) claimed the high reported validity coefficient of MAP was tempered by the association of the researchers (Gordon or his students) and the means of publication (a publishing house whose standards may not have been stringent). Nevertheless, two unique features of MAP contributed to its high level of validity. Listeners had the option to select “same”, “different”, or “?” in response to each MAP test prompt. The “in-doubt” option was intended to prevent guessing (Gordon, 1995, p. 87), which increased the validity of the subtest (Gordon, 1998, p. 102). In addition, test items of various difficulty were scattered randomly throughout each subtest in order to maintain interest and deter frustration of students with low or average music aptitude levels (Gordon, 1998, p. 102), which offered an additional boost to validity. Norms based on a sample size of 12,809 students from 20 school systems in 18 states (Gordon, 1995, p. 69) were reported for composite scores of each music aptitude test (tonal, rhythm, musical sensitivity), as well as for each subtest of MAP. 
Gordon (1998) noted percentile rank norms and score distributions were not markedly different for students of different geographical regions, school settings, or cultures (p. 11). MAP had been shown to be a valid measure for minoritized students (Gordon, 1980b), students in Finland (Sell, 1976), Germany (Schoenoff, 1973; Schoonover, 1974), South Korea (Reynolds & Hyun, 1994), and Taiwan (Chuang, 1997), students with special needs (Curtis, 1981), and students with intellectual giftedness (Drennan, 1984). However, its efficacy in measurement of stabilized music aptitude in college and university students had not been substantiated (Gordon, 1990c). MAP, with its seven subtests, had been used in numerous research studies for its ability to diagnose strengths and weaknesses of individual students (Geissel, 1985, p. 4). Gordon reported split-half reliability coefficients of .90–.96 (composite) and criterion validity of .73 (composite) when MAP scores were compared with scores of a music achievement test (Haroutounian, 2002, p. 293). Stevens (1987) noted that MAP, although not unique in its inclusion of preference subtests, was the sole American test battery to include them (p. 18): MAP’s inclusion of musical sensitivity tests was relatively rare. Gordon (1987) found it unnecessary to group musical sensitivity test questions in pairs: musical phrases in preference subtests did not require concentrated attention, as students needed only an overall impression of the two test items in order to decide which was preferred (p. 65). Timbre and dynamics were not included as separate subtests; rather, aspects of timbre and dynamics were included in two preference subtests as an intentional design feature to contribute to “a Gestalt of music phrasing and style” (Gordon, 1981). 
Gordon expressed confidence that formal music instruction contributed sparingly to student responses on preference subtests, as the majority of students had no additional formal training other than general music class. Yet student responses aligned with those made by professional musicians, which lent support to Gordon’s (1998) contention that professional musicians’ decisions were associated more closely with factors of music aptitude than music achievement (p. 100). Gordon (1986a) concluded musical preference was a factor of stabilized music aptitude, and asserted the three preference subtests contributed significantly to MAP’s longitudinal predictive validity coefficient of .75 (Gordon, 1998, p. 61). Due to the extensive reporting of validity associated with Gordon’s tests, Law and Zentner (2012) selected MAP and AMMA for use in their examination of congruent validity of the researcher-designed Profile of Music-Perception Skills (PROMS). However, the ability of MAP and AMMA to measure isolated audiational skills was questioned. Law and Zentner reported correlations of AMMA melody scores with other test components, and noted the emphasis on tonal memory required by the MAP tempo test, as it required the listener to retain the ending of the melodies in order to compare and judge them as same or different.

PMMA

Gordon’s Ottumwa study, conducted during the development of MAP (Gordon, 1995, p. 33), yielded an ostensibly reliable measure of aptitude in students in kindergarten and first grade, but only through extraordinary measures (Walters, 1991), such as selected inclusion of children with exemplary achievement, one-on-one administration by a parent, and markedly increased administration time. 
The development of the Primary Measures of Music Audiation (PMMA; 1979b) was the result of the continued investigation of inconsistent reliability coefficients yielded in studies of MAP administered to early elementary-aged children (DeYarman, 1972, 1975; Harrington, 1969; Schleuter & DeYarman, 1977). From these findings, Gordon inferred a need for the creation of a more appropriate measure for use with children younger than 9 years. Gordon (1979a) benefited from the experimentation of other researchers, who modified the length and format of MAP’s answer sheets and recorded directions (DeYarman, 1972, 1975; Harrington, 1969; Schleuter & DeYarman, 1977). Norms for students in Grades 4–12 were included in the original 1965 publication of MAP; Harrington (1969) adapted three of the seven MAP subtests, including one preference subtest (phrasing), by simplifying and re-recording test directions and color-coding answer sheets for use with a younger age group. Harrington calculated reliability coefficients for MAP scores of students in Grades 2 and 3, and concluded subtest scores demonstrated satisfactory reliability and test results appeared to be more closely related to musical ability than scholastic ability. Therefore, he concluded the primary version of MAP functioned adequately in measuring music aptitude of young students. However, Gordon attributed the low composite reliability coefficient in Harrington’s study to the expected fluctuations of music aptitude in young children and thereby drew a different conclusion: the primary MAP was not measuring stabilized music aptitude in second- and third-grade students; rather, it was revealing developmental music aptitude (DeYarman, 1975). 
Using adapted test directions, different answer sheets, and the same music included in Harrington’s 1969 study, DeYarman (1972) conducted an investigation of the stability of music aptitude of kindergarten and first-grade children using his own version of the primary MAP. DeYarman reported higher MAP composite reliability coefficients for his early primary sample than those reported for Harrington’s sample of second- and third-grade students. DeYarman (1975) reasoned reliability coefficients of MAP were as high for students in kindergarten–first grade as Gordon had reported for students in Grades 4–12; thus, music aptitude must stabilize as early as age 5 or 6. Using this rationale, DeYarman conducted a subsequent study in 1975 to further investigate music aptitude stability in primary-age children. The study’s research questions addressed stabilization of music aptitude and the effects of formal instruction on music aptitude levels before Grade 4. However, the sample did not include primary-aged students; rather, approximately 3,000 fourth-grade students constituted the sample. Zimmerman (1986) noted difficulty determining how MAP results of Grade 4 students could lead to a conclusion of music aptitude stabilization of students aged 5 or 6. It must be considered that DeYarman’s conclusion was controversial and specious, given that his sample consisted solely of students in fourth grade. 
Nevertheless, DeYarman’s (1975) research design, in which correlations of music aptitude test scores from a sample of fourth-grade students were used to draw conclusions about the onset of stabilized music aptitude, was most efficacious and served as the template for future studies: to note when music aptitude had stabilized rather than to attempt to observe when developmental music aptitude had ceased was reasonable and prudent, given the potential complexity of tracking the expected fluctuation of the relative standing of students in the developmental music aptitude stage. A follow-up study by Schleuter and DeYarman (1977) replicated and extended DeYarman’s 1975 study. The MAP scores of fourth-grade students were compared to scores of DeYarman’s 1975 sample. Schleuter and DeYarman (1977) concluded from their findings that formal music instruction in the primary grades had little effect on student music aptitude levels; thus, music aptitude must stabilize in or before kindergarten. The discrepancies between the findings of DeYarman’s and Harrington’s studies caused Gordon to reconsider his earlier agreement with DeYarman’s assertion that music aptitude stabilized in the primary grades (DeYarman, 1975). Gordon’s theory of developmental music aptitude resulted from his extensive research involving approximately 10,000 primary students and his understanding of language development and music perception. Neurological information supported the construct of developmental music aptitude, as did research findings that indicated the effect of environmental factors on music aptitude levels (Gordon, 2005); developmental and stabilized music aptitude were substantiated as separate constructs through factor analysis of MAP, IMMA, and PMMA scores (Gordon, 1986a). Gordon contended audiation of keyality and tempo were functions of developmental music aptitude. 
Thus, discrimination of PMMA and IMMA test item pairs emphasized keyality and tempo, rather than tonality and meter as in MAP (Gordon, 1986c, p. 100), further establishing IMMA as a measure of developmental music aptitude. It was difference that contributed heavily to the validity of music aptitude tests: Gordon (2002), in his examination of the item content of PMMA and IMMA, found total score reliability was similarly high for 20 patterns with “different” as the correct answer as when all 40 test items were included. Gordon (1981) concluded children in the developmental music aptitude stage were more aware of how music was constructed than of its expressive qualities and were able to recognize extremes of timbre and dynamics only. Therefore, the content of the three MAP preference subtests was unusable for measuring the aptitude of young children (Walters, 1991). Consequently, PMMA contained only non-preference subtests. The design of PMMA was unique. Students in the developmental music aptitude stage had difficulty distinguishing tonal features from rhythm features if the two were conflated within melodic patterns. Therefore, the test items of developmental music aptitude measures contained tonal patterns without rhythm and rhythm patterns without pitch (Gordon, 2004). Within the separate tonal subtest and rhythm subtest, students were tasked with listening to a pair of tonal patterns or rhythm patterns to determine if the pair was “same” or “different”; research findings since the publication of PMMA and IMMA indicated an increase in test reliability when “not same” was used in place of “different” (Gordon, 2002). Answers were marked by circling the box containing two identical faces or that containing two dissimilar faces on the answer sheet (see Figure 1); therefore, reading, writing, speaking English, performing, and understanding music theory were unnecessary for accurate administration of PMMA. 
Because young children were inclined to have their attention diverted from the music itself toward the source of the sound, test content for students in the developmental music aptitude stage was performed on electronic instruments, in contrast to the acoustical instruments used in stabilized music aptitude tests (Gordon, 2004). The test content was unfamiliar and presented without musical context, and the interval of time between test items was insufficient for students to memorize or fully recall each pair of patterns. Thus, Gordon mitigated the effects of prior learning, resulting in a test of music aptitude rather than music achievement (Walters, 1991). PMMA was normed for students in kindergarten through third grade, approximately ages 5 through 8 (Reese & Shouldice, 2019, p. 480): separate subtest norms as well as composite PMMA norms were reported, reflecting the atomistic and gestalt viewpoints.

Figure 1
PMMA/IMMA Test Answer Sheet Design (Gordon, 1986c)

Moore (1990) summarized subsequent research findings (Flohr, 1981; Moore, 1987; Norton, 1980) that supported the theory of developmental music aptitude as a stage defined by fluctuation and influenced by instruction. Moore noted the comprehensive body of research conducted by Gordon (1979b, 1980b, 1981, 1986a) to establish PMMA as a measure of music aptitude. Gordon’s 1982 study of longitudinal predictive validity of PMMA yielded a reliability coefficient of .73 when pre-instructional PMMA scores and judges’ performance ratings were correlated (Geissel, 1985, p. 12). Flohr (1981) found a significant effect of short-term music instruction on PMMA scores (Reese & Shouldice, 2019, p. 478); Gordon (1980b) concluded specialized music instruction focused on tonal and rhythm patterns resulted in high PMMA scores. 
Walters (1991) noted evidence from subsequent studies indicated music aptitude was unstable during the primary years, PMMA could be used to predict achievement, and developmental aptitude was sensitive to instruction, especially for students with low music aptitude. In addition, Bell (1981) found PMMA to be a valid measure for use with children with developmental disabilities.

IMMA

The Intermediate Measures of Music Audiation (IMMA; Gordon, 1982) was developed for use with groups in which at least half of the students scored above the 80th percentile on one or both PMMA subtests (Gordon, 1986c, p. 120), as in Phillips and Aitchison’s (1997) study in which PMMA was administered to a sample of third-grade students because Gordon’s criterion had not been met. Thus, IMMA, like PMMA, was a test of developmental music aptitude. However, scores from PMMA and IMMA should not be compared because the difficulty level of test content was not equivalent (Gordon, 1986c, pp. 66–67). Gordon (2005) had observed PMMA was not complex enough for young students who had received superior music instruction: the normal distribution of their scores skewed to the left and reliability decreased. Gordon (2005) designed IMMA, a more advanced test battery, with those students in mind. Although originally normed for students in Grades 1–4 (Geissel, 1985, pp. 2–3), the age range for IMMA was later expanded to include students in Grades 5 and 6 (Gordon, 1986c, pp. 64–65). The format of IMMA was identical to that of PMMA: students were tasked with determining sameness or difference of two tonal patterns or two rhythm patterns, and tonal subtest, rhythm subtest, and composite scores were yielded. Gordon (1976) developed a taxonomy of tonal patterns and rhythm patterns based on their audiation difficulty level (easy, medium difficult, and difficult). 
The difficult tonal patterns and rhythm patterns were the sole test content for the Intermediate Measures of Music Audiation (Gordon, 1982); IMMA was therefore a more discriminating measure of high aptitude for children aged 6 to 9 years than Gordon’s previous test of developmental music aptitude for primary students, PMMA (Gordon, 1984a; Haroutounian, 2002; Moore, 1990; Walters, 1991). Gordon (2006) noted the choice to use PMMA or IMMA at a particular grade level was dependent on desired difficulty of test content, as both were tests of developmental music aptitude. IMMA had been used in numerous studies with students under age 9 for increased discernment (Culp, 2017; Gromko & Russell, 2002; Guilbault, 2004; Kratus, 1994; Rutkowski, 2015; Rutkowski & Miller, 2003b; Saunders & Holahan, 1993), and had been found to have strong predictive validity for instrumental and vocal achievement of fourth grade students (Geissel, 1985, p. 11). Gordon (1986c) noted a .90 longitudinal correlation coefficient for IMMA and MAP composite scores, from which he concluded IMMA assessed in the developmental music aptitude stage what MAP assessed in the stabilized music aptitude stage (p. 17). A curious feature of IMMA was its apparent capability of simultaneously functioning as a measure of developmental music aptitude and of stabilized music aptitude in students age 9 (Grade 4). Gordon (1989c), finding a .89 correlation between pre- and post-instruction IMMA scores, hypothesized that the immutable nature of IMMA scores for fourth- and fifth-grade students suggested IMMA functioned as a test of stabilized music aptitude for students aged 9 and older. Gordon (1984a) conducted a longitudinal predictive validity study in which judges’ evaluation scores of fourth-grade boys’ violin and recorder performances were correlated with pre- and post-training IMMA scores.
He concluded the student sample was in the stabilized music aptitude stage, based on the high correlations between students’ pre-training and post-training IMMA scores and achievement ratings, and pre-training IMMA scores with post-training IMMA scores, as only low to moderate correlations had been observed in previous studies investigating the relationship of PMMA scores from one semester to another. Gordon (1984a) observed that, despite test content designed to measure developmental music aptitude, IMMA scores could be assumed to reasonably function as stabilized music aptitude scores for students age 9 and older. Nevertheless, Gordon acknowledged the skewed distribution of scores in his longitudinal predictive validity study of IMMA, due to the small number (N = 33) and homogeneity of subjects, which seemingly lowered the reliabilities compared to those published in the IMMA manual. A profusion of high scores on the post-training IMMA was attributed to the subjects’ approaching fifth grade, which belied the theory that stabilized music aptitude was reached by age 9 and resistant to the effects of training, instruction, or chronological age. Reese and Shouldice (2019) asserted IMMA was suitable as a test of stabilized music aptitude for children as old as 11 years when use of MAP was not possible (p. 480). Gordon (1984a) promoted the screening properties of IMMA as a means of identifying students to whom the more comprehensive Musical Aptitude Profile should then be administered. Although Gordon (1993) established IMMA as a test of developmental music aptitude and identified students in Grades 5 and 6 as definitively in the stabilized music aptitude stage, he noted use of IMMA was suitable if insufficient time precluded administration of a stabilized music aptitude test (p. 245).
Geissel (1985) claimed it was common practice to administer IMMA to fourth-grade students, in addition to MAP: IMMA scores were used to identify students with high music aptitude, who were then encouraged to participate in specialized music instruction, whereas MAP scores were used to diagnose strengths and weaknesses to be addressed during instruction (pp. 3–4). Walters (1991) concurred, acknowledging the suitability of IMMA as a predictor of music achievement for fourth graders and its shortcomings as a diagnostic tool when compared to MAP. In fact, when MAP’s purpose was identification rather than diagnosis, Gordon (1968) suggested one or more of MAP’s subtests (balance or meter) need not be administered. Nevertheless, these omissions were not recommended, as the savings in time would be minimal and not worthy of the more cursory diagnosis. Definitively determining when students had reached the stabilized music aptitude stage could be an elusive prospect, however. Numerous researchers (DeYarman, 1975; Harrington, 1969; Schleuter & DeYarman, 1977; Stevens, 1987) had attempted to design studies to establish when students’ music aptitude levels ceased to fluctuate, thereby indicating a level of stabilized music aptitude. However, low reliability of measures when administered to younger students confounded researchers: Wing published norms with questionable reliability for children aged 8 and younger, and Seashore asserted measures were not reliable until age 10 (Walters, 1991). Moore (1990) summarized extant research of Deutsch (1982), Gordon (1971), and Mark (1986), noting music aptitude may cease to develop beyond age 9 or 10. Others asserted music aptitude stabilized much sooner. DeYarman (1975) concluded music aptitude stabilized by age 6 or sooner, contradicting previous findings of fluctuating music aptitude in primary-grade students.
Although Gordon had specified the stabilized music aptitude stage begins at approximately age 9, IMMA had been used repeatedly as a measure of developmental music aptitude for students in Grades 4 and 5. In his 1998 factor analysis of fourth- and fifth-grade students’ Harmonic Improvisation Readiness Record (HIRR), IMMA, and MAP scores and improvisation performances, Gordon (1998) noted in the fourth-grade analysis, IMMA loaded on factor II, and HIRR and improvisation on factor I. However, in the fifth-grade analysis, the opposite occurred: MAP loaded on factor I, and HIRR and improvisation ability loaded on factor II. IMMA was thus selected and functioned as a measure of developmental music aptitude for students in Grade 4, presumably age 9, thus blurring the delineation of age of onset of stabilized music aptitude (p. 72). The results of Gordon’s (1984a) longitudinal predictive validity study of IMMA, described previously in this chapter, provided another example of the obscured age of onset of stabilized music aptitude. Similarly, Levinowitz and Scheetz (1998) claimed developmental music aptitude became less unstable as children neared age 9. As the predictive validity of the Instrument Timbre Preference Test (ITPT) and MAP, a test of stabilized music aptitude, had been investigated previously (Gordon, 1986b), Gordon (1989c) also investigated the predictive validity of ITPT and IMMA. Thus, IMMA was administered specifically as a test of developmental music aptitude. Students were administered IMMA in fourth grade, and again in fifth grade, when the students were able to elect to study a musical instrument, some according to their ITPT results. Belczyk (1992) conducted a similar study investigating predictive validity of ITPT and IMMA. Although his sample of 805 fourth-grade students was administered IMMA, it was not indicated whether IMMA was intended to function as a test of developmental or stabilized aptitude.
Perhaps the use of IMMA as a test of developmental aptitude with students aged 9 and older was a reflection of Gordon’s (1989c) perception that “developmental and stabilized music aptitudes are more a matter of attributes of the mind than of the properties of a test” (p. 12). From examination of regression analysis results, Stevens noted each age level’s scores were significantly and progressively higher than the scores of the preceding age level until the age of 9, after which significant increase discontinued (Stevens, 1987, pp. 115–116). From these gain scores, Stevens concluded the stabilized music aptitude stage began near the age of 9: a similar conclusion to that of Gordon, but arrived at through different means. Culp (2017) observed that researchers, influenced by previous research findings, differed in their approach to the construct and measurement of music aptitude, which in turn affected their inferences. Stevens (1987) was influenced by DeYarman’s (1975) conclusion that music aptitude could stabilize as early as age 6; Gordon (1980a, 1986c, 2012) theorized music aptitude stabilized around age 9, based on the research of Stanton and Koerth (1933), Wing (1968), and his own 1967 study (Culp, 2017). Although Stevens and Gordon both defined music aptitude as the potential to draw generalizations in music and viewed music aptitude testing as a valuable tool to collect data for individualization of instruction, Stevens (1987) viewed the construct of music aptitude as an ability to aurally perceive music (p. 45). This contrasted somewhat with Gordon’s more comprehensive view of music aptitude as an extension of audiation, as evidenced in his empirical research findings.
Adhering to the prevailing premise that cessation of change to relative standing of music aptitude scores was the most effectual means to determine the onset of stabilized music aptitude, Stevens (1987) sought to establish the construct validity of her researcher-designed music aptitude test by analyzing the composite test scores of children at each age level (pp. 115–116). Gordon (1984a) reported a range of predictive validity coefficients of .55 to .70 for IMMA, and asserted composite IMMA scores predicted first and second semester instrumental and vocal achievement at only a slightly lower rate than MAP; he found a substantial relationship between achievement in instrumental performance and in vocal performance. IMMA’s predictive validity coefficients were unusually high when compared to its concurrent validity coefficients. In addition, Gordon (1989c) estimated a predictive validity coefficient of .80 for IMMA when combined with the Instrument Timbre Preference Test after two years of instrumental instruction. Gordon (1986a) conducted a factor analysis of MAP, PMMA, and IMMA (N = 110 fourth-grade students) and reported the following results: Factor I (unrotated analysis) and Factor II (rotated analysis) were identified as stabilized music aptitude factors. Factor II (unrotated analysis) and Factor I (rotated analysis) were identified as developmental music aptitude factors. Factor III was identified as stabilized music aptitude preference factors for both analyses. Gordon concluded IMMA could serve as a test of developmental and stabilized music aptitude; therefore, developmental and stabilized music aptitude were different. Nevertheless, the congruent validity of IMMA as a test of stabilized music aptitude was in question. Carroll (1978) summed up the premise of correlation, noting if two measures were significantly associated, they might be regarded as “measuring the same thing” (p. 89).
Gordon (1986c) concurred: if correlation affirmed both tests were valid for their intended purpose, the tests exhibited congruent validity (p. 109). Gordon (1986c) reported acceptably high correlation coefficients for PMMA and IMMA (Grades 1–4, N = 126), IMMA and MAP (Grade 4, N = 92), and PMMA and MAP (Grade 4, N = 227) (p. 111), and concluded PMMA, IMMA, and MAP measured similar content and constituent characteristics. Although IMMA was found acceptable as a test of stabilized music aptitude for students ages 9 and up, the congruent validity of IMMA as a test of stabilized music aptitude with MAP was cursory and inconclusive. IMMA was not a newly developed test, but instead demonstrated established validity for a different purpose (as a test of developmental music aptitude). MAP had proven longitudinal validity but not parallel content. Therefore, congruent validity for IMMA and MAP could not be established through traditional means. It could be concluded PMMA, IMMA, and MAP exhibited congruent validity only if “the same factor” they measured was music aptitude in general rather than developmental or stabilized music aptitude.

AMMA

Gordon (1989b) originally developed the Advanced Measures of Music Audiation (AMMA) as a more complex measure of stabilized music aptitude for students in college or university; subsequently, norms for high school and junior high students were added. Through use of melodic test prompts within a single non-preference test, tonality, keyality, melody, implied harmony, rhythm, meter, and tempo could be audiated concurrently. In a stabilized music aptitude test such as MAP, students heard melodic patterns and discerned sameness or difference of tonal or rhythm dimensions in separate subtests.
However, in an advanced stabilized music aptitude test such as AMMA, intended for administration to “mature students,” the student heard melodic patterns and discerned sameness or difference of tonal and rhythm dimensions simultaneously within a single test (Gordon, 2004). Gordon (1998) candidly observed he was unable to explain why test design differences were necessary, only that he had arrived at this conclusion through extensive research of test development, reliability, and validity (p. 112). Because of its uniquely compact design, AMMA could be administered in approximately 20 minutes (Gordon, 1989b), compared to the estimated 50 minutes required for administration of each MAP test (tonal, rhythm, and musical sensitivity) (Gordon, 1995, p. 36). As with all of Gordon’s music aptitude tests, AMMA was designed to incorporate atomistic and gestalt features. However, unlike MAP, AMMA did not include preference subtests for measurement of stabilized music aptitude, due to its unique but labor-intensive system of scoring which allowed tonal and rhythm scores to be calculated separately from the administration of a single measure (Gordon, 1989b). Gordon published tonal, rhythm, and total percentile rank norms for students in high school, college music majors, and college non-music majors; however, Haroutounian (2002) asserted AMMA could be used to measure stabilized music aptitude for students above Grade 4 (p. 16). Concurrent validity with MAP was the initial means of validity established for AMMA. Gordon (1989b) reasoned a moderate correlation coefficient of MAP and AMMA scores would substantiate the validity of AMMA and promote design of additional test validity studies. An investigation of the longitudinal predictive validity of AMMA was undertaken by Gordon in 1990.
Total AMMA scores and etude performance ratings of college music students were correlated, yielding a longitudinal predictive validity coefficient of .82 and suggesting AMMA had a high degree of ability to predict performance achievement in college-age students: Gordon (1990c) found more than 67 percent (the square of .82) of the reason or reasons for college students’ success in music performance could be predicted by the total test scores on AMMA. Gordon (1998) noted a high intercorrelation between AMMA tonal and rhythm test scores, likely because questions with “same” as the correct answer were included in calculating both tonal and rhythm scores (p. 114). Although MAP offered a more comprehensive diagnosis, it was not consistently more valid; therefore, teachers might opt to administer AMMA to measure aptitude levels of students in Grades 7–12 because of AMMA’s considerably shorter length (Gordon, 1998, p. 112).

Audie

Audie (Gordon, 1989a), a “game” of music aptitude for children ages 3 and 4, was designed for individual administration by a parent. Although Audie was a test of developmental music aptitude, Gordon (2005) asserted the need for preschool children to be presented with melodic test prompts, even though they were able to address only the tonal or the rhythm aspects of the test question. Therefore, test prompts in Audie consisted of a single melody, “Audie’s song”, from which students must discriminate a tonal difference or a rhythm difference in the test responses of the tonal and rhythm subtests, respectively. A hallmark of Gordon’s music aptitude tests was the absence of required reading, writing, English language speaking, and knowledge of music theory, as these skills were superfluous in Gordon’s test design.
Children only needed to comprehend whether a given tonal or rhythm pattern was the same as or different from “Audie’s song” and indicate their answer to the parent, who marked “yes”, “no”, or “?” accordingly on the game’s answer sheet. Children were encouraged to play Audie repeatedly and independently. After the parent had determined full engagement in the game, the child’s answers were marked on the answer sheet. Either the tonal or rhythm subtest could be administered first. The parent was encouraged to monitor recurrent results in order to observe growth and adapt instruction for individual differences (Gordon, 1989a). Gordon found a typical child’s level of concentration was limited to, at most, ten consecutive questions. Because the test was so brief, it was not useful to calculate percentile norms. Instead, criterion validity was used to interpret scores.

Features of Stabilized Music Aptitude

Chronological Age Effect

The influence of chronological age on music aptitude received limited attention in extant literature. Walters (1992) contrasted maturational readiness (chronological age) with experiential readiness for music learning, which Gordon labeled “musical age”. Runfola and Etopio (2010) further defined musical age: if developmental age was a measure of children’s physical and psychological development, musical age was a measure of children’s musical development, in which they progressed through preparatory audiation and audiation over time and under different environmental circumstances. Radocy and Boyle (1979) viewed maturation as a reinforcer of inherited musical potential, rather than an influence on music aptitude. Gordon (2001c) attributed changes in MAP scores to general maturation, rather than to the effect of formal instrumental instruction; changes were too small to be of practical significance.
Chronological age was also associated with score increase on developmental music aptitude tests and tests of stabilized music aptitude (Gordon, 2005); however, maintenance of relative position in score distributions distinguished the stabilized music aptitude stage from the developmental stage. Gordon (2002) posited operation of different stages of music aptitude associated with chronological age was suggested by substantial differences in paired PMMA and MAP scores of the same Grade 2 and Grade 6 student sample. Thus, music aptitude stage was perceived as more reliable than tonal or rhythm content in differentiating between students of different chronological ages. In his 1989 study of the effect of chronological age on AMMA scores, Gordon found the means and standard deviations of children’s chorus participants aged 9–12 almost identical to data from college-aged non-music majors from the AMMA standardization program. Thus, an effect of chronological age on music aptitude was discounted.

Resistance to Instruction and Maintenance of Relative Standing

The manner in which music aptitude was historically defined would be classified currently as stabilized music aptitude. Moore (1987) noted some scholars characterized music aptitude as a series of fixed and unchangeable traits. This constancy, manifested as resistance to direct instruction, training, or practice, was a hallmark of stabilized music aptitude. Gordon (1987) was resolute in his characterization of music aptitude as immutable after approximately age 9 (p. 9). Gordon (1995) based this conclusion on the findings of his 1967 study of predictive validity of MAP, in which a negligible discrepancy between the mean difference of MAP scores of band members and those in the “musically select” standardization sample over a 3-year period was found (p. 97).
Fosha (1964) administered MAP to elementary and secondary instrumental and choral ensemble participants before and after one semester of formal music instruction, and concluded from the slight differences in pre- and post-instruction scores that MAP scores of musically select students were resistant to the influence of instruction. Gordon (1989c) compared the pre- and post-instruction IMMA scores of approximately 170 elementary students in a 2-year predictive validity study of ITPT and IMMA and found additional support for his assertion that IMMA scores of students in Grades 4 and 5 were not affected by music instruction. Previous and subsequent studies yielded similar results, thus indicating resistance to instruction was the threshold for the stabilized music aptitude stage (DeYarman, 1975; Gordon, 1981; Mang, 2013). Haroutounian (2002) observed future training can enhance but not extend music aptitude (p. 11). Relative standing, marked by increases in raw scores and steady percentile ranks (Gordon, 2001c), was maintained on all MAP subtests throughout the 3-year longitudinal predictive study of MAP (Gordon, 1981). Scores of subsequent administrations of the same stabilized music aptitude test yielded high correlations, whereas the correlation for scores on the same developmental music aptitude test administered in subsequent years was low (Gordon, 2005). In addition, content, or “what is audiated”, was indifferent to instructional procedure (Gordon, 1981), although considerably more fluctuation of developmental music aptitudes occurred when instruction was modified to align with PMMA results. Stanton (1935) claimed stability of music aptitude for participants of a longitudinal predictive validity study of the Seashore measures, as cited by Gordon (1981).
Nevertheless, Gordon (1995) later observed individualized instruction should result in meaningful improvement in students’ music achievement, though not an improvement in stabilized music aptitude.

Primacy of Rhythm

Of interest was the prominence of rhythm in stabilized music aptitude referred to throughout the literature. Gordon (2005) noted the combination of meter and tempo aptitudes could predict school music success more accurately than melody and harmony together, and contended rhythm was the basis of music aptitude, as it served as the foundation for musical style and expressiveness (Gordon, 1998, p. 60). Although Gordon (2005) asserted higher predictive validity for meter than for tempo, he concluded tempo was the most fundamental of rhythm aptitudes (Gordon, 1998, p. 104). In addition, the MAP meter subtest was found to be a more valid measure of rhythm aptitude than one in which phrases of melodic rhythm were compared (Gordon, 1998, p. 104) and had a high, but unlikely, relation to the musical sensitivity–balance subtest (Gordon, 1998, p. 55). Gordon (1986a) concluded the IMMA rhythm subtest may be more suggestive of stabilized music aptitude than of developmental. Even so, an adjustment to the MAP tempo subtest was made (dynamic accents or clicks, representing macrobeats, underlaid the test prompts) in order to increase reliability, as was also necessary in the rhythm subtests of PMMA and IMMA. Rhythm reacted the most robustly on factors of developmental and stabilized music aptitude in Gordon’s (n.d.) investigation of HIRR, PMMA, IMMA, MAP, and improvisation ability (Gordon, 1998, p. 173). Therefore, Gordon (1998) claimed knowledge of chord changes in syntactic time might be more important to the audiation process than awareness of chord changes (p. 172).
This construct aligned with Karma’s perspective on the temporal aspect of music aptitude, manifested in his focus on auditory structuring as a measure of music aptitude.

Music Preference

Bugos et al. (2014) noted the perception of expressive characteristics was viewed as the most important element of musical performance by musicians and educators. Gordon (1980b) contended musical expression was another key feature of stabilized music aptitude, and asserted the expressive dimension of stabilized music aptitude joined the tonal and rhythm dimensions to yield comprehensive music aptitude (Gordon, 1998, p. 60). Music sensitivity or preference as a construct seemed to be generally accepted (Boyle, 1992; Bugos et al., 2014), yet there was little agreement on how it should be measured. There was even a lack of consensus on definition and terminology in assessing musical sensitivity (Boyle, 1982). Boyle accepted Kuhn’s (1979) definition of preference as “an act of choosing, esteeming, or gaining advantage of one thing over another through verbal statement, rating scale response, or choice made from among two or more alternatives” (Boyle, 1982), yet Geake (1999) considered musical sensitivity independent of music aptitude, and attributed both auditory discrimination and affective response to musical sensitivity. Colwell instead used the term “stylistic discrimination”, as cited in Boyle (1982). Boyle (1982) specified perception of and response to sensory stimuli were measured in preference tests. Listeners of musical preference tests were frequently asked to indicate which music fragment of a pair was preferred; some, like the Indiana–Oregon Music Discrimination Test (1965), required the listener to identify whether the change was melodic, harmonic, or rhythmic.
Based on the findings of his research into audiation of “same” and “different” in developmental music aptitude, Gordon (1986c) noted older children attended more to musical characteristics in test items (p. 103). In contrast, the inclusion of the dimensions of timbre and dynamics in PMMA, a test of developmental music aptitude, resulted in reliability approaching zero (Gordon, 1981). Therefore, tests of musical expression were included in MAP but not PMMA or IMMA (tests of developmental music aptitude), as young children were focused primarily on the constructive elements of music (Gordon, 1980b) and incapable of reliably making judgments about their music preferences (Gordon, 2002). Gordon (1989b) theorized the subjective understanding inherent in music aptitude was manifested as musical sensitivity and measured by preference tests; Boyle (1982) noted correlations between sensitivity subtests and performance evaluations and music achievement test scores yielded high validity coefficients. Interestingly, a paper/pencil version of the Musical Nuance Test created by Bugos et al. (2014) indicated an increase in musical nuance perception only up to age 10. The inclusion of expression subtests only minimally increased the longitudinal predictive power of the AMMA total score (Gordon, 1990c); thus, Gordon opted not to include music preference subtests in AMMA. Music preference subtests were also excluded from the ancillary study undertaken concurrently with Gordon’s 1989 predictive validity study of IMMA and ITPT, in which MAP tonal imagery and rhythm imagery subtests were administered to sixth-grade students who had taken IMMA in Grade 4 (Gordon, 1990c). Since IMMA did not include music preference subtests corresponding to the MAP musical sensitivity subtests, MAP expression subtests were disregarded.
Thus, little evidence in published research was found to compare the effect of music preference subtests on stabilized music aptitude measurement. Nevertheless, reliability coefficients were moderately high for some tests of musical sensitivity. Although Heller reported low test-retest reliability coefficients of .28 and .50 and a split-half coefficient of .42 for college students (Boyle, 1982), Wing reported a reliability coefficient of .84 for 15-year-old boys. Gordon reported reliability coefficients ranging from .84 to .90 for the MAP musical sensitivity test; Boyle’s 1982 investigation of the comparative validity of Wing’s Standardised Tests of Musical Intelligence, MAP, and the Indiana–Oregon Music Discrimination Test yielded split-half reliability coefficients of .88 (Grades 10–12) and .82 (college students) and low correlation coefficients, indicating each test measured something other than that measured by the other tests. Thus, Gordon (2004) asserted the high predictive validity of music preference subtests established their importance as components of stabilized music aptitude tests.

Ensemble Participation Effect

Likewise, little research was found on the effect of ensemble participation on music aptitude. Gordon (1998) found ensemble participants as a group scored higher on MAP than nonparticipants (pp. 79–80), and reported higher composite MAP scores for ensemble students (Gordon, 1987, p. 83). Nevertheless, lack of ensemble participation was not a limiting factor for high music aptitude scores, nor did ensemble participation guarantee high music aptitude scores, as evidenced by a comparison of percentile norms reported in the MAP manual (Gordon, 1998, p. 80). Gordon (1998) found a minimal increase in MAP scores for students studying a musical instrument when compared to students without that training (p.
106), and reported more ensemble participants scored higher on MAP than did nonparticipants (Gordon, 1995, p. 31). Musically select students demonstrated small gains in pre- to post-test MAP scores in Fosha’s (1964) study of students in Grades 4–12 (p. 87). Regrettably, the discrepancies in MAP scores between participants and non-participants in school music ensembles might be attributed to the selectivity of music performance groups. Gordon (1995) studied MAP scores of students in performance groups and non-participants to determine whether ensemble participants were likely to have higher music aptitudes (p. 90). Although non-participants scored somewhat lower, many high-scoring students were not engaged in school music activities (p. 91). The value of MAP scores (and by extension, IMMA scores) in identifying students with high music aptitude for potential enrollment in school music ensembles should not be underestimated (p. 91). Scores of all Grade 4–12 students in the MAP standardization program were used to determine the need for separate norms for chorus, band, and orchestra rather than one set of norms for all musically select students (Gordon, 1995, p. 90). Gordon found similarities in score distributions for chorus, band, and orchestra members at each grade level and seemed to conclude separate norms by instrument family were unnecessary (p. 91); however, no mention was made of how the score distribution of non-participant scores compared to those of chorus, band, and orchestra. In addition, the use of statistical testing to compare means of different ensemble types was not referenced. Therefore, it is unknown if mean scores of one ensemble type were significantly different from another.
Scores of elementary students were considered separately from those of middle and high school students in drawing conclusions about the need for separate norms for musically select students, but not for determining the effect of ensemble type on music aptitude. Gordon (1970) noted a significant relationship between MAP scores and instrumental music success, and reported students in performance groups scored higher on MAP than nonparticipants (1995, p. 91). Subjects in his 1970 investigation of music aptitude differences in Grade 4 and 5 beginning instrumental students were matched for music aptitude level, sex, grade, and type of instrument; however, “years of participation” was not included as a variable. Extant literature on the effect of ensemble participation on music aptitude was limited. The elementary students in Gordon’s (1995) MAP study were in Grades 4–6. It is unknown to what extent scores of students in each grade contributed to the conclusions drawn regarding the need for separate norms. Consequently, it is not possible to aggregate the 1995 findings for students in Grades 4 and 5 only, who constitute the participant sample in the current study. It appears the grade level at which a student participated in a performance group had not been previously examined for its effect on music aptitude.

Transition to the Stabilized Music Aptitude Stage

The process of transition from developmental to stabilized music aptitude appeared to be well-defined. Gordon (1981) noted in tonal audiation, one gradually attended to pitch center first, then key, and finally mode. In rhythm audiation, the progression moved from paired beats of equal length to melodic rhythm, and finally to meter.
In terms of measurement, Gordon (1989b) designed IMMA for students transitioning from the developmental to stabilized music aptitude STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 68 stage (ages 6 through 9) or those aged 10 and 11 who had already attained the stabilized music aptitude stage. Gordon (1986a) contended the effect of musical environment, measured as gain scores, decreased substantially with age until approximately age 9. Thus, it was conceivable decreasing gain scores and diminishing influence of instruction were two indicators of a transition period between the stages of developmental and stabilized music aptitude. An additional indicator was the shift of student focus from keyality to tonality and from tempo to meter, as Gordon (2002) asserted the higher a student’s tonal or rhythm music aptitude level, the sooner the student’s audiation might begin to include tonality and meter. Summary Because of the discrepancy concerning the age at which music aptitude stabilized, Schleuter and DeYarman (1977) concluded insufficient evidence was available to support Gordon’s assertion that music aptitude stabilized at age 9. Gordon concurred with Seashore’s theory that music aptitude is developmental in the early years, and asserted from findings of extensive research involving a large sample of primary-aged children, studies of neurological and language development, and music perception research that music aptitude was developmental until approximately age 9 (Walters, 1991). Gordon (2005) further supported this conclusion, noting developmental music aptitude stabilized at about age 9, which was approximately the same age at which physical changes in brain development of the frontal lobe occurred. Mang (2013), Phillips et al. (2002), and Stevens (1987) concurred. 
Moore (1990), summarizing the work of Deutsch (1982), Gordon (1971), and Mark (1986), concluded stabilized music aptitude began around age 9 or 10, when music aptitude might cease to improve despite further training. Other researchers differed in their conclusions of age of stabilized aptitude onset: DeYarman (1972, 1975) and Schleuter and DeYarman (1977) concluded music aptitude STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 69 stabilized as early as age 5 or 6, Forsythe (1984) found a partial stabilization of music aptitude in a sample of preschool students who participated in music instruction (p. iii), Wing published norms for students age 8 and younger (Walters, 1991), and Seashore hypothesized that music aptitude stabilized at the age of 10 (Haroutounian, 2002, p. 15). Gordon (1967b) conducted a multiple regression analysis using the grand composite performance achievement score as the dependent variable and the seven MAP subtests as independent variables; in another study, test items were factor analyzed by tonal subtest or rhythm subtest for “like” or “different” responses. Gordon (1981) found MAP item factors of students age 9 and older were more similar to PMMA item factors of students age 8 than for students age 5. Phillips et al. (2002) surmised Grade 3 (approximately age 9) was the pivotal year for development of aural skills, after which aural acuity no longer hindered accurate pitch matching, and inferred support of Gordon’s definitions of developmental and stabilized music aptitude. Thus, the grade level, as well as the age at which music aptitude stabilized, had not yet been fully substantiated. Dissension on the age of onset of stabilized music aptitude created a subsequent need for accurate measurement of the stage and level of music aptitude. 
So certain was Gordon (1986c) that the onset of stabilized music aptitude occurred in fourth grade that he stated IMMA would measure developmental music aptitude of students in Grades 1–3 and stabilized music aptitude of students in Grade 4 (p. 27). Moreover, Gordon (2005) suggested MAP was appropriate for students who had entered the stabilized music aptitude stage and PMMA was suitable for those who remained in the developmental music aptitude stage. Gordon’s implied recommendation that the student’s stage of music aptitude must be known in order to determine which music aptitude test was most appropriate to administer, especially in the case of the overlap of suggested grade levels for tests such as IMMA and MAP, was impractical. In his 2001–2002 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 70 study examining the need for different music aptitude tests for developmental and stabilized music aptitudes, Gordon (2002) concluded a test of developmental music aptitude was insufficient to measure stabilized rhythm aptitude, thereby implying a need to use some music aptitude tests as preliminary measures prior to administering the optimal music aptitude test for each student. Gordon (2002) further warned use of an unsuitable test would discriminate against very high and very low scoring students and yield misleading results, particularly for older students. Gordon designed MAP, a test of stabilized music aptitude, for students as young as fourth grade; the original range of grades for administration of IMMA, a test of developmental music aptitude, was first through fourth grade. Thus, it could be implied the transition from developmental to stabilized music aptitude occurred in fourth grade. The gap noted by Moore (1987) between students age 8 (developmental music aptitude) and age 9 (stabilized music aptitude) offered additional evidence of the likelihood a period of transition for developmental and stabilized music aptitude occurred between ages 8 and 9. 
An examination of music aptitude test scores seemed the best means of determining if a student had fully transitioned from the developmental music aptitude stage and reached the stabilized music aptitude stage. Walters (1991) noted Gordon’s choice of the label “primary” for PMMA implied music aptitude of primary-aged children was not solidified. It was debatable which music aptitude test should be administered to fourth and fifth grade students in particular, because their chronological age (“about age nine”) suggested they might be transitioning from the developmental music aptitude stage to the stabilized music aptitude stage. Without knowing if these students had entered the stabilized music aptitude stage, were transitioning to the stabilized music aptitude stage, or had not yet left the developmental music aptitude stage, a STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 71 level of ambiguity existed regarding the appropriate music aptitude test to be administered. Gordon (2006) hypothesized, It may be middle-school represents the period of a pronounced borderline between developmental and stabilized music aptitude stages, and MAP is more appropriate for students just entering the stabilized stage and AMMA for students who have gone beyond middle-ground and already settled into that stage (p. 234). The dissenting conclusions of past researchers lent support to the theory of a period of transition from developmental to stabilized music aptitude: it was improbable that a shift between developmental and stabilized aptitude occurred in an instant, without a period of transition or staggered onset, even within one grade or age level. Instead, a continuum of music aptitude from developmental to stabilized, with decreasing score gain, decreasing fluctuation of music aptitude test score ranking, evidence of relative standing, and a progressive shift towards more fixed levels of music aptitude seemed feasible as students approached and surpassed age of 9. 
Gordon conjectured the transition to stabilized music aptitude might occur in stages (e.g., entering, transitioning, and settling) on or around age 9. Yet IMMA, a test of developmental music aptitude, was recommended for use with students age 10 and 11 in Grades 5 and 6, despite the assumption their music aptitudes had stabilized previously (Gordon, 2006), and MAP was advised for administration to students interested in special music studies due to its comprehensive diagnostic capabilities. One presumed fourth graders, whose music aptitudes were potentially in the process of stabilizing, might also benefit from the diagnostic capabilities of MAP if pursuing special music studies. Thus, the advantage of IMMA administration for students in fourth grade conceivably transitioning into the stabilized music aptitude stage was in need of additional investigation. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 72 Although much had been written about the measurement of music aptitude, it was apparent an examination of music aptitude tests based on a singular approach was needed in order to better establish and describe the transition from developmental to stabilized music aptitude. To study aptitude measures based on Gordon’s unified construct of music aptitude was thus optimal. It was through the examination of stabilized music aptitude onset and the practical application of the findings of this research that the significance of this study was established. Chapter 3 Methodology Introduction The purpose of this chapter was to introduce the research methodology for this study. 
Using a quantitative approach, the mean difference in students’ longitudinal scores was examined to determine their capacity to predict the onset of the stabilized music aptitude stage, highlight changes in IMMA scores that were suggestive of the grade level at which stabilized music aptitude begins, and establish the feasibility of a period of transition between the developmental and stabilized music aptitude stages. Descriptions of the sample, sampling technique, measures, research design, and data analysis method are the primary components of this chapter. Research Questions and Research Hypotheses The following is a summary of this study’s research questions and corresponding research hypotheses: Research Question 1: At what grade level does chronological age cease to affect student music aptitude? STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 73 Research Hypothesis 1: Although raw scores may continue to increase as chronological age increases, significant score increase will decline, then cease at approximately age 9; no effect of chronological age on music aptitude is expected. Gordon (1989b) stated unequivocally: “Chronological age has very little effect on test results” (p. 43). He noted raw scores were expected to increase as chronological age increased, regardless of the stage of music aptitude; however, the same relative position in the score distribution would be maintained by students in the stabilized music aptitude stage only (Gordon, 2005). It was expected IMMA subtest scores from this study would fluctuate between Grades 3–5; nevertheless, statistical significance of mean score difference served as the standard against which influence of chronological age would be interpreted. Research Question 2: At what grade level does instruction cease to affect student music aptitude? 
Research Hypothesis 2: An influence of instruction, measured as significant IMMA score difference, will cease at approximately age 9; no effect of instruction on music aptitude is expected. The effect of instruction on music aptitude scores was used historically to define stabilized music aptitude (DeYarman, 1972; Harrington, 1969; Schleuter & DeYarman, 1977; Seashore, 1919; Stevens, 1987). Using this definition, identification of the grade level at which instruction ceased to affect music aptitude scores would help determine the age of onset of stabilized music aptitude.

Research Question 3: Is there evidence to substantiate the transition between the developmental music aptitude stage and stabilized music aptitude stage at age 9/Grade 4?

Research Hypothesis 3: Significant score fluctuation will occur throughout Grade 3, begin to decline during Grade 4, and discontinue in Grade 5; a period of transition is expected. Gordon (1989b) proposed a shift from developmental music aptitude (ages 6–9) to stabilized music aptitude (ages 10 and 11), and speculated middle school might serve as a definitive boundary between the developmental and stabilized music aptitude stages (Gordon, 2006). Prior to gathering evidence that IMMA was able to function as a test of stabilized music aptitude for students in Grades 5 and 6, Gordon (1982) had developed norms for IMMA only through fourth grade. IMMA was conceived as a measure of developmental music aptitude through age 9 (fourth grade). However, it had been conjectured IMMA might serve as a measure of stabilized music aptitude in students age 9 and higher (Gordon, 1989d).

Participants

Sampling

Nonprobability sampling occurs when individuals are selected for availability, convenience, and ability to represent the characteristic being studied (Creswell, 2012, p. 145). Consequently, there is no known nonzero chance of selection: nonprobability sampling is subjective.
Convenience sampling is a type of nonprobability sampling in which participants are selected for their availability and willingness to participate, and may comprise recruits or volunteers; as such, the sample may not be representative of the population at large (Creswell, 2012, p. 145). Such samples must be described in detail in order for the reader to conceptualize the abstract population about whom statistical inferences are made, and caution must be used when generalizing results from the sample to the population (Huck, 2012, pp. 100–102). Nevertheless, convenience sampling is used in educational research when randomization is not feasible due to scheduling or economic restraints (Pedhazur & Schmelkin, 1991). Nonprobability convenience sampling was used to define the sample used in the current study. Intact classes were used as a means of efficiently collecting data with the least disruption to the school’s instructional schedule. The use of intact classes in this study might be less disadvantageous than in other studies, as the collected music aptitude test scores were considered by grade level, rather than by class level. IMMA was administered once or twice per academic year to intact classes of third-, fourth-, and fifth-grade students over a period of thirteen years. Scores were investigated as matched pairs (e.g., Fall scores compared to Spring scores of the same academic year; Spring scores of one year compared to Fall scores of the next year) by grade level, and Grade 3 scores from 2007–2019 were considered longitudinally. Thus, the effect of test administration to intact classes was mitigated. An optimal sample was representative of the population in question, in order that generalizations could be drawn to the population at large. An advantage of the sampling technique used in this study was convenience: the researcher had easy access to the archived IMMA scores of the sample.
The routine administration of IMMA to upper elementary students and the preservation of these IMMA scores over time created a unique data set of IMMA scores that was accessible and advantageous for this longitudinal study. Nevertheless, the disadvantages of non-probability sampling were susceptibility to potential bias and increased sampling error. The convenience samples could over- or under-represent a particular segment of the population, which would then affect the ability to generalize to similar populations: the population corresponding to the convenience sample was abstract, a hypothetical population that must be inferred from the sample’s description (Huck, 2012, pp. 101–102).

Sample Description. Scores (N = 1,650) from students in Grades 3, 4, and 5 in a small, rural, public school district in central Pennsylvania, where the researcher had been employed as the district elementary general music teacher for sixteen years, provided the data for this study. The students were predominantly White (98%), with 43.5% living in poverty (information based on free and reduced-price lunch eligibility) (National Center for Education Statistics website, n.d.). The district transiency rate for students was quite low: 93% of the 2018 graduating class and 84% of the 2019 graduating class were enrolled in the researcher’s general music classes throughout their tenure in elementary school. An average of 60.1% of students in Grades 4 and 5 had participated in school performance ensembles in the two most recent academic years. Thus, these students were considered musically select as defined in the MAP manual (Gordon, 1995, p. 137). Students could begin band instrument study and participation in chorus in fourth grade. Few students engaged in private instrumental lessons, largely due to financial constraints and lack of availability of local teachers.
Therefore, opportunities such as participation in county band and chorus were made available to students through the sponsorship of district music teachers. Scores from bi-annual administrations of music aptitude tests (PMMA for Grades 1 and 2, IMMA for Grades 3–5) were used routinely in the sample school to differentiate instruction, and scores were monitored to track individual students’ music aptitude development. IMMA was administered intermittently once or twice per year to students in Grades 4 and 5; thus, grade level sample sizes differed: Grade 3 (N = 1,035), Grade 4 (N = 389), and Grade 5 (N = 226). Scores from all previous administrations of music aptitude tests for students in this school district had been preserved. Consequently, it was possible to examine historical third grade IMMA scores from more than a decade of test administrations, as well as to compare those scores with IMMA scores from fourth- and fifth-grade students, when available. The stability of the overall sample population, researcher’s long tenure in the school district, routine administration of music aptitude tests, and preservation of past music aptitude scores created a unique longitudinal data set of IMMA scores for use in this study. Nevertheless, the unequal numbers of grade level test scores resulting from differences in IMMA administration by grade level from year to year might have adversely affected the assumptions of the statistical procedures.

Missing Values

Missing data are prevalent in behavioral research (Leech et al., 2015, p. 292), yet standard statistical methods typically presume complete information for all variables (Soley-Bori, 2013). Difficulty in generalizing to a population or even misrepresentation of the population may arise as a result of missing data (Leech et al., 2015, p. 292). Landerman et al.
(1997) warned that the ability of the complete case sample to accurately represent the sample or target population might be affected by the percentage of sample cases with missing data on one or more variables. The missingness mechanism, which describes the relationship between observed and missing data, must be identified in order to best address how to handle missing data (Cook, 2020). Missingness may be categorized as MCAR (missing completely at random: no relationship between missingness and values), MAR (missing at random: a relationship exists between missingness and observed data, but not between missingness and missing values), or MNAR (missing not at random: a relationship exists between missingness and the missing values themselves, which cannot be detected from the observed data alone) (Cook, 2020). Although MCAR is rarely true when large amounts of data are missing (Leech et al., 2015, p. 292), it is possible to test for MCAR even though the missing values are unobserved. A significant result on Little’s (1988b) MCAR test may indicate a violation of the MCAR assumption, but does not imply MAR or MNAR status (van Ginkel et al., 2020). One is not able to test for missingness in cases of MAR, as systematic difference in observed and unobserved data cannot be compared when the values of the missing data are unknown (Soley-Bori, 2013). With MAR data, one may impute the missing values of one variable from other variables (Leech et al., 2015, p. 293). MNAR also cannot be determined inferentially, as additional information about the population is needed to verify unobserved data (van Ginkel et al., 2020). Determination of ignorability, in which missing data and the parameters of interest are unrelated, or nonignorability, noted for the need to model missing data to accurately estimate parameters in the model (Cook, 2020), may aid in identification of the missingness mechanism at play.
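The three missingness mechanisms described above can be made concrete with a small simulation. The following is a minimal sketch only, assuming two hypothetical score lists (Fall and Spring) and a hypothetical function name `simulate_missingness`; the scores are randomly generated for illustration and are not data from this study. Under MCAR every Spring score has the same chance of being missing, under MAR the chance depends on the observed Fall score, and under MNAR it depends on the Spring score that would itself go missing.

```python
import random


def simulate_missingness(fall, spring, mechanism, rate=0.3, seed=0):
    """Return a copy of `spring` with some scores replaced by None.

    mechanism is "MCAR", "MAR", or "MNAR" (labels used only in this sketch).
    """
    rng = random.Random(seed)
    # Simple cut points (upper median) used to make missingness depend on scores.
    fall_cut = sorted(fall)[len(fall) // 2]
    spring_cut = sorted(spring)[len(spring) // 2]
    result = []
    for f, s in zip(fall, spring):
        if mechanism == "MCAR":
            p_missing = rate                      # unrelated to any score
        elif mechanism == "MAR":
            # Depends only on the observed Fall score.
            p_missing = 2 * rate if f < fall_cut else 0.0
        elif mechanism == "MNAR":
            # Depends on the Spring score that may itself be missing.
            p_missing = 2 * rate if s < spring_cut else 0.0
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        result.append(None if rng.random() < p_missing else s)
    return result


# Made-up scores for illustration only; these are not data from the study.
data_rng = random.Random(1)
fall_scores = [data_rng.randint(20, 40) for _ in range(200)]
spring_scores = [f + data_rng.randint(-3, 5) for f in fall_scores]

mar_scores = simulate_missingness(fall_scores, spring_scores, "MAR")
mnar_scores = simulate_missingness(fall_scores, spring_scores, "MNAR")
```

In the MNAR case the pattern cannot be recovered from the incomplete list alone, because the scores that drive the missingness are exactly the ones that were removed; this is the reason the mechanism cannot be confirmed inferentially.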
In addition, distinguishing between two main patterns of missingness may help determine the appropriate method for mitigation of missing data (Soley-Bori, 2013). There is no consensus on the amount of missing data that does or does not require mitigation. Rather, researchers’ need to consider the proportion of missing data in light of the unique context of their data set supersedes sole reliance on critical values from other studies (Cook, 2020). The simplest way to address missing data is through listwise deletion, the default option in IBM’s Statistical Package for the Social Sciences (SPSS) software (IBM Corp., 2019b), which excludes all cases with missing data (van Ginkel et al., 2020). MCAR is the same assumption that underlies a complete-case analysis: when missing responses are discarded, a lack of power and potential bias results (Schafer, 1999). Disadvantages of listwise deletion are the loss of power from the smaller sample size and larger standard errors resulting from the discarding of cases with missing data. Valid inferences can be drawn after cases are discarded only if the discarded cases are representative of the entire data set. However, estimates may be biased when discarded cases differ systematically from the rest (Schafer, 1999). Acock (2005) noted loss of 20–50% of data is typical with use of listwise deletion; McKnight et al. (2007) estimated the potential loss of data at roughly 60% (p. 100). Pairwise deletion excludes fewer cases and is thus less wasteful than listwise deletion, as cases with a missing value are excluded only from calculations involving the variable for which there is no score (Field, 2009, p. 177).
However, this results in an inconsistent set of participants; therefore, “the covariances do not have the constraints they would have if all covariances were based on the same set of participants” (Acock, 2005), and any conclusions drawn might differ by subsample (Cook, 2020). Single imputation procedures replace missing values with a single constant value or predicted value, which addresses the issue of wastefulness because missing cases are retained (van Ginkel et al., 2020), but may introduce additional bias: imputed values contain no error, as they are completely determined by a model applied to the observed data (Soley-Bori, 2013). Acock (2005) noted the tendency of single imputation to underestimate standard errors and overestimate the level of precision, resulting in a perception of power that cannot be justified by the data. In mean substitution, a type of single imputation procedure, missing values are replaced with the mean score of all available values (Cook, 2020). A constant replacement value should not be used to impute MAR or MNAR data, as it can be ineffective in accounting for extreme values, yielding a loss of variance (Cook, 2020). Expectation-maximization (EM), another single imputation approach, is an iterative procedure that uses a model to predict the missing values from observed values (Cook, 2020). Although EM has been found to produce accurate results in large data sets and in cases where missing data are ignorable, the procedure can yield overestimated standard errors, which increases the probability of Type I error (Cook, 2019). Reliable estimates have been produced using EM with MCAR cases which include up to 50% of missingness (Cook, 2020). Finally, in hot deck imputation, missing values are imputed from a donor value of an observed case similar to the missing case (Kleinke, 2018).
However, the missing value may remain missing if no similar value is found in the dataset (Kleinke, 2018). The hot deck procedure has been found effective when up to 20% of data is MCAR or MAR and up to 10% if MNAR (Kleinke, 2018). Cook (2020) identified multiple imputation (MI) procedures as the methodological standard for handling missing data in the social sciences. MI generates plausible values for missing observations of variables in the data set (Li et al., 2015). This procedure is repeated multiple times, creating many data sets that include imputed values. Each data set is slightly different from the others; therefore, it is possible to use the same method and data, yet obtain different values (Soley-Bori, 2013). The use of multiple plausible values quantifies the uncertainty of estimating missing values and avoids the false precision that may result from single imputation (Li et al., 2015). These data sets are each analyzed using standard statistical methods, the results of which are pooled and a single overall inference drawn. Imputations should provide reasonable predictions for the missing data, yet the variability should reflect a degree of uncertainty (Schafer, 1999), as estimated by the standard error. Linear regression and predictive mean matching (PMM) are two MI procedures available using SPSS (IBM Corp., 2019a). Missing values are estimated as random draws from a conditional distribution in the linear regression approach (van Ginkel et al., 2020). Sufficiently accurate results are reported regardless of missingness mechanism using the regression approach. However, this procedure can generate imputed values outside the range of observed values, and non-normally distributed data can skew results (Cook, 2020). Nevertheless, the regression procedure has been found suitable when more than 5% and less than 40% of ignorable data is missing (Cook, 2020).
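The pooling step mentioned above is commonly carried out with Rubin's rules, in which the pooled estimate is the mean of the per-data-set estimates and the total variance combines the average within-imputation variance with the between-imputation variance. The sketch below is illustrative only: the function name `pool_rubin` and the input numbers are hypothetical, and it is not a description of what SPSS does internally.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors using Rubin's rules.

    estimates: one point estimate per imputed data set (m >= 2)
    variances: the squared standard error of each estimate
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled = mean(estimates)            # average of the m estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


# Made-up values for three imputed data sets (illustration only).
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty about the missing values into the final standard error; single imputation has no such term, which is one way to see why it overstates precision.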
A linear regression model and monotone structure are included in the PMM model (Horton & Lipsitz, 2001); values are predicted from observed values most similar to the missing values. For each case with missing data, a set of cases (the donor pool, typically identified as k) with observed values similar to those of the predicted value is identified. From among those similar cases, one is randomly chosen and its observed value used to substitute for the missing value (Allison, 2015). Because imputed data are based on observed data, values remain within the range of possibility (Little, 1988a). The PMM approach is more immune to the effect of nonnormal data (Cook, 2020) and misspecification (Schenker & Taylor, 1996). PMM is most appropriate in cases with more than 5% and less than 40% of ignorable data (Cook, 2020). If few suitable donor cases are available, performance of PMM might be poor (Kleinke, 2018); however, Kleinke (2018) noted an increase in sample size might increase the accuracy of statistical inferences. There is no mathematical theory to justify PMM; instead, Monte Carlo simulations are relied upon (Allison, 2015). Nevertheless, based on reported results of extant studies, it is generally accepted PMM is a potentially useful method (Allison, 2015). The SPSS default setting for only one matched case results in no random draw, thereby underestimating standard errors. Therefore, it was strongly recommended this default setting should be overridden (Allison, 2015). Regardless of the missingness mechanism, both MI procedures are appropriate for imputing missing data (Cook, 2020). MI yields more power than listwise deletion, corrects for bias under MAR, and partly corrects for bias under MCAR when carried out correctly (van Ginkel et al., 2020). 
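The donor-based logic of PMM described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses one predictor, a single imputation, and fixed regression coefficients, whereas a full MI implementation would also draw the coefficients at random for each imputed data set. The function names `fit_line` and `pmm_impute`, the donor pool size `k=3`, and the scores are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(xs, ys, k=3, seed=0):
    """Fill None entries in ys by predictive mean matching on xs."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    intercept, slope = fit_line([x for x, _ in observed],
                                [y for _, y in observed])

    def predict(x):
        return intercept + slope * x

    completed = []
    for x, y in zip(xs, ys):
        if y is not None:
            completed.append(y)          # observed values are kept as-is
            continue
        target = predict(x)
        # Donor pool: the k observed cases whose predictions are closest.
        donors = sorted(observed,
                        key=lambda d: abs(predict(d[0]) - target))[:k]
        # Impute the donor's observed value, not the model prediction.
        completed.append(rng.choice(donors)[1])
    return completed


# Made-up Fall and Spring scores (illustration only); None marks a missing score.
fall = [28, 31, 25, 35, 30, 33, 27, 36]
spring = [30, 32, None, 36, 31, None, 28, 37]
completed = pmm_impute(fall, spring, k=3)
```

Because each imputed value is copied from a donor, it is always a score that was actually observed, which is why PMM stays within the range of possible values. Setting `k=1` removes the random draw entirely, which mirrors the concern raised above about a single matched case.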
The number of imputations needed is dependent on the fraction of missing information (γ), used by Rubin (1987) to define “the relative efficiency (RE) of multiple imputation as $RE = (1 + \gamma/m)^{-1/2}$, where m is the number of imputations” (Pan & Wei, 2016); a conclusion was drawn that a small m (≤5) would be sufficient. Acock (2005) noted statistical software can generate quick estimates of 10–20 imputations, which should be more than sufficient. Soley-Bori (2013) suggested m = 20 was adequate. Nevertheless, Kleinke (2018) warned a too-large donor pool might result in selection of inadequate donors, implausible imputations, and biased inferences. However, selection of a too-small donor pool might cause repeated selection of the same donor, resulting in deceptively increased correlations of m imputations and underestimated standard errors. Based on their findings, Graham et al. (2007) recommended researchers use more imputations than previously proposed and consider both γ and the amount of power falloff resulting from an inadequately-sized m. Young and Johnson (2015) conducted preliminary research on the similarities and differences in the use of multiple imputation for longitudinal data sets. They described an advantage of longitudinal data use in providing stronger inferences about change processes, but cautioned longitudinal studies may include a large amount of missing data. In panel data, the most common type of longitudinal data, variables are measured repeatedly, with measurements taken at the same times for all subjects (Allison, 2009). Missing values resulting from nonresponse to test items are categorized as “within-wave” and are addressed typically through the deletion and imputation procedures previously described.
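The relative efficiency formula quoted earlier in this section can be evaluated directly to see how quickly additional imputations stop paying off. The sketch below simply codes the expression as quoted from Pan and Wei (2016); the function name `relative_efficiency` and the example value γ = .30 are assumptions chosen for illustration, not figures from this study.

```python
def relative_efficiency(gamma, m):
    """Relative efficiency as quoted: RE = (1 + gamma / m) ** (-1/2).

    gamma: fraction of missing information (between 0 and 1)
    m: number of imputations
    """
    return (1 + gamma / m) ** -0.5


# Hypothetical fraction of missing information, for illustration only.
gamma = 0.30
for m in (5, 10, 20):
    print(m, round(relative_efficiency(gamma, m), 4))
```

With these assumed inputs the value is already close to 1 at m = 5 and changes little thereafter, which is consistent with the earlier conclusion that a small m may suffice on efficiency grounds alone; the recommendation by Graham et al. (2007) to use more imputations rests on power falloff rather than on this quantity.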
In contrast, Young and Johnson (2015) noted when respondents do not participate at all data collection time points, entire waves of data are missing and information on time-varying change is lost. Nevertheless, data from prior waves can be modeled to account for attrition. Logistic regression can estimate the degree to which variables in previous waves predict attrition from subsequent ones. Statistical results can be used to infer the missingness mechanisms of the data. Long or stacked data structures are typical, organizing each individual’s records as waves. The key feature of longitudinal data, a large amount of missing data due to whole-wave missingness, does not seem to describe the data set of this study, as students were administered IMMA routinely and could not withdraw from testing at will. In addition, only two waves of data at a time are examined for this study’s purposes. Allison (2009) noted multiple imputation can readily handle missing panel data. For these reasons, it was concluded special treatment of this study’s data set was unnecessary. Zhang (2016) advocated for transparency in reporting how missing values are handled. Of the 1,650 students in this study, 27 students were absent on all test administration dates within a given academic year and no observed data were available for examination. In addition, Grade 3 observed scores were not available from 44 students who were not enrolled in the school district until Grade 4 or Grade 5. Therefore, the data missing from these students, as well as missing values from students who were absent for specific test administrations, were imputed using predictive mean matching. As is typical in research studies, this study’s data set contained numerous cases with missing values. The procedure used to manage missing data in the current study was as follows:

1. Conduct a pattern analysis to determine if the patterns of missing data are consistent with MAR, monotonic or nonmonotonic, and warrant imputation (Leech et al., 2015, p. 294).
2. Impute missing data values using predictive mean matching with 10 imputations.
3. Conduct statistical tests using new data sets with imputed values.
4. Pool results from all data sets and generate an overall estimate.

Instrument

IMMA

The Intermediate Measures of Music Audiation (IMMA) (Gordon, 1982; updated 1986c) was designed to test developmental music aptitude in students in Grades 1–4; norms for students in Grades 5 and 6 were added subsequently. Test items on IMMA were selected from tonal and rhythm patterns deemed “difficult” in Gordon’s (1976) taxonomy of tonal and rhythm patterns. In contrast, PMMA test items made use of the “easy” patterns from Gordon’s taxonomy. Thus, IMMA was considered a more advanced version of PMMA. Directions for the tonal and rhythm subtests were recorded, as were the test prompts (synthesizer-produced tonal patterns without rhythm and rhythm patterns without pitch). Students aurally compared two tonal patterns or two rhythm patterns, determined their sameness or difference, and circled the appropriate box on the answer sheet. The ability to read words, numbers, or music notation was unnecessary, as object identifiers were used to label test items. Percentile ranks for students in Grades 1–6 were provided for tonal scores, rhythm scores, and composite (tonal plus rhythm) scores. Test reliability and validity were estimated for group administration of IMMA to students in first grade through sixth grade. Gordon reported composite split-halves reliability and test-retest reliability coefficients between .76 and .91; the reliability coefficients reported for Grade 4, when students presumably were transitioning to stabilized music aptitude, ranged from .76 to .90.
Content, concurrent, congruent, and longitudinal predictive validities were estimated for IMMA and described at length in the IMMA test manual (Gordon, 1986c). Content validity, a type of subjective validity (Gordon, 1989b) and an expression of how accurately the test content measured what it was intended to measure, was established for IMMA through an examination of test item difficulty and discrimination indices (Gordon, 1986c, pp. 98–99). Criterion validity, a type of objective validity (Gordon, 1989b), was estimated most commonly through a correlation of test scores with teacher ratings. Gordon conducted two such studies: the first study (1984a) yielded a correlation coefficient of .36 for Grade 4 composite scores (IMMA scores correlated with general music teacher’s ratings) and the second study estimated a correlation coefficient of .81 for fourth grade students who participated in band. Geissel (1985) found the validity coefficients of IMMA composite scores and MAP scores to be quite similar (.47 and .50, respectively). Congruent validity of IMMA and PMMA was inferred because the correlation estimated between the two tests was high (.74 for Grade 4) and the validity of PMMA had previously been estimated as acceptably high (.73) (Geissel, 1985). The strength of the relationships between groups of test scores was interpreted from correlation coefficients: weak (.20–.35), moderate (.35–.65), strong (.66–.85), or very strong (.86 and above) (Creswell, 2012, p. 347).

Procedure

This quantitative study investigated the relationship of music aptitude scores of a single music aptitude measure in a stable student population over the course of thirteen years.
Permission was granted by the University at Buffalo’s Institutional Review Board (IRB) to conduct a study using a sample that included historical student scores preserved from tests administered prior to the study; permission was also granted by the school district hosting the study (the researcher’s employer). Students in Grades 3–5 were administered IMMA routinely as a data-gathering measure for district-mandated music education. However, IMMA administration was inconsistent: although third grade students were administered IMMA each Fall, a second administration in the Spring was not always possible. In addition, IMMA scores for students in Grades 4 and 5 were not collected routinely until the 2017–2018 academic year. The outbreak of COVID-19 in Spring 2020 caused schools in Pennsylvania to cease in-person instruction and move to remote, virtual instruction beginning March 16, 2020; this action prevented a Spring administration of IMMA for all upper elementary students in the sample for that school year.

A power analysis was conducted to identify an appropriate sample size for group comparisons (Creswell, 2012, p. 611). By applying Lipsey’s (1990) Sample Size Table, as cited in Creswell (p. 611), it was estimated the minimum sample size needed to attain a .80 criterion level of power for a medium effect size of .6 at p = .05 (two-tailed) was 45 students. The sample size of each grade level in the current study (Grade 3: N = 1,035; Grade 4: N = 389; Grade 5: N = 226) exceeded this minimum; therefore, there was greater power to detect meaningful differences in mean scores.

Student names and archived IMMA scores were entered into a Microsoft Excel spreadsheet and cross-checked for accuracy. Student names were moved to a separate Excel spreadsheet and assigned a randomly generated number. These random numbers were added to the original data spreadsheet.
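The de-identification step described above may be illustrated with a minimal sketch. The study itself used Microsoft Excel; the Python code below is only an illustration, and the function name `deidentify`, the record layout, and the five-digit range of the random numbers are hypothetical choices rather than details reported in this study.

```python
import random


def deidentify(records, seed=None):
    """Split (name, scores) records into a key table and an analysis table.

    `records` is a list of (student_name, scores) tuples. The layout and
    the five-digit ID range are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Draw one unique random number per student (sampling without replacement).
    ids = rng.sample(range(10000, 100000), len(records))
    # Key table: stored separately so a name can be recovered if ever needed.
    key_table = {sid: name for sid, (name, _) in zip(ids, records)}
    # Analysis table: random number plus scores, with no names attached.
    analysis_table = [(sid, scores) for sid, (_, scores) in zip(ids, records)]
    return key_table, analysis_table


# Fabricated example records: (name, (tonal, rhythm, composite)).
records = [("Student A", (28, 30, 58)), ("Student B", (33, 31, 64))]
key_table, analysis_table = deidentify(records, seed=1)
```

Only the analysis table would be carried forward into statistical software; the key table corresponds to the separately maintained record of names and number labels.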
Thus, the data were de-identified for use in analysis, yet a record of names and assigned number labels was maintained separately as a contingency. Identifying numerical labels and data were copied and pasted from Excel into SPSS to minimize potential errors in data entry. Electronic data were stored securely in a password-protected online file; student answer sheets were stored in a locked box in a locked closet in the researcher’s music classroom. Thus, confidentiality of student data was ensured.

Statistical calculations in this study were performed using SPSS software (Version 26) (IBM Corp., 2019b). SPSS, designed for the analysis of social sciences data, was selected due to the researcher’s familiarity with the software and the capacity of the premium package to analyze descriptive statistics and advanced statistics needed for this study.

Research Question 1

Paired t-Tests (Spring–Fall). To address Research Question 1, which framed an examination of the chronological age at which stabilized music aptitude begins, paired samples t-tests were used to test whether the means of two groups were different (Field, 2009, p. 324). Of the two types of t-tests, independent samples and dependent samples, the latter was most appropriate for this study, as music education researchers use paired t-tests to examine mean differences of the same group over a period of time (Russell, 2018, p. 67). Specifically, paired samples t-tests were used to determine if there was a significant difference in Spring IMMA tonal, rhythm, and composite scores of one academic year and corresponding Fall test scores of the following academic year. Assumptions of this parametric test were satisfied: the dependent variable, IMMA scores, was continuous and measured at the interval level. Observations were independent of others and matched for each individual.
An examination of histograms was used to check for outliers and determine normal distribution. As chronological age was continuous and formal music instruction was paused between the Spring of one academic year and the Fall of the following year, the comparison of these score pairs was intended to highlight the effect of chronological age on music aptitude while controlling for instruction.

Research Question 2

Wilcoxon Signed Rank Tests or Paired t-Tests by Grade Level. To consider the general effect of instruction on music aptitude, a series of Wilcoxon Signed Rank tests was used to examine scores of Fall and Spring test administrations from the same academic year. Although the examination of score difference in matched pairs was typically estimated using a paired samples t-test, when the assumption of normality was violated, as indicated by statistically significant Shapiro-Wilk test results, the Wilcoxon Signed Rank test, a nonparametric equivalent of the paired samples t-test which used signed ranks to test the difference in observations, was conducted. Thus, scores from the Fall IMMA administration of one academic year and the Spring IMMA administration of the same academic year functioned as a pre-test and post-test, respectively. The assumptions of a paired t-test (use of a random sample, independence of observations, normal distribution, and satisfaction of the equal variance assumption) were satisfied prior to interpretation of findings (Huck, 2012, pp. 225–229) or mitigated through use of the nonparametric Wilcoxon Signed Rank test.

Huck (2012) cautioned that an inflated risk of Type I error, a false positive that causes an erroneous rejection of the null hypothesis, could result when several null hypotheses corresponding to different dependent variables are tested simultaneously (p. 221).
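The signed-rank computation underlying the Wilcoxon Signed Rank test described in this section may be illustrated with a minimal sketch. The analyses in this study were conducted in SPSS; the Python code below is not that procedure. The function names are hypothetical, the scores are fabricated, and the sketch assumes a simple normal approximation with no tie or continuity correction, so the resulting Z may differ slightly from software output.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus, z) for matched pre-test and post-test scores."""
    # Pairs with a zero difference carry no sign and are dropped.
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation (assumption: no tie or continuity correction).
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, w_minus, z


# Fabricated Fall (pre-test) and Spring (post-test) scores for six students.
fall = [10, 12, 9, 14, 11, 13]
spring = [12, 15, 8, 18, 16, 13]
w_plus, w_minus, z = wilcoxon_signed_rank(fall, spring)
```

In this fabricated example one pair is tied and dropped, leaving five signed differences; the sum of positive ranks is 14 and the sum of negative ranks is 1.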
A common strategy to adjust for a potential Type I error is the Bonferroni adjustment procedure, as the modified level of significance is more rigorous, making it more difficult for a researcher to reject the null hypothesis (Huck, 2012, p. 221). The conventional .05 alpha level was divided by 2 (the number of analyses); therefore, the Bonferroni correction applied to each set of paired t-tests was as follows: $\alpha_{\text{altered}} = .05/2 = .025$ (Huck, 2012, p. 221).

The strength and direction of correlations were considered, and Cohen’s d was calculated as t/√N (Russell, 2018, p. 76) to estimate effect size of significant paired t-test results to consider practical significance. Effect sizes of less than .20 were considered small, .50 was the threshold for medium effect sizes, and .80 for large effects (Russell, 2018, p. 90). Effect size for Wilcoxon Signed Rank tests (r) was calculated as Z/√N, where N was the number of observations, and interpreted as small (.10), medium (.30), large (.50), or much larger than typical (.70) (Field, 2009, p. 558).

Paired t-tests or Wilcoxon Signed Rank tests were conducted using data from all academic years, as presented in Table 1. A statistically significant difference in Fall and Spring IMMA scores from the same academic year was interpreted as an indication that IMMA scores continued to fluctuate, reflecting an effect of instruction on music aptitude.

Table 1
Statistical Tests by Grade Level and Academic Year

Research Question 3

Repeated Measures ANOVA. The possibility of a stage of transition linking the developmental and stabilized music aptitude stages was investigated through the use of a series of repeated measures ANOVAs, a univariate design. This statistical test was selected because of a desire to examine score difference of individuals for three grade levels.
Assumptions of one-way repeated measures ANOVA were satisfied prior to data analysis (Field, 2009, p. 150). Continuous scale IMMA scores (tonal, rhythm, or composite) served as the dependent variable and grade level as the independent categorical variable. The assumption of normal distribution was satisfied through examination of histograms. Sphericity was defined by Field (2009) as “the equality of variances of the differences between treatment levels” (p. 459); however, one-way repeated measures ANOVA is not robust to violations of the sphericity assumption (Huck, 2012, p. 321). Therefore, when the Mauchly sphericity test yielded a statistically significant result, the Greenhouse-Geisser correction was applied to produce a valid F-ratio (Field, 2009, p. 260) and thus account for the lack of sphericity. The Huynh-Feldt correction was also reported, as the Greenhouse-Geisser correction can be too conservative (Field, 2009, p. 466).

Four multivariate test statistics were produced by SPSS in a repeated measures ANOVA; results of the four test statistics likely were different if there was more than one underlying variate, and there was no consensus on which test statistic was preferable (Hatcher, 2013, p. 352). Wilks’s lambda (λ), the most widely reported multivariate statistic (Hatcher, 2013, p. 352), “is a measure of the percent of variance in the dependent variables that is not explained by differences in the independent variable” (Russell, 2018, p. 131), and should be reported in addition to its corresponding F statistic. From Wilks’s lambda, η², an index of effect size, may be interpreted (.01 as a small effect, .06 as a medium effect, and .14 as a large effect) (Hatcher, 2013, p. 352). However, Hatcher (2013) recommended interpretation of Pillai’s Trace as slightly more powerful than the other reported statistics when more than one variate is evident, and Field (2009) asserted Pillai’s Trace was the most robust to violations of multivariate normality (p. 605). Thus, it was concluded interpretation of Pillai’s Trace was suitable as an estimation of significant difference in the repeated measures ANOVA of the current study. Partial eta-squared was reported as the effect size of significant Pillai’s Trace results. Data collection included matched IMMA scores of individual students in Grades 3, 4, and 5, as presented in Table 2.

Table 2
Three-Year Longitudinal Examination of IMMA Scores

As an omnibus F-test, ANOVA tests for all differences in a set of means (Evans, 1996, p. 339). A resulting significant F statistic provided evidence that not all group means were equal, and must be followed by a post hoc test to identify which pairs of means contributed significantly to the group difference. The Bonferroni post hoc test was selected to guard against a high familywise rate of Type I errors (Evans, 1996, p. 363), or false positive results, and was more robust to violations of sphericity than post hoc tests such as Tukey’s honestly significant difference (HSD). Thus, groups were compared in pairs through use of pairwise comparisons (Huck, 2012, p. 260), with the level of significance adjusted to mitigate the risk of Type I error (Huck, 2012, p. 262).

The effect size was considered in order to address the interpretation of practical significance for significant ANOVA findings. Huck (2012) asserted partial eta-squared (ηp²) provided an index of the proportion of variability explained by the independent variable (pp. 222–223). However, Field (2009) noted a preference for use of omega squared as the best measure of the overall effect size for repeated measures ANOVA, as he believed it more useful to report effect sizes for focused comparisons rather than the main ANOVA (pp. 479–481).
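To illustrate how the omnibus F-ratio and partial eta-squared arise from the sums of squares in a one-way repeated measures design, a minimal sketch follows. It is not the SPSS procedure used in this study: the function names and the scores are hypothetical, no sphericity correction or multivariate statistic is computed, and the Bonferroni helper simply divides alpha by the number of pairwise comparisons.

```python
def repeated_measures_anova(scores):
    """One-way repeated measures ANOVA from per-student lists of scores.

    Each inner list holds one student's scores across the k repeated
    conditions (for example, Grade 3, Grade 4, and Grade 5).
    """
    n = len(scores)        # number of students
    k = len(scores[0])     # number of repeated conditions
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    # Variability among students, removed from the error term.
    ss_subjects = k * sum((sum(row) / k - grand_mean) ** 2 for row in scores)
    # Variability among condition (grade level) means.
    condition_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in condition_means)
    ss_error = ss_total - ss_subjects - ss_conditions
    df_conditions = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_conditions / df_conditions) / (ss_error / df_error)
    partial_eta_sq = ss_conditions / (ss_conditions + ss_error)
    return {
        "F": f_ratio,
        "df": (df_conditions, df_error),
        "partial_eta_squared": partial_eta_sq,
    }


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison alpha after a Bonferroni adjustment."""
    return alpha / n_comparisons


# Fabricated scores for three students measured at three grade levels.
toy_scores = [[1, 2, 3], [2, 4, 3], [3, 3, 6]]
result = repeated_measures_anova(toy_scores)
pairwise_alpha = bonferroni_alpha(0.05, 3)  # three pairwise grade comparisons
```

With three grade levels there are three pairwise comparisons, so under these assumptions each comparison would be evaluated against an alpha of approximately .017.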
Hatcher (2013) cautioned eta-squared, and by extension partial eta-squared, tended to overestimate the effect size; however, omega squared provided an unbiased estimate of the variance (p. 370). Thus, omega squared (ω²) was selected as the post hoc statistical measure of effect size for this study. The equation used to calculate omega squared was

$$\omega^2 = \frac{SS_B - (k - 1)\,MS_w}{SS_T + MS_w}$$

where $SS_B$ was the Sum of Squares Between, $k - 1$ the degrees of freedom of Sum of Squares Between, $MS_w$ the Mean Square Within, and $SS_T$ the Sum of Squares Total. Microsoft Excel was used to conduct these calculations. The value of omega squared cannot exceed 1; the value is negative if the observed F is less than one. The effect size criteria used for omega squared were .01 (small), .06 (medium), and .14 (large) (Hatcher, 2013, p. 370); these were the same criteria for interpretation of eta-squared and partial eta-squared (Hatcher, 2013, p. 370).

Examination of the effect size aided in interpretation of practical significance: findings that were negligibly above or below the desired p < .05 level might suggest mean score differences that were only nominally divergent, implying a period of transition in music aptitude stages. Significant difference of scores of separate grade levels was interpreted as students functioning at dissimilar stages of music aptitude. For example, if the findings indicated Grade 3 IMMA tonal scores were significantly different from Grade 5 IMMA tonal scores, it was likely students in Grade 3 were still in the developmental music aptitude stage, while Grade 5 students might have transitioned to the stabilized music aptitude stage. In contrast, stability of scores between grade levels might indicate students had already achieved the stabilized music aptitude stage, defined as resistant to instruction and therefore immutable to significant score changes. A graphic representation of the research procedure is depicted as Figure 2.
Figure 2
Research Procedure

Summary

The research methodology for this study was detailed in this chapter. The sample, nonprobability convenience sampling method, measure of stabilized music aptitude, data analysis techniques (paired t-tests, Wilcoxon Signed Rank tests, and repeated measures ANOVA), and quantitative design were described in detail for transparency and the possibility of future replication. An examination of the relationships explored in this study may shed light on the onset of, transition to, and constancy of stabilized music aptitude in students of intermediate grade levels.

Chapter 4
Presentation and Interpretation of Data

Introduction

The purpose of this study was to investigate the onset of, transition to, and longitudinal constancy of stabilized music aptitude in upper elementary students. To achieve this purpose, the following questions guided the research:

1. At what grade level does chronological age cease to affect student music aptitude?
2. At what grade level does instruction cease to affect student music aptitude?
3. Is there evidence to substantiate the transition between the developmental music aptitude stage and stabilized music aptitude stage at approximately age 9/Grade 4?

IMMA scores were collected from a large sample of students (N = 1,650) in Grades 3, 4, and 5 over a thirteen-year period from 2007 to 2019. McKnight et al. (2007) noted the prevalence of missing data in research studies (p. 1) and observed that missing data were more likely to result from repeated observations than from a single observation (p. 54). As expected, numerous values were missing in the data set of the current study. Pattern analysis of data “provides descriptions of patterns of missing and can be a useful exploratory step before imputation” (IBM Corp., 2019a), as well as in selection of the appropriate missing data procedure.
Therefore, pattern analysis was conducted on the observed data of the current study using SPSS (McKnight et al., 2007, p. 122). The results, generated as the percentages of variables, cases, and data points with missing data, were scrutinized and the pattern of missing and nonmissing values examined. In addition, the result of Little’s (1988b) MCAR test, a single global statistic expressed as χ² and used to determine whether data are missing completely at random, was considered.

Pattern Analysis of Missing Data

As is apparent in the first two images of Figure 3, all measured variables (all combinations of grade level and Fall/Spring test administrations considered in the current study) and all cases (all students in the current study’s sample) were missing at least one value within the unimputed data set of tonal, rhythm, and composite scores. Approximately 65% of all the values (individual scores) within the unimputed data set of observed scores were missing, as represented in the third image of Figure 3.

Figure 3
Overall Summary of Missing Values (Unimputed Grade 3–5 Data Set, All Scores)

Analysis variables sorted by percent of missing data in decreasing order (IBM Corp., 2019a) are displayed as a Variable Summary table (see Table 3). Missing data on the eighteen variables (Fall and Spring administrations of tonal, rhythm, and composite scores for Grades 3–5) ranged from 10% to 94% of the sample. The percentage of missing values was extensive, particularly in the Spring administrations of Grades 4 and 5, as IMMA was not administered routinely in those grades and, when administered, often in Fall only. This proportion of missing data was concerning, and care would need to be taken to mitigate the effects of such a high level of missingness.
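The kind of summary reported above (percentage of variables, cases, and individual values with missing data, and the frequency of each missingness pattern) may be sketched as follows. This is an illustration only, not the SPSS Missing Values procedure used in the study; the function name, the use of `None` to mark a missing score, and the toy data are assumptions.

```python
from collections import Counter


def missingness_summary(rows):
    """Summarize missing values in equal-length rows, where None marks a missing score."""
    n_cases = len(rows)
    n_vars = len(rows[0])
    # Number of missing values in each variable (column).
    missing_per_var = [sum(row[j] is None for row in rows) for j in range(n_vars)]
    cases_incomplete = sum(any(v is None for v in row) for row in rows)
    vars_incomplete = sum(count > 0 for count in missing_per_var)
    # A pattern is the tuple of missing/nonmissing flags shared by a group of cases.
    patterns = Counter(tuple(v is None for v in row) for row in rows)
    return {
        "pct_variables_incomplete": 100 * vars_incomplete / n_vars,
        "pct_cases_incomplete": 100 * cases_incomplete / n_cases,
        "pct_values_missing": 100 * sum(missing_per_var) / (n_cases * n_vars),
        "pct_missing_by_variable": [100 * c / n_cases for c in missing_per_var],
        "pattern_counts": patterns,
    }


# Fabricated scores: four students, four test administrations.
toy_rows = [
    [30, 32, None, None],
    [28, None, None, None],
    [35, 36, 34, None],
    [31, 33, None, None],
]
summary = missingness_summary(toy_rows)
most_common_pattern, n_with_pattern = summary["pattern_counts"].most_common(1)[0]
```

In the toy data, the most frequent pattern (scores present for the first two administrations and missing for the last two) is shared by two of the four cases.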
Table 3
Variable Summary (Unimputed Grade 3–5 Data Set, All Scores)

One hundred three patterns of missing data were found for the eighteen variables and are exhibited in Figure 4. A group of cases with a similar pattern of missing and nonmissing values is represented in each row (IBM Corp., 2019a). Patterns of missingness are interpreted horizontally; red bars represent missing data found for each variable. The variables are arranged horizontally on the x-axis in increasing order of missing values in order to approximate a monotonic pattern.

Figure 4
Missing Value Patterns (Unimputed Grade 3–5 Data Set, All Scores)

Monotonicity describes a pattern of missingness in which missing data are “dependent or conditional on missing data for other items or groups of items” (McKnight et al., 2007, p. 62): missing and nonmissing data will appear contiguous in the Missing Value Pattern figure if the data are monotonic. As expected, the pattern of missingness approximated nonmonotonicity for Grade 3 Fall and Spring variables: the red bars representing missing data and those representing nonmissing data are not contiguous for Grade 3 variables. The pattern of missingness was less haphazard for Grade 4 and Grade 5 Fall variables, and from the contiguous concentration of red bars in the lower right corner, it was concluded the patterns of missingness for Grade 4 Spring subtest scores were monotonic and would require imputation of scores to mitigate systematic bias of that portion of the sample.

The 10 most frequent patterns of missing values, illustrated in the bar graph in Figure 5, offered another perspective. The most common pattern, Pattern 87, represents a large proportion of missing data (approximately 40%); missing values for all Grade 4 and Grade 5 Fall and Spring test administrations comprised the horizontal Pattern 87 in Figure 4.
Similarly, Pattern 100 indicated approximately 20% of cases were missing values on most test administrations, from Grade 3 Spring through Grade 5 Spring.

Figure 5
Missing Value Patterns Bar Graph (Unimputed Grade 3–5 Data Set, All Scores)

The missing values analysis was conducted to examine patterns of missing data for identification of the missingness mechanism as missing at random (MAR), not missing at random (NMAR), or missing completely at random (MCAR), and for selection of the appropriate statistical technique to handle the missing values. Two patterns contained a markedly higher percentage of missing values than others. Approximately 40% of the cases were missing data on all Grade 4 and Grade 5 variables, as modeled in Pattern 87; Pattern 100 reflected an additional 20% of cases missing data on variables from Grade 3 Spring tonal through Grade 5 Spring composite. From these results, it was determined missing values were far more prevalent in Grades 4 and 5, particularly for Spring IMMA administrations of all subtests, and the missing value patterns appeared split between the nonmonotonicity of Grade 3 patterns and the monotonicity of Grade 4 and Grade 5 patterns.

The percentage of incomplete data in the observed data set was quite large, as is common in longitudinal studies. McKnight et al. (2007) observed studies using longitudinal data often resulted in monotonic missing data patterns. Although these patterns were somewhat predictable, they also generated large proportions of missing data (p. 106). Consequently, neither listwise deletion, in which only complete cases were included, nor pairwise deletion, in which only cases having nonmissing values for both variables within a given pair of variables were included, was recommended for the current study.
Either would result in a decrease in power and could potentially introduce bias, as it was unknown whether the loss of those specific cases might lead to misrepresentation of the population, as cautioned by Landerman et al. (1997). The evidence reinforced the conclusion that imputation of missing values was prudent.

Data missing because of specific factors might bias the results (Leech et al., 2015, pp. 292–299); therefore, classification of the data as MAR, NMAR, or MCAR was necessary and reliant on consideration of missingness characteristics. To determine if the missingness could be categorized as MCAR (missing completely at random), Little’s MCAR test was conducted using Grade 3, Grade 4, and Grade 5 IMMA tonal scores, rhythm scores, and composite scores. The results of Little’s MCAR test, χ²(414, N = 131) = 397.219, p > .05, were not significant, an indication the data were missing randomly, with no relationship between missing data and observed data (McKnight et al., 2007, p. 46). MCAR data are ignorable and do not need to be modeled for the parameter estimation process (McKnight et al., 2005, p. 51); moreover, even with a large proportion of missing data, the observed data might generate unbiased parameter estimates when the mechanism is MCAR (McKnight et al., 2007, p. 61). McKnight et al. (2007) noted there did not appear to be a general agreement on classification of the amount of missing data as small, medium, or large (p. 61); nevertheless, Madley-Dowd et al. (2019) suggested even with large proportions of missing data (up to 90%), use of a properly specified imputation model and MAR data can yield unbiased results. Jin and Huber (2011) concurred, concluding multiple imputation (MI) may produce unbiased estimates of large numbers of missing data classified as MCAR or MAR and adjudging MI as preferable to complete case analysis under all missingness mechanisms.
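Predictive mean matching, the imputation method adopted in this study, borrows an observed score from a "donor" case whose predicted value is close to the predicted value of the case with the missing score. The sketch below is a simplified illustration only and not the SPSS implementation: it assumes a single fully observed predictor (for example, a Fall score predicting a Spring score), a pool of three donors, and a fixed regression line, whereas full multiple imputation also draws the regression parameters at random for each imputation. All names and scores are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pmm_impute(x, y, n_donors=3, rng=None):
    """Fill None entries of y by predictive mean matching on predictor x."""
    rng = rng or random.Random()
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    slope, intercept = fit_line([xi for xi, _ in observed],
                                [yi for _, yi in observed])
    # Each potential donor: (predicted score, actually observed score).
    donors = [(intercept + slope * xi, yi) for xi, yi in observed]
    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)  # observed scores are never altered
            continue
        target = intercept + slope * xi
        # Donor pool: cases whose predicted scores lie closest to the target.
        pool = sorted(donors, key=lambda d: abs(d[0] - target))[:n_donors]
        completed.append(rng.choice(pool)[1])  # borrow one donor's observed score
    return completed


def impute_m_times(x, y, m=10, seed=0):
    """Create m imputed copies of y; the donor draws differ across copies."""
    rng = random.Random(seed)
    return [pmm_impute(x, y, rng=rng) for _ in range(m)]


# Fabricated Fall and Spring scores; None marks a missing Spring score.
fall = [28, 31, 25, 35, 30, 27, 33]
spring = [30, 32, None, 36, None, 29, 34]
datasets = impute_m_times(fall, spring, m=10, seed=0)
# Pooled point estimate of the mean: the average of the m per-data-set means.
pooled_mean = sum(sum(d) / len(d) for d in datasets) / len(datasets)
```

Because every imputed value is an actually observed score, imputed data remain within the range of plausible test scores, which is one reason the method is often chosen for bounded measures.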
Due to the mixed results of the pattern analysis, in which the pattern of monotonicity was split by grade levels, and Little’s MCAR test, in which the missing data were classified as MCAR, it was concluded all observed cases would be included and missing values imputed using predictive mean matching with 10 imputations as the preferred method for handling a large percentage of missing values.

Imputation of Missing Values

In predictive mean matching, a donor pool with observed values similar to the predicted value is identified. One value is selected randomly, and its observed value substituted for the missing value (Schenker & Taylor, 1996). Ten imputations, conducted using SPSS, resulted in a pooled data set that was then used in all subsequent statistical analyses (Leech et al., 2015, p. 306). The pooled data set results of each statistical analysis are presented when available. Due to the nature of multiple imputation, slight differences may be expected for each instance of imputing values (Van Ginkel et al., 2020). Standard deviations are not pooled automatically in SPSS; therefore, pooled standard deviations were calculated using Microsoft Excel (Heymans & Eekhout, 2019).

Overview of Statistical Analyses

To address Research Question 1, a series of paired t-tests was conducted using paired samples from Spring IMMA scores of one academic year and corresponding Fall IMMA scores of the subsequent academic year to examine the effect of chronological age on music aptitude while controlling for instruction. Research Question 2 was addressed by conducting a series of Wilcoxon Signed Rank tests or paired t-tests to examine scores of Fall and Spring administrations from the same grade level, thus considering the effect of instruction on music aptitude by academic year.
A one-way repeated measures ANOVA was conducted to examine IMMA scores longitudinally over a 3-year period to consider if mean differences in scores might suggest a period of transition between developmental and stabilized music aptitude stages, as queried in Research Question 3.

For paired t-tests, Cohen’s d, a standardized measure of mean difference calculated as t/√N (Russell, 2018, p. 76), was used to estimate effect size of statistically significant results in order to consider practical significance. According to Russell (2018), small effect sizes are .20–.30, as approximately 4–9% of the total variance is explained, a medium effect size (d = .50) estimates approximately 25% of the total variance, and d = .80 is the threshold for a large effect, in which approximately 64% of the total variance is estimated (p. 90). Cohen (1988) further characterized a small effect as “difficult to detect”, a medium effect as “large enough to be visible to the naked eye”, and a large effect as a “grossly perceptible” difference (Hatcher, 2013, pp. 163–164). For Wilcoxon Signed Rank tests, r was calculated as Z/√N, where N is the number of observations (Field, 2009, p. 558), and interpreted as small (.10), medium (.30), large (.50), or much larger than typical (.70) (Leech et al., 2015, p. 95). For repeated measures ANOVA, omega squared was used to estimate the amount of variance of the dependent variable that was explained by the independent variable (Field, 2009, p. 479): .01 was considered a weak effect, .06 a moderate effect, and .14 a strong effect (Hatcher, 2013, p. 370). The a priori alpha level was set at p < .05 for all statistical tests (Russell, 2018, p. 24).

Results of the paired t-test, Wilcoxon Signed Rank test, and repeated measures ANOVA analyses are presented and discussed in this chapter and related to the research questions posited in this study.
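The three effect size calculations named above may be expressed compactly in code. The sketch below follows the formulas as stated in this study (d = t/√N, r = Z/√N, and omega squared computed from the sums of squares); the function names, the labeling helper, and the input values are hypothetical illustrations and not results of this study.

```python
import math


def cohens_d_from_t(t, n):
    """Cohen's d for a paired t-test, calculated as t / sqrt(N)."""
    return t / math.sqrt(n)


def wilcoxon_r(z, n_observations):
    """Effect size r for a Wilcoxon Signed Rank test, calculated as Z / sqrt(N)."""
    return z / math.sqrt(n_observations)


def omega_squared(ss_between, k, ms_within, ss_total):
    """Omega squared = (SS_B - (k - 1) * MS_w) / (SS_T + MS_w)."""
    return (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)


def label_effect(value, thresholds):
    """Return the label of the largest threshold that |value| reaches."""
    label = "below smallest threshold"  # hypothetical label for very small values
    for cutoff, name in thresholds:
        if abs(value) >= cutoff:
            label = name
    return label


# Interpretation thresholds as listed in the text.
R_THRESHOLDS = [(0.10, "small"), (0.30, "medium"), (0.50, "large"),
                (0.70, "much larger than typical")]
OMEGA_SQ_THRESHOLDS = [(0.01, "small"), (0.06, "medium"), (0.14, "large")]

# Fabricated inputs for illustration only.
d = cohens_d_from_t(3.0, 36)                  # 3.0 / 6
r = wilcoxon_r(2.0, 100)                      # 2.0 / 10
w2 = omega_squared(ss_between=6, k=3, ms_within=1, ss_total=16)  # 4 / 17
```

Applying `label_effect(r, R_THRESHOLDS)` to the fabricated r of .20 returns "small", and the fabricated omega squared of approximately .24 is labeled "large" under the stated criteria.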
Thus, the effect of chronological age and instruction on music aptitude at a given grade level was considered, and the feasibility of a period of transition between the developmental and stabilized music aptitude stages discussed.

Research Question 1

At what grade level does chronological age cease to affect student music aptitude?

To investigate the effect of chronological age on music aptitude, a series of paired t-tests was used to determine if the mean difference of Spring scores of one academic year and corresponding Fall scores of the following academic year, which allowed for increased chronological age during the summer months without the effect of school-related music instruction, was statistically significant. IMMA was administered in Grade 4 in the following semesters only: Fall 2011, Fall 2015, Fall 2017, Fall 2018, Spring 2019, and Fall 2019. Of the 1,035 students for whom Grade 3 IMMA scores were available, 238 also had Grade 4 scores available. Scores from the same individuals on two measures were desired to conduct a paired t-test; therefore, scores of 795 cases could not be matched to Grade 3 scores of the same individuals, as IMMA had not been administered to those students in Grade 4.

Although a rationale for imputation of missing scores had been established previously, imputation of such a large proportion of missing scores (approximately 77% of Grade 4 scores) required additional consideration. Imputation had been deemed valid for cases in which scores were missing arbitrarily due to student absence on the day of test administration, and the random nature of the missingness was speculated. Missing data were anticipated, as the repeated measures used in the study design were often a source of incidental missing data (McKnight et al., 2007, pp. 54–55).
However, it seemed a different matter to impute test scores of an entire test battery (tonal, rhythm, and composite scores) for each student to whom the test had not been administered in a particular semester, and exclusion of cases could reduce the power needed to reject the null hypothesis when it was false. Nevertheless, an a priori power analysis conducted according to the following parameters estimated the suggested number of participants needed for each group in the sample was 45: alpha level .05, power .80, and effect size .6 (Creswell, 2012, p. 611). Thus, a sample size of 238, the number of students for whom both Grade 3 and Grade 4 IMMA scores were available, exceeded the minimum number of 45 recommended by the results of the a priori power analysis. The power analysis results seemed adequate justification to exclude the 795 cases with missing Grade 4 scores due to no test administration from the paired t-test analysis for Research Question 1.

The ramifications of excluding a large number of missing cases were severe and not to be taken lightly. Consequently, paired t-test results of Grade 3 Spring/Grade 4 Fall IMMA scores for the observed case sample and the excluded case sample were compared. Missing values were imputed using predictive mean matching with 10 imputations for both samples. Because values were no longer missing due to imputation, the “observed case sample” will be referred to as the “complete case sample” from this point forward.

Although the means for both the complete case sample and excluded case sample were roughly comparable in most instances, the excluded means for Grade 3 Spring tonal and Grade 3 Spring composite scores were markedly different from comparable means of the complete case sample.
The mean Grade 3 Spring tonal score was the pooled result of 10 imputations using predictive mean matching; the random draw from possible donors appeared skewed toward lower scores during imputation. This contention was supported by an examination of the imputed means and standard deviations for Grade 3 Spring tonal scores: the imputed mean scores were several points lower than the original mean and the imputed standard deviations were higher than those of other variables. It was likely the mean Grade 3 Spring composite score was affected by the discrepancy in the mean Grade 3 Spring tonal score, as composite scores are composed in part of tonal scores. In addition, the standard deviation for Grade 3 Spring composite scores in the excluded case sample (SD = 19.917) was inconsistent with that of other variables from the same sample or the complete case sample. As the standard deviation is an indication of the dispersion of scores from the mean, such a high standard deviation was suggestive of a large amount of variation in Grade 3 Spring composite scores.

SPSS-generated histograms of Grade 3 Spring composite scores of both samples were examined. It was observed scores of the complete case sample more strongly resembled a normal distribution than did those of the excluded case sample. For the sake of expediency, not all charts produced are displayed. Perhaps the larger sample size of the complete case sample (N = 1,054) allowed a more even dispersion of scores, thus masking the impact of outliers in a manner the smaller size of the excluded case sample (N = 238) could not accommodate.

In a similar manner, paired t-test results of Grade 4 Spring/Grade 5 Fall IMMA scores for the complete case sample and excluded case sample were compared and missing values imputed for both samples. Means of both samples were quite similar.
However, the standard deviation of Grade 5 Fall composite scores for the excluded case sample was considerably larger than all other standard deviations, meaning scores for that test were more widely dispersed from the mean than scores of other tests. An examination of all histograms of both imputed samples revealed a similar result to that concluded for Grade 3 Spring composite scores: a normal distribution was more closely modeled by Grade 5 Fall composite scores of the complete case sample than by those of the excluded case sample. Perhaps the effect of outliers was mitigated by the larger sample size (N = 1,069): the dispersion of the smaller number of scores from the excluded case sample (N = 132) would be more limited, allowing outliers to have greater influence on the shape of the distribution, resulting in a discrepancy in standard deviation values. Descriptive statistics for all samples are illustrated in Table 4.

Table 4 Descriptive Statistics of Complete and Excluded Case Samples (Pooled)

A compiled table of correlation results is presented in Table 5. It was concluded Grade 3 Spring tonal, rhythm, and composite scores were significantly correlated with corresponding Grade 4 Fall scores for both samples. In both samples, only composite score correlations were significant for Grade 4 Spring and Grade 5 Fall scores. The correlation coefficients of the complete case and excluded case samples were similar and quite small. These weak correlations seemed to reflect Gordon’s (2012) assertion that correlations of scores from consecutive semesters or years were weak, possibly because the influence of musical environment was stronger than that of formal instruction for students in the developmental music aptitude stage (p. 54).
Accordingly, it was concluded the use of the complete case sample (the sample containing observed scores with missing values imputed) was preferable to address Research Question 1.

Table 5 Correlation Results of Complete and Excluded Case Samples (Pooled)

The paired samples results varied greatly between the two samples. The mean difference in Grade 3 Spring scores and Grade 4 Fall scores was larger, the scores more widely dispersed from the mean, and the distance of the sample mean from the population mean greater in the excluded case sample than in the complete case sample. It appeared the ability of the excluded case sample to accurately represent the nature of the complete case sample was flawed to such an extent as to be deemed untrustworthy.

In contrast to the Grade 3 Spring/Grade 4 Fall findings, the paired samples test results for the Grade 4 Spring and Grade 5 Fall complete case and excluded case samples were more similar. Thus, the difference in scores (mean difference), score dispersion from the mean (standard deviation), and distance of the sample mean from the population mean (standard error of the mean) were comparable for both samples. Paired samples test results for complete cases and excluded cases are displayed in Tables 6 and 7.

Table 6 Paired Samples t-Test Results–Complete Case Sample (Pooled)

Table 7 Paired Samples t-Test Results–Excluded Case Sample (Pooled)

A series of paired t-tests was conducted to determine if statistical results supported the rationale to exclude missing cases due to student non-enrollment at the time of test administration as described previously. The use of either the Grade 4 Spring/Grade 5 Fall complete case sample or the excluded case sample could be justified from descriptive statistics, correlation, and paired samples test results.
This was in stark contrast to the findings for the Grade 3 Spring/Grade 4 Fall complete case and excluded case samples, in which the use of the complete case sample was favored strongly. There seemed no advantage to overlooking the discrepancies in paired samples results, particularly for Grade 3 Spring/Grade 4 Fall scores; thus, sole use of the excluded case sample seemed ill-advised. From these findings, it was concluded the use of the complete case sample, which included observed values for all grade levels with all missing values imputed, was more feasible and practical to address Research Question 1 of the current study.

Paired Samples t-Test Results

Tonal. Although Grade 4 Fall tonal scores were anticipated to exceed Grade 3 Spring tonal scores based on Gordon’s (2005) conclusion that a score increase due to chronological age was typical for tests, the mean difference in Grade 3 Spring/Grade 4 Fall tonal scores was not significant. A weak but significant correlation (r = .278, p < .001) was found for Grade 3 Spring scores and Grade 4 Fall scores (N = 1,070), as presented in Table 8: as Grade 3 Spring tonal scores increased, so did Grade 4 Fall tonal scores. Correlation coefficients were interpreted as a weak (.20–.35), moderate (.35–.65), strong (.66–.85), or very strong relationship (.86 and above) (Creswell, 2012, p. 347). The mean difference of Grade 4 Fall and Grade 3 Spring tonal scores was not statistically significant (t(27) = -.049, p > .05), as seen in Table 9.

Table 8 3ST–4FT Descriptive Statistics and Correlation Coefficient

Table 9 3ST–4FT Paired t-Test Results

Scores (N = 1,070) of students who had been administered the IMMA tonal subtest in the Spring of their fourth-grade year and the Fall of their fifth-grade year were significantly correlated (r = .257, p < .001), as displayed in Table 10. As Grade 4 Spring tonal scores increased, Grade 5 Fall tonal scores also increased.
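The interpretive bands for correlation coefficients cited from Creswell (2012) can be expressed as a simple lookup. The sketch below is illustrative only: the function name `describe_correlation` is hypothetical, and two assumptions are made where the published bands overlap or leave a gap (a coefficient of exactly .35 is treated as moderate, and values between .65 and .66 are treated as moderate).

```python
def describe_correlation(r: float) -> str:
    """Label a correlation coefficient using the bands cited in the text.

    Assumptions (not stated in the source): .35 is labeled moderate, and
    the gap between .65 and .66 is also labeled moderate.
    """
    size = abs(r)  # strength is judged on magnitude, not direction
    if size >= 0.86:
        return "very strong"
    if size >= 0.66:
        return "strong"
    if size >= 0.35:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "below the weakest band"


# Coefficients reported for tonal scores in this section
for r in (0.278, 0.257):
    print(r, describe_correlation(r))
```

Both tonal coefficients reported above fall in the weak band under this scheme.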
The mean difference of Grade 5 Fall tonal and Grade 4 Spring tonal scores was not statistically significant (t(13) = -.423, p > .05), as illustrated in Table 11.

The mean difference between scores from the Spring administration and the subsequent Fall administration of the IMMA tonal subtest for Grades 3–5 was not statistically significant for any grade level considered; these findings are summarized and graphically represented in Figure 6. Gordon (2005) noted an increase in raw scores due to chronological age was typical for tests; therefore, as chronological age increased, scores were expected to increase from the first to the second test administration within one grade level, from one grade level to the subsequent grade level, and from the Spring administration of one grade level to the Fall administration of the following grade level, as was the case in this study. However, there was no significant difference in mean scores between grade levels, from which one could speculate students already had achieved the stabilized music aptitude stage, as score fluctuation would be expected for students still in the developmental music aptitude stage (Gordon, 1981). Similarly, a high correlation between scores of successive grade levels might be anticipated if students had achieved the stabilized music aptitude stage (Gordon, 2005), as the effect of musical environment or training would have ceased and relative standing on music aptitude tests would be maintained (Gordon, 1980b). Nonetheless, only a modest relationship between Spring scores and Fall scores of the following academic year was suggested by the correlation coefficients in this study.

Table 10 4ST–5FT Descriptive Statistics and Correlation Coefficient

The weak correlations were surprising: the test items for the Grade 3 Spring subtests and Grade 4 Fall subtests were identical and it was anticipated the correlation between those sets of scores would be strong.
Still, Gordon (2012) noted a weak correlation between scores from one semester or year to another, even when all students received quality music instruction: “It seems students’ immediate impressions and intuitive responses to environmental influences have more influence on developmental music aptitude than systematic formal instruction in music achievement” (p. 54). In addition, Gordon (2012) observed magnitude, rather than direction, of score changes from year to year seemed to affect lower longitudinal correlation coefficients of developmental music aptitude tests, unlike those for stabilized music aptitude test scores (p. 27).

Table 11 4ST–5FT Paired t-Test Results

Although this observation seemed to explain the findings of the current study, it had been asserted students after age 9 had achieved the stabilized music aptitude stage (Gordon, 2001a) and thus their music aptitude was unaffected by environmental influences. It was speculated the correlation findings in the current study supported a transition period between the developmental and stabilized music aptitude stages during the upper elementary years, in which the influence of musical environment continued to affect music aptitude or recurred in a less predictable pattern than originally described. Perhaps students correctly answered a similar number of test items in each test administration, as suggested by the negligible difference in mean scores, but the test items that were answered correctly were not identical from the Spring test administration to the subsequent Fall test administration, as suggested by the small correlation coefficients. Regardless, evidence to support an effect of chronological age on tonal music aptitude was not found from these contradictory results.

Figure 6 Paired t-Test Spring–Fall Results (Tonal)

Rhythm.
Results of the paired t-test examination of rhythm scores were dissimilar to tonal results. As presented in Table 12, correlations of scores (N = 1,070) of students who had been administered the rhythm subtest in the Spring of their third-grade year and the Fall of their fourth-grade year were not significant (r = .095, p > .05). Grade 4 Fall rhythm scores were an average of .449 points higher than Grade 3 Spring rhythm scores, and this mean difference was statistically significant (t(31) = -2.062, p = .048), as exhibited in Table 13. Cohen’s d was estimated as d = .11, a small effect size.

Similarly, scores (N = 1,070) of students who had been administered the IMMA rhythm subtest in the Spring of their fourth-grade year and the Fall of their fifth-grade year were not significantly correlated (r = .103, p > .05), as displayed in Table 14. The mean Grade 4 Spring–Grade 5 Fall rhythm score difference was not statistically significant (t(13) = -2.092, p > .05) (see Table 15). The lack of correlation significance might have been more notable had the correlation itself been stronger; however, the weak correlation aligned with Gordon’s (2012) observation that correlations of scores of consecutive tests were low.

The score decrease from Grade 4 Spring to Grade 5 Fall scores was unexpected; nevertheless, the finding was not significant. The significant increase in Grade 3 Spring–Grade 4 Fall scores coupled with the decrease in Grade 4 Spring–Grade 5 Fall scores might have been noteworthy, as the trend seemed to confirm the direction of score fluctuation described previously by Gordon (2012). However, neither score difference reached the threshold of a 2-point increase ascribed by Gordon (2002) to students who participate in traditional instruction, and the Grade 4 Spring–Grade 5 Fall score difference was not statistically significant.
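The paired t statistic and an accompanying effect size reported in this section can be illustrated with a short computation. This is a minimal sketch using hypothetical scores, not data from the study; the function name `paired_t` is hypothetical, and Cohen’s d is computed here as the mean difference divided by the standard deviation of the differences, which is an assumption and may differ from the formula used by the online calculator cited in this study.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom, Cohen's d) for two matched score lists.

    Differences are taken as first minus second, so a negative t indicates
    the second administration scored higher on average.
    """
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_difference = mean(differences)
    sd_difference = stdev(differences)            # sample standard deviation
    standard_error = sd_difference / sqrt(n)      # standard error of the mean
    t = mean_difference / standard_error
    d = mean_difference / sd_difference           # assumed effect size formula
    return t, n - 1, d


# Hypothetical Spring and following-Fall scores for six students
spring = [30, 32, 28, 35, 31, 29]
fall = [31, 32, 30, 34, 33, 30]
t, df, d = paired_t(spring, fall)
print(round(t, 3), df, round(d, 3))
```

As in the results above, the negative sign of t reflects the order of subtraction (Spring minus Fall) rather than a decrease in scores.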
Therefore, no effect of chronological age for rhythm music aptitude was concluded, and practical significance was unlikely.

Table 12 3SR–4FR Descriptive Statistics and Correlation Coefficient

The mean difference in scores from the Spring administration to the subsequent Fall administration of the IMMA rhythm subtest was approximately one-half point and statistically significant for Grade 3 Spring–Grade 4 Fall scores only (p = .048). These findings are summarized and graphically represented in Figure 7. A lack of notable score fluctuation, as was apparent in this sample, would be anticipated if students had attained the stabilized music aptitude stage (Gordon, 1980b). Based on these results, it was concluded chronological age had little effect on the relative standing of rhythm aptitude scores in this sample.

The weakness of the correlations indicated a lack of association between rhythm scores of successive grade levels. This was consistent with Gordon’s (2012) assertion that correlations of scores on consecutive test administrations were “alarmingly low,” perhaps due to the influence of students’ responses to environmental influence on developmental music aptitude (p. 54). If students had already achieved the stabilized music aptitude stage, the influence of musical environment should have waned, resulting in a lack of correlation. The discrepancy in expected and observed correlation coefficients and minimal score difference could be explained as students correctly answering a similar number of test items, but not answering the same test items correctly.

Table 13 3SR–4FR Paired t-Test Results

Table 14 4SR–5FR Descriptive Statistics and Correlation Coefficient

Table 15 4SR–5FR Paired t-Test Results

Figure 7 Paired t-Test Spring–Fall Results (Rhythm)
This observation is speculative and would need further investigation in future studies.

Composite. As anticipated, results of the paired t-test examination of composite scores mirrored those of tonal scores and rhythm subtest scores. As exhibited in Table 16, composite scores (N = 1,070) from the Grade 3 Spring and Grade 4 Fall test administrations were significantly correlated (r = .252, p < .01): as Grade 3 Spring composite scores increased, Grade 4 Fall composite scores also increased. Grade 4 Fall composite scores were an average of .874 points higher than Grade 3 Spring composite scores. The mean difference was statistically significant (t(87) = -2.641, p = .010) (see Table 17), and Cohen’s d was estimated as d = .11, a small effect.

Composite scores (N = 1,070) of students from Grade 4 Spring and Grade 5 Fall test administrations were not correlated significantly (r = .119, p > .05), as displayed in Table 18. The mean Grade 4 Spring–Grade 5 Fall composite score difference was not statistically significant (t(27) = 1.912, p > .05), as presented in Table 19.

The trend of composite scores was anticipated to resemble that of tonal or rhythm scores and to reflect an increase in composite scores as chronological age increased. Therefore, the decrease in mean composite scores from Grade 4 Spring to Grade 5 Fall test administrations, though non-significant, was unexpected, as findings of extant literature also supported score increase due to chronological age. Thus, composite results resembled rhythm results of the same test administrations. Despite the finding of statistical significance for the Grade 3 Spring/Grade 4 Fall composite score difference, caution is warranted regarding practical significance, as the mean score difference was less than one point.
Table 16 3SC–4FC Descriptive Statistics and Correlation Coefficient

Table 17 3SC–4FC Paired t-Test Results

Synopsis

The paired t-test composite findings of Spring scores from one academic year and Fall scores of the following year are summarized and graphically represented in Figure 8. A clear interpretation of these findings was elusive. A lack of pronounced score fluctuation would be expected if students had achieved the stabilized music aptitude stage (Gordon, 1980b); this seemed to be the case in this sample. A statistically significant correlation (r = .252, p < .01) of Grade 3 Spring/Grade 4 Fall composite scores seemed to support a determination of stabilized music aptitude: an association between composite scores of successive grade levels would seem to indicate mean score difference had dwindled as the influence of musical environment waned, as described by Gordon (1981).

Table 18 4SC–5FC Descriptive Statistics and Correlation Coefficient

Table 19 4SC–5FC Paired t-Test Results

Nevertheless, both relationships between Spring scores and Fall scores of the subsequent academic year were weak; a discrepancy between the number of correctly answered test items and the stability of those answers in relation to specific test items was speculated. Gordon (2012) reported median correlations of repeated stabilized music aptitude test administrations were approximately .80; corresponding correlations for developmental music aptitude test administrations only approximated .30 (p. 27). Following this logic, it seemed the weak correlations found for composite scores in the current study were suggestive of students’ continued presence in the developmental music aptitude stage. Yet mean composite score differences were small for all grade levels and could signify composite music aptitude was fixed.
Student music aptitude stage could not be established conclusively from these findings, and little evidence was found to suggest an effect of chronological age on composite aptitude.

Figure 8 Paired t-Test Spring–Fall Results (Composite)

The relative constancy of scores over the 3-year period from Grade 3 through Grade 5 could be interpreted as evidence students had progressed to the stabilized music aptitude stage before Grade 3. The findings of previous research both support and dispute this interpretation. Degé et al. (2017) and Walters (1991) noted the influence of external factors on developmental music aptitude before age 9, yet Gordon (1980b, 1986c, 2012, 2013) was consistent in his claim that musical environment no longer affected music aptitude at approximately age 9, defined as the period of stabilized music aptitude. Phillips et al. (2002) conjectured Grade 3 or approximately age 9 might serve as the pivotal year for development of aural skills before the stabilization of music aptitude. It should be noted that, for the current study’s school district, the terms “Grade 3” and “age 9” did not describe the same students, as 9-year-old students were typically in fourth grade.

Gordon (1998) noted a tendency for music aptitude test scores to increase with chronological age for students in the stabilized music aptitude stage and for scores and percentile ranks to fluctuate for students who take a developmental music aptitude test (p. 169). Mean score differences in the current study were small: the largest difference, found for Grade 3 Spring/Grade 4 Fall composite scores, was less than one point. Nevertheless, Grade 3 Spring/Grade 4 Fall mean score differences were significant for rhythm scores and composite scores in this sample.
Correlations, although significant for tonal scores and Grade 3 Spring/Grade 4 Fall composite scores, were weak for all grade and subtest combinations. Therefore, it was concluded the chronological age at which the developmental music aptitude stage progressed to the stabilized music aptitude stage was not clarified by the results of the current study, as there was no clear decrease in score fluctuation that might indicate the age at which a shift between music aptitude stages might occur.

Research Question 2

At what grade level does instruction cease to affect student music aptitude?

The difference in mean scores of matched pairs by year and semester was examined to gain an in-depth view of change in subtest scores by grade level as influenced by instruction. In most instances, the assumption of normal distribution for paired t-tests was not met and the nonparametric Wilcoxon Signed Rank test was substituted. The following third grade scores were not available, as IMMA was not administered in the Spring of those academic years:

Grade 3: Spring 2009, Spring 2011, Spring 2015, and Spring 2020

IMMA was not administered routinely in Grades 4 and 5. Therefore, only the following observed scores were available:

Grade 4: Fall 2011, Fall 2015, Fall 2017, Fall 2018, Spring 2019, Fall 2019
Grade 5: Fall 2017, Fall 2018, Spring 2019, Fall 2019

Fall and Spring scores from the same academic year were compared and all missing values imputed using predictive mean matching with 10 imputations (Acock, 2005). Results were categorized and reported by statistical test, year, grade level, and semester. An online effect size calculator (Stangroom, 2021) was used to calculate Cohen’s d, reported as an estimate of effect size for paired t-test results and interpreted as a small effect (.2), medium effect (.5), or large effect (.8) (Russell, 2018, p. 90).
Wilcoxon Signed Rank test effect size was calculated using Microsoft Excel as r = Z/√N (Field, 2009, p. 558), where N was the total number of observations. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5) (Field, 2009, p. 558).

Wilcoxon Signed Rank Test Results

A Shapiro-Wilk test was conducted on each set of grade level subtest scores to evaluate the assumption of normality (Field, 2009, p. 144). A significant result indicated a violation of the assumption of normality. In some instances, the distribution was found to be leptokurtic: the steepness of the distribution curve was high, which suggested heavy tails or outliers. These results were supported by visual examination of SPSS-generated histograms and Normal Q-Q plots for each of the 10 imputations of each set of scores. When the normality assumption was violated, Wilcoxon Signed Rank tests were conducted instead of paired t-tests to compare differences in mean scores.

An effect of instruction on music aptitude was not concluded from the results of Wilcoxon Signed Rank tests for Grade 3 IMMA scores by academic year. Mean score differences were generally negligible and non-significant. A relative constancy of subtest scores was suggestive of an attainment of the stabilized music aptitude stage prior to Grade 3. Wilcoxon Signed Rank test results are summarized and depicted for tonal scores (see Table 20), rhythm scores (see Table 21), and composite scores (see Table 22) of all academic years in order to present a broad perspective of longitudinal change.

Table 20 Wilcoxon Signed Rank Test Results (Tonal)

As is apparent from the compiled tonal results featured in Table 20, correlations of all pairs of Fall and Spring tonal scores were strong and significant (p < .01).
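The effect size formula r = Z/√N and its interpretive benchmarks (Field, 2009) can be sketched in a few lines. The function names and the example values of Z and N below are hypothetical and are not taken from the study’s results; the label “negligible” for values below 0.1 is likewise an assumption, as the source names only the small, medium, and large benchmarks.

```python
from math import sqrt


def wilcoxon_effect_size(z: float, n_observations: int) -> float:
    """Compute r = Z / sqrt(N), where N is the total number of observations."""
    return z / sqrt(n_observations)


def describe_effect(r: float) -> str:
    """Label |r| against the benchmarks 0.1 (small), 0.3 (medium), 0.5 (large)."""
    size = abs(r)  # the sign of Z only reflects the direction of the difference
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # assumed label for values under the smallest benchmark


# Hypothetical example: Z = -2.0 from 100 total observations
r = wilcoxon_effect_size(z=-2.0, n_observations=100)
label = describe_effect(r)
print(round(r, 3), label)
```

Because N is the total number of observations, a study with 50 matched pairs contributes 100 observations to the denominator.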
Only the mean differences between Fall and Spring tonal scores for two years (2011–2012 and 2013–2014) were statistically significant (p < .05). Results of each academic year are analyzed and discussed separately later in this section in order to consider score change through a narrower lens.

Correlations of most pairs of Fall/Spring rhythm scores were strong and statistically significant, as presented in Table 21. However, no mean differences were statistically significant at p < .05. Results for each academic year are examined and interpreted in the next section for a focused perspective on rhythm score change.

Table 21 Wilcoxon Signed Rank Test Results (Rhythm)

Correlations of Fall/Spring composite scores of all academic years were moderate or strong and statistically significant. No significant mean difference of Fall/Spring composite scores was found for any of the academic years considered. Composite findings of each academic year are displayed and interpreted in the following section, in order to scrutinize score change in detail.

Table 22 Wilcoxon Signed Rank Test Results (Composite)

2007–2008 Grade 3 Scores. Sample sizes of 2007–2008 Grade 3 tonal, rhythm, and composite tests were slightly different due to the number of student absences on each of the dates of test administration. Therefore, missing values from each test were imputed using predictive mean matching with 10 imputations (Acock, 2005), in which values were selected randomly from a donor pool of observed values and substituted for missing values (Allison, 2015). This resulted in samples of equal size. The imputed data sets, 10 for each original data set of tonal, rhythm, and composite scores, were used for all subsequent statistical tests, which yielded pooled results from which a single inference was drawn (McKnight et al., 2007).
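The donor-pool logic of predictive mean matching described above can be sketched as follows. This is a simplified illustration rather than the SPSS procedure used in the study: it assumes a single predictor, an ordinary least squares line, and a pool of the three nearest donors, and it omits the random draw of regression parameters that full implementations include. All names and scores are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pmm_impute(predictor, target, k=3, rng=None):
    """Fill missing target values (None) with observed values from nearby donors."""
    rng = rng or random.Random()
    observed = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in observed], [y for _, y in observed])
    completed = []
    for x, y in zip(predictor, target):
        if y is not None:
            completed.append(y)  # observed scores are kept unchanged
            continue
        predicted = slope * x + intercept
        # Donor pool: the k observed cases whose predicted values are closest
        donors = sorted(
            observed,
            key=lambda pair: abs(slope * pair[0] + intercept - predicted),
        )[:k]
        completed.append(rng.choice(donors)[1])  # borrow a donor's observed score
    return completed


# Hypothetical Fall scores (complete) and Spring scores (two students absent)
fall = [30, 32, 28, 35, 31, 29, 33]
spring = [31, 33, None, 34, None, 30, 32]

# Ten imputed data sets, mirroring the 10 imputations described in the text
imputed_sets = [pmm_impute(fall, spring, rng=random.Random(seed)) for seed in range(10)]
print(imputed_sets[0])
```

Because every imputed value is borrowed from an observed case, the completed data sets contain only scores that actually occurred, which is the property that distinguishes this approach from imputing the regression prediction itself.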
Mean scores of all tonal, rhythm, and composite tests were comparable to corresponding Fall and Spring administrations, as displayed in Table 23.

Table 23 2007–2008 Grade 3 Descriptive Statistics (Pooled)

A linear relationship was suggested from visual examination of scatterplots for all corresponding subtests. Therefore, a Pearson’s Product-Moment correlation test was run for each pair of corresponding subtests. Corresponding Fall and Spring scores were significantly correlated (p < .05) for all subtests. The results of the bivariate relationships among all combinations of test types (Huck, 2012, p. 50) were summarized in a correlation matrix, presented in Table 24.

In addition, tonal and rhythm subtest intercorrelations ranged from .497 to .636, which were lower than the corresponding reliabilities of the tests: the subtests seemed to have no more than 40% of their variances in common. The reported intercorrelation coefficients of IMMA tonal and rhythm subtests ranged from .40 to .46 (Gordon, 1986c, p. 94); thus, the ability of the tonal subtest and rhythm subtest to measure unique dimensions of music aptitude was somewhat higher for this 2007–2008 Grade 3 sample when compared to the IMMA standardization sample. These results were interpreted as indirect evidence the preponderance of variance of tonal and rhythm subtests was related to factors not shared by the two subtests (p. 94): the tonal subtest and rhythm subtest functioned according to the standard established in the IMMA manual for this study’s 2007–2008 Grade 3 sample. As expected, the composite scores were highly correlated with both tonal subtest (ranging from .643 to .840) and rhythm subtest (ranging from .599 to .888) scores, as tonal and rhythm subtest scores contributed to the composite score.
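The statement that the subtests shared no more than 40% of their variances follows from squaring the largest intercorrelation coefficient. The arithmetic can be sketched as below; the function name `shared_variance` is hypothetical.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures have in common (r squared)."""
    return r ** 2


# Smallest and largest tonal-rhythm intercorrelations reported for 2007-2008
low = shared_variance(0.497)
high = shared_variance(0.636)
print(round(low * 100), round(high * 100))
```

The largest coefficient, .636, corresponds to roughly 40% shared variance, leaving about 60% of the variance attributable to factors not shared by the two subtests.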
Table 24 2007–2008 Grade 3 Correlation Matrix (Pooled)

From statistically significant Shapiro-Wilk results, exhibited in Table 25, a violation of the necessary assumption of normal distribution for 2007–2008 Grade 3 tonal and composite scores was concluded. The lower bound of the true significance value is an indication of the low end of the range of possible p-values to which the estimated p-value belongs; it is, in effect, a confidence interval for significance of Kolmogorov-Smirnov results. A visual examination of associated Q-Q plots and boxplots supported the contention of assumption violations. Therefore, a series of Wilcoxon Signed Rank tests was conducted to investigate differences between corresponding Fall and Spring subtest scores for two related samples. Results are displayed below in Tables 26 and 27.

This nonparametric test was selected because the assumption of normality had been violated for most subtests, an indication that use of a paired t-test was inappropriate. Unlike the paired t-test, which uses mean scores as the average, signed ranks were used to test the difference of observations. All Spring subtest ranks were higher than corresponding Fall ranks, as determined by the greater number of ranks (Field, 2009, p. 558). Although tonal medians were similar (Mdn = 34.00), Spring tonal subtest means (33.36) tended to be higher than Fall tonal subtest means (32.95), Z = -1.172, r = 0.18, a small effect.

Table 25 2007–2008 Grade 3 Shapiro-Wilk Test of Normality Results

Spring rhythm subtest ranks (Mdn = 30.00) were apt to be higher than Fall rhythm subtest ranks (Mdn = 29.00), Z = -.329, r = 0.005, and Spring composite ranks (Mdn = 64.00) also trended higher than Fall composite ranks (Mdn = 63.00), Z = -.283, r = 0.004. No mean rank differences were statistically significant (p > .05) for any set of ranks.
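The use of signed ranks rather than means can be made concrete with a small worked example. The sketch below uses hypothetical scores and a normal approximation without a correction for ties, so it is a simplification and will not reproduce SPSS output exactly; the function names are hypothetical.

```python
from math import sqrt


def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks, n of nonzero pairs).

    Differences are second minus first; zero differences are dropped and
    tied absolute differences share the average of their ranks.
    """
    differences = [b - a for a, b in zip(first, second) if b != a]
    ordered = sorted(differences, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 through j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    positive = sum(r for d, r in zip(ordered, ranks) if d > 0)
    negative = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return positive, negative, len(ordered)


def z_approximation(positive, negative, n):
    """Normal approximation to the signed rank statistic (no tie correction)."""
    smaller_sum = min(positive, negative)
    expected = n * (n + 1) / 4
    standard_deviation = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (smaller_sum - expected) / standard_deviation


# Hypothetical Fall and Spring scores for eight students
fall = [30, 32, 28, 35, 31, 29, 33, 27]
spring = [31, 32, 30, 34, 33, 30, 36, 27]
positive, negative, n = signed_rank_sums(fall, spring)
z = z_approximation(positive, negative, n)
print(positive, negative, n, round(z, 3))
```

In this example, five of the six nonzero differences favor the Spring administration, so the positive rank sum dominates; with so few pairs, the normal approximation is rough and serves only to illustrate how Z is derived from the ranks.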
Thus, no effect of instruction was concluded for Fall and Spring tonal scores, rhythm scores, or composite scores for this 2007–2008 sample. However, it was possible type of instruction adversely affected the finding of no effect of instruction.

Table 26 2007–2008 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Table 27 2007–2008 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

Gordon (2001b) theorized three stages of preparatory audiation: acculturation, in which children absorb musical sounds, babble, and move with increasing purpose in relation to sounds of their musical environment; imitation, in which children become aware of sameness and difference between their own performance and that of others and begin to imitate sounds of their musical environment; and assimilation, in which children recognize and learn to coordinate their singing, chanting, breathing, and moving (pp. 6–10). Structured and unstructured informal guidance offered students in preparatory audiation opportunities to absorb music and respond as they wish (Gordon, 2012, pp. 253–254): “random and experimental responses prepare them [children] to sing, chant, and move within a culturally based context through audiation” (Gordon, 2012, p. 254). However, if students had not experienced adequate informal guidance, they would lack the necessary readiness for formal instruction. Gordon (2012) asserted, “Extended informal guidance in music is more beneficial than premature formal instruction” (p. 255): informal music guidance is foundational to and directly influences developmental music aptitude and, by extension, stabilized music aptitude (Gordon, 2006). Ideally, children begin to move through the steps of preparatory audiation from a young age; however, not all students have transitioned from preparatory audiation or “music babble” by the time they begin formal schooling.
Gordon (2013) noted children moved through preparatory audiation at different rates (p. 29) and may or may not have emerged from music babble when in the developmental music aptitude stage (Gordon, 2012, p. 251). Consequently, if students in the current sample had not emerged from the music babble stage yet, the formal guidance provided would have been inappropriate for their musical needs and their music aptitude test scores would have decreased accordingly (Reese & Shouldice, 2019). Gordon (1981) exhorted, “Retroactive inhibition on the part of the young child in attempting to erase supposedly erroneous concepts rather than learning how to assimilate them into new understanding, as a result of no, or inappropriate formal instruction, may be the most potent cause of low developmental music aptitude among young children” (p. 46). Thus, it must be considered that instruction was not appropriate to support music aptitude, resulting in no effect of instruction. Type and quality of instruction were beyond the purview of this study; nevertheless, further examination is recommended as a topic of future research.

2008–2009 Grade 3 Scores. Due to student absences on test administration dates, sample sizes for 2008–2009 Grade 3 tests were unequal. Predictive mean matching was used to impute missing values and the imputed data set was used for subsequent statistical tests, culminating in equal sample sizes and pooled results. Mean Fall and Spring scores were comparable for all tests, as shown in Table 28. Linearity of corresponding Fall and Spring scores was suggested by a visual examination of tonal, rhythm, and composite scatterplots. Therefore, a Pearson’s Product-Moment correlation test was conducted; results are featured in Table 29. Fall and Spring scores for all tests were significantly correlated (tonal r = .780; rhythm r = .614; composite r = .840) and the effect sizes were large.
Tonal–rhythm intercorrelations were also significant (p < .01) and ranged from .490 to .722, suggesting subtests had no more than 52% of their variances in common. As predicted, composite score intercorrelations were strong: tonal–composite coefficients ranged from .729 to .926 and rhythm–composite coefficients from .679 to .920.

Table 28 2008–2009 Grade 3 Descriptive Statistics (Pooled)

A Shapiro-Wilk Test of Normality was conducted: all tests but Grade 3 Spring rhythm were significant at p < .05, as presented in Table 30. A determination that the assumption of normality had been violated for most groups of scores was supported by a visual examination of boxplots and Normal Q-Q plots. Thus, the paired t-test was deemed inappropriate and the Wilcoxon Signed Rank test substituted. These findings are displayed in Tables 31 and 32.

Table 29 2008–2009 Grade 3 Correlation Matrix (Pooled)

Table 30 2008–2009 Grade 3 Shapiro-Wilk Test of Normality Results

Although tonal medians were similar (Mdn = 34.00), Spring tonal subtest means (33.11) tended to be higher than Fall tonal subtest means (32.88), Z = -1.323, r = .013, a small effect. Spring rhythm subtest ranks (Mdn = 29.00) were lower than Fall rhythm subtest ranks (Mdn = 30.00), Z = -.034, r = .0003, and Spring composite ranks (Mdn = 62.00) were lower than Fall composite ranks (Mdn = 64.00), Z = -.622, r = .006. The mean difference was not significant (p > .05) for any set of ranks.

Table 31 2008–2009 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Mean Fall ranks were greater than mean Spring ranks for rhythm and composite scores; however, no difference was statistically significant. This finding reflected Gordon’s (2002) assertion that the absence of school music instruction could yield a decrease in average developmental music aptitude test scores.
That this score decrease occurred after one year of instruction suggested that the formal rhythm instruction offered the students in this 2008–2009 Grade 3 sample was neither compensatory (students’ musical needs were mitigated) nor complementary (students’ current musical needs were met). Gordon (1986c) recommended an evaluation of, and likely a change to, the type of instruction provided as the result of a score decrease (p. 76). An effect of instruction was not concluded for this sample, although it seemed probable inadequate instruction played a role in this finding.

Table 32 2008–2009 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

2009–2010 Grade 3 Scores. Sample size differed for each subtest, according to student attendance. Predictive mean matching was used to impute missing values; subsequent statistical tests were based on the imputed data set. Therefore, equal sample sizes and pooled results are presented. Fall mean scores were quite similar to corresponding Spring scores, as can be seen in Table 33. A linear relationship between corresponding Fall and Spring test scores was suggested by a visual examination of tonal, rhythm, and composite scatterplots. Consequently, a Pearson’s Product-Moment correlation test was conducted to determine the strength and direction of the relationship between corresponding Fall and Spring IMMA scores. The correlation coefficients of all pairs of subtest scores are featured in Table 34. Fall and Spring scores were significantly correlated (p < .01) for all subtests (tonal r = .669; rhythm r = .654; composite r = .800) and the effect sizes were large. Subtest intercorrelations were also significant: tonal–rhythm intercorrelations ranged from .432 to .641, so subtests seemed to have no more than 41% of their variances in common. Thus, the ability of the tonal subtest and rhythm subtest to measure unique dimensions of music aptitude was moderate.
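The shared-variance figures quoted alongside the tonal–rhythm intercorrelations follow from squaring the correlation coefficient. The short Python sketch below illustrates that arithmetic for the 2009–2010 bounds; the function name `shared_variance_pct` is a hypothetical helper introduced here for illustration and is not part of the original analysis.

```python
def shared_variance_pct(r: float) -> float:
    """Percentage of variance two measures have in common: 100 * r squared."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    return 100.0 * r * r


# Tonal-rhythm intercorrelation bounds reported for the 2009-2010 Grade 3 sample.
low, high = 0.432, 0.641
print(f"{shared_variance_pct(low):.1f}% to {shared_variance_pct(high):.1f}%")
```

The upper bound of .641 corresponds to roughly 41% shared variance, which matches the ceiling stated in the text.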
As expected, intercorrelations with composite scores were strong and significant: tonal–composite intercorrelations ranged from .690 to .908 and rhythm–composite intercorrelations ranged from .664 to .903.

Table 33 2009–2010 Grade 3 Descriptive Statistics (Pooled)

The results of the Shapiro-Wilk Test of Normality, presented in Table 35, were statistically significant for all tests, p < .05; thus, the use of paired t-tests was deemed inappropriate due to the violation of the normality assumption. A visual examination of Normal Q-Q plots and boxplots of each subtest confirmed the presence of occasional outliers. Therefore, a series of Wilcoxon Signed Rank tests was conducted in lieu of paired t-tests. The results are exhibited below in Tables 36 and 37.

Table 34 2009–2010 Grade 3 Correlation Matrix (Pooled)

Table 35 2009–2010 Grade 3 Shapiro-Wilk Test of Normality Results

Although tonal medians were similar (Mdn = 33.00), Spring tonal subtest means (32.52) were inclined to be lower than Fall tonal subtest means (32.67), Z = -.255, r = 0.003, and Spring rhythm subtest ranks (Mdn = 29.00) were lower than Fall rhythm subtest ranks (Mdn = 30.00), Z = 1.158, r = 0.014. Composite medians were also similar (Mdn = 63.00), but Spring composite means (61.67) tended to be lower than Fall composite means (62.33), Z = -.031, r = 0.0004.

Table 36 2009–2010 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Nevertheless, mean differences were not statistically significant (p > .05) for any set of ranks. From non-significant mean differences of Fall and Spring tonal, rhythm, and composite scores, no effect of instruction was concluded. This finding was in contrast to Gordon’s (2002) contention that students would demonstrate an improvement of approximately 2 points per year on a developmental music aptitude test if traditional instruction was offered.
Type and quality of instruction, if inappropriate based on the tonal and rhythm progress of students through the music babble stage, could have influenced students’ ability to benefit from the formal instruction offered in the school music environment. Gordon (2002) suggested average scores would increase to the highest score obtainable with specialized instruction emphasizing audiation. Whereas complementary or compensatory instruction likely would result in an increase or maintenance of test scores, a sustained period of tonal and rhythm instruction mismatched with students’ musical needs and providing insufficient support for students to move through preparatory audiation could have resulted in decreased or stagnant music aptitude test scores, from which a conclusion of no effect of instruction could also be drawn.

Table 37 2009–2010 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

2010–2011 Grade 3 Scores. Original sample sizes were unequal due to student absences on dates of various test administrations. Missing values were imputed using predictive mean matching; thus, sample sizes of the newly imputed data set, used for all statistical tests, were made equal. Mean scores of corresponding Fall and Spring test administrations were very similar, as shown in Table 38. A linear relationship of Fall and Spring scores was concluded from visual examination of tonal, rhythm, and composite scatterplots and a Pearson’s Product-Moment correlation test was conducted. This correlation matrix can be seen in Table 39. There was a large effect for all correlations of tonal scores (r = .759), rhythm scores (r = .650), and composite scores (r = .758); all correlations were significant (p < .01). Tonal–rhythm intercorrelations ranged from .179 to .668 and were mostly significant (p < .05).
Composite intercorrelations were statistically significant for tonal scores, ranging from .578 to .881, and rhythm scores, ranging from .580 to .887 (p < .05); all effect sizes were large.

Table 38 2010–2011 Grade 3 Descriptive Statistics (Pooled)

Table 39 2010–2011 Grade 3 Correlation Matrix (Pooled)

Shapiro-Wilk test results, exhibited in Table 40, were statistically significant for all but Grade 3 Spring rhythm scores, and composite scores were suggestive of a violation of the assumption of normality. The presence of occasional outliers was confirmed by a visual examination of boxplots and Normal Q-Q plots. Therefore, it was concluded paired t-tests were inappropriate and Wilcoxon Signed Rank tests were substituted (see Tables 41 and 42).

Table 40 2010–2011 Grade 3 Shapiro-Wilk Test of Normality Results

Tonal and rhythm medians were similar (Mdn = 34.00 and 30.00, respectively), and Spring subtest means were inclined to be lower than their Fall counterparts, although not significantly so: Spring tonal means (33.58) and Fall tonal means (33.70), Z = -1.166, r = .013, and Spring rhythm means (29.11) and Fall rhythm means (29.22), Z = -.928, r = .011. Spring composite ranks (Mdn = 64.00) were apt to be higher than Fall composite ranks (Mdn = 63.00), Z = -1.953, r = .022. The composite result also was not statistically significant, although the observed p-value (p = .051) was almost equal to the a priori alpha level (α = .05). Nevertheless, practical significance was discounted because the effect size was quite small (r = .022). No effect of instruction was concluded from the findings because mean score differences were not statistically significant.

Table 41 2010–2011 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Table 42 2010–2011 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

2011–2012 Grade 3 Scores.
As with previous test administrations, sample size differed for each subtest. However, imputation of missing values created equal sample sizes; pooled results are presented. Mean scores for corresponding subtests were comparable from the Fall to Spring administrations, as seen in Table 43. An approximate linear relationship between corresponding Fall and Spring scores was suggested from a visual examination of tonal, rhythm, and composite scatterplots.

Table 43 2011–2012 Grade 3 Descriptive Statistics (Pooled)

Results of a Pearson’s Product-Moment correlation test are presented in Table 44. Fall and Spring scores were significantly correlated (p < .01) for all subtests (tonal r = .688; rhythm r = .551; composite r = .699) and the effect sizes were large. The intercorrelation between tonal and rhythm subtest scores also was significant, with the exception of Grade 3 Fall tonal and Grade 3 Spring rhythm scores. Correlation coefficients ranged from .504 to .701; subtests seemed to have no more than 49% of their variances in common. Therefore, tonal and rhythm subtests estimated unique dimensions of music aptitude with modest success. As anticipated, intercorrelations with composite scores were significant: tonal–composite correlation coefficients ranged from .635 to .907; rhythm–composite correlation coefficients ranged from .599 to .928. The effect size of all intercorrelations was large.

Table 44 2011–2012 Grade 3 Correlation Matrix (Pooled)

A Shapiro-Wilk Test of Normality was conducted; results for most tests were statistically significant at p < .05, with the exception of Grade 3 Spring rhythm scores (see Table 45). A visual examination of Normal Q-Q plots and boxplots for each subtest offered additional evidence the assumption of normal distribution necessary to conduct a paired t-test had been violated.
Therefore, the Wilcoxon Signed Rank test was conducted to investigate mean difference for each pair of corresponding subtests; results are featured in Tables 46 and 47. Spring tonal subtest ranks (Mdn = 34.00) were significantly higher (p = .001) than Fall tonal subtest ranks (Mdn = 32.00), Z = -3.270, r = 0.037, a small effect size. This result suggested that instruction might have had a nominal effect on tonal scores, as the mean Spring score was 1.18 points higher than the Fall score. However, caution is recommended when considering practical significance, as Gordon (2002) had suggested developmental music aptitude scores increased approximately 2 points from year to year and the tonal score increase in question did not meet that threshold. Spring rhythm subtest ranks (Mdn = 29.00) tended to be lower than Fall rhythm subtest ranks (Mdn = 31.00), Z = -.831, r = 0.009; although composite medians were similar (Mdn = 63.00), Spring composite means (60.94) trended higher than Fall composite means (60.46), Z = -1.1326, r = 0.015. Nevertheless, no statistically significant difference in rhythm ranks or composite ranks (p > .05) was found. From these non-significant mean score differences, it was concluded instruction did not affect Fall and Spring rhythm or composite scores. It was possible this sample of students had progressed beyond the tonal music babble stage and consequently the formal instruction offered within the school music environment was appropriate for their tonal development but not their rhythm development. However, the decrease in rhythm ranks after one year of instruction was unexpected and indicative of instruction that was neither compensatory nor complementary of students’ musical needs (Gordon, 1986c, p. 76).
Type and quality of instruction in general, and by tonal and rhythm dimension specifically, were not considered in the current study; however, it becomes increasingly apparent as analyses are interpreted that an examination of this topic is recommended in future studies.

Table 45 2011–2012 Grade 3 Shapiro-Wilk Test of Normality Results

2012–2013 Grade 3 Scores. Again, sample sizes differed by subtest, according to student attendance on the dates of test administration. Missing values were imputed using predictive mean matching and the imputed data set was used to conduct statistical tests. Corresponding Fall and Spring scores were quite similar for all subtests, as seen in Table 48.

Table 46 2011–2012 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

A broad approximation of a linear relationship for Fall and Spring administrations was suggested from a visual examination of scatterplots of tonal, rhythm, and composite scores. A Pearson’s Product-Moment correlation test for each pair of corresponding subtests yielded correlation coefficients for Fall and Spring administrations as follows: tonal r = .729, rhythm r = .568, and composite r = .727. Thus, relationships between Fall and Spring scores of corresponding subtests were strong, positive, and statistically significant at p < .01, as exhibited in the correlation matrix shown in Table 49.

Table 47 2011–2012 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

Table 48 2012–2013 Grade 3 Descriptive Statistics (Pooled)

The intercorrelation between tonal and rhythm subtests was moderate, ranging from .378 to .590. Stronger correlations were found for relationships with composite scores: tonal–composite correlation coefficients ranged from .644 to .872; rhythm–composite correlation coefficients ranged from .579 to .907. This was to be expected, as tonal and rhythm scores were summed to yield composite scores.
A Shapiro-Wilk Test of Normality yielded statistically significant results for all tests at p < .05, as presented in Table 50, thus indicating a violation of the assumption of normality. Additional evidence supporting the violation of the normality assumption, attributed to the presence of outliers, was provided by a visual examination of Q-Q plots and boxplots.

Table 49 2012–2013 Grade 3 Correlation Matrix (Pooled)

Table 50 2012–2013 Grade 3 Shapiro-Wilk Test of Normality Results

The Wilcoxon Signed Rank test was used in lieu of the paired t-test, which required an assumption of normal distribution. Although tonal and rhythm medians were similar (Mdn = 34.00 and 30.00, respectively), Spring tonal subtest means (33.27) were apt to be higher (p > .05) than Fall tonal subtest means (32.91), Z = -.565, r = 0.007, and Spring rhythm subtest means (29.46) higher than Fall rhythm subtest means (29.25), Z = -.540, r = 0.007. Spring composite ranks (Mdn = 65.00) tended to be higher than Fall composite ranks (Mdn = 63.00), Z = -.647, r = 0.008. The difference was not statistically significant (p > .05) for any set of ranks; results can be seen in Tables 51 and 52. From the non-significant results, it was concluded there was no influence of instruction on tonal, rhythm, or composite music aptitude for this sample. Type and quality of instruction were not investigated within this study’s design and might have had an effect on music aptitude scores, as instruction inappropriately aligned with students’ musical needs might have resulted in the non-significant score differences found in the current study. This is further evidence that type and quality of instruction should be investigated in future studies.

2013–2014 Grade 3 Scores. Due to irregularities in student attendance, original sample size differed by subtest, as displayed in Table 53.
Missing values were imputed using predictive mean matching; Fall and Spring mean scores were found to be comparable. Subtest scatterplots broadly approximated linear relationships of Fall and Spring scores. Consequently, a Pearson’s Product-Moment correlation test was conducted for each corresponding pair of subtest scores. The results are exhibited in Table 54. Correlations of corresponding tonal (r = .777), rhythm (r = .520), and composite (r = .723) scores were statistically significant at p < .01 and effect sizes were large. The tonal–rhythm subtest intercorrelation coefficients ranged from .355 to .628. Correlations of composite scores were strong, ranging from .638 to .880 for tonal–composite intercorrelations and .495 to .914 for rhythm–composite intercorrelations, as would be expected when tonal and rhythm scores were included in composite scores. A Shapiro-Wilk Test of Normality yielded statistically significant results at p < .05, an indication of a violation of the normality assumption required for the paired t-test; results are displayed in Table 55. A visual examination of Normal Q-Q plots and boxplots confirmed the suspicion of the normality assumption violation, likely due to the presence of outliers.

Table 51 2012–2013 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Table 52 2012–2013 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

Table 53 2013–2014 Grade 3 Descriptive Statistics (Pooled)

Table 54 2013–2014 Grade 3 Correlation Matrix (Pooled)

Table 55 2013–2014 Grade 3 Shapiro-Wilk Test of Normality Results

In lieu of paired t-tests, a series of Wilcoxon Signed Rank tests was conducted on all pairs of subtests. Results are featured in Tables 56 and 57.
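For readers who want to see how a Wilcoxon Signed Rank statistic and its effect size r are obtained from paired Fall and Spring scores, the following Python sketch works through the calculation with the standard library only. It is a minimal illustration under stated assumptions, not the procedure used to produce the reported tables: the scores are hypothetical, the Z value uses the plain normal approximation without tie or continuity corrections, and because conventions differ on whether N in r = |Z| / √N counts pairs or total observations, the sketch takes `n_obs` as an explicit argument.

```python
import math


def wilcoxon_signed_rank(fall, spring, n_obs=None):
    """Return (w_plus, w_minus, z, r) for paired scores.

    Zero differences are dropped and tied absolute differences share
    their average rank. z is the normal approximation with no tie or
    continuity correction; r = |z| / sqrt(n_obs).
    """
    diffs = [s - f for f, s in zip(fall, spring) if s != f]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Indices of the differences, ordered by absolute size.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(rk for rk, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(rk for rk, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean_w) / sd_w
    if n_obs is None:
        n_obs = len(fall)  # assumption: N defaults to the number of pairs
    r = abs(z) / math.sqrt(n_obs)
    return w_plus, w_minus, z, r


# Hypothetical Fall and Spring tonal scores for eight students.
fall_scores = [30, 32, 28, 35, 31, 29, 33, 34]
spring_scores = [32, 31, 31, 35, 35, 28, 36, 39]
w_plus, w_minus, z, r = wilcoxon_signed_rank(fall_scores, spring_scores)
print(w_plus, w_minus, round(z, 3), round(r, 3))
```

Because Z is computed from the smaller rank sum, it is never positive in this sketch; the sign of a reported Z depends on the software's convention.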
Spring tonal subtest ranks (Mdn = 34.00) were significantly higher (p = .001) than Fall tonal subtest ranks (Mdn = 33.00), Z = 3.234, r = 0.41, a moderate effect. Although composite medians were similar (Mdn = 63.00), Spring means (62.35) were apt to be higher than Fall composite means (61.50), Z = -1.099, r = 0.0014. Rhythm medians were also similar (Mdn = 30.00), yet Spring rhythm subtest means (28.95) tended to be lower than Fall rhythm subtest means (29.26), Z = -.165, r = 0.002. The mean difference was not statistically significant (p > .05) for rhythm or composite ranks. Gordon (1986c) observed each child’s tonal scores often differed from their rhythm scores; therefore, it was unsurprising the tonal and rhythm findings for this sample were also dissimilar. Whether the difference was meaningful or caused by error of measurement was unknown, as Gordon (1986c) noted (p. 67). The reason Fall–Spring tonal scores differed significantly for this particular sample of students was also unknown, although it could be conjectured the significant increase in scores was an indication tonal instruction had been compensatory (Reese & Shouldice, 2019). Thus, an effect of instruction was speculated for tonal music aptitude for the 2013–2014 Grade 3 sample; however, no effect of instruction was concluded for rhythm or composite music aptitude. It was possible the formal rhythm instruction offered within the school environment might have been inadequate for students who remained in the rhythm babble stage, resulting in the lack of rhythm and composite score fluctuation seen in the findings of the current study.

Table 56 2013–2014 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Table 57 2013–2014 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

2014–2015 Grade 3 Scores. Imputation of missing values using predictive mean matching resulted in equal sample sizes.
All statistical tests were conducted from the imputed data set. Mean scores of corresponding Fall and Spring tests were comparable, as shown in Table 58.

Table 58 2014–2015 Grade 3 Descriptive Statistics (Pooled)

A Pearson’s Product-Moment correlation test was conducted on all scores; the tonal correlation (r = .795), rhythm correlation (r = .694), and composite correlation (r = .828) were significant (p < .01) and effect sizes were large. Results are displayed in Table 59. Tonal–rhythm intercorrelations also were significant; coefficients ranged from .423 to .685. Tonal–composite intercorrelations, ranging from .726 to .910, and rhythm–composite intercorrelations, ranging from .668 to .903, demonstrated a stronger association, as was anticipated due to tonal scores and rhythm scores comprising composite scores. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5). Results of a Shapiro-Wilk Test of Normality were predominantly significant (p < .05), with the exception of rhythm scores and Spring composite scores (see Table 60). It was determined the assumption of normality had been violated; Wilcoxon Signed Rank tests were conducted in lieu of paired t-tests and the results are exhibited in Tables 61 and 62.

Table 59 2014–2015 Grade 3 Correlation Matrix (Pooled)

Table 60 2014–2015 Grade 3 Shapiro-Wilk Test of Normality Results

Although Fall and Spring tonal medians were similar (Mdn = 34.00 and 33.00, respectively), Spring tonal means (32.16) were inclined to exceed Fall tonal means (31.78), Z = 1.729, r = .024. Composite medians were also similar (Mdn = 61.00), yet Spring composite means (60.31) were apt to be slightly lower than Fall composite means (60.43), Z = -.091, r = .001. Spring rhythm ranks (Mdn = 28.50) trended lower than Fall rhythm ranks (Mdn = 29.00), Z = -.345, r = .005.
No mean differences were statistically significant (p > .05). Spring scores might be similar to or exceed Fall scores if instruction had been compensatory (students’ musical needs were attenuated), complementary (students’ musical needs were satisfied), or both (Gordon, 1986c, p. 76). Although no effect of instruction was concluded from the findings of the 2014–2015 Grade 3 sample, it was possible the decrease in rhythm scores after one year of instruction might be indicative of inadequate type or quality of instruction for this sample of students. If students had not yet transitioned from preparatory audiation, the formal rhythm instruction offered could have been inappropriate for students’ musical needs.

2015–2016 Grade 3 Scores. As anticipated, sample sizes, presented in Table 63, varied by subtest due to student attendance on test administration days. Missing values were imputed using predictive mean matching and the imputed data set was used for statistical testing. Fall and Spring mean scores for corresponding subtests were quite similar. A roughly linear relationship of corresponding subtest scores was suggested from a visual examination of scatterplots. Therefore, a Pearson’s Product-Moment correlation test was conducted for all subtests; the correlation matrix may be found in Table 64. Correlations for corresponding tonal (r = .516), rhythm (r = .415), and composite (r = .603) tests were statistically significant at p < .01 and the effect sizes were medium to large. The tonal–rhythm intercorrelation coefficients ranged from .249 to .562. Composite intercorrelations were moderate to strong: tonal–composite coefficients ranged from .524 to .885; rhythm–composite coefficients ranged from .423 to .866. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5).
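The interpretation rule stated above (small at 0.1, medium at 0.3, large at 0.5) can be written as a simple threshold lookup. The Python sketch below is one way to express it; the function name `effect_size_label` and the "negligible" label for values under 0.1 are assumptions added for illustration and do not appear in the original text.

```python
def effect_size_label(r: float) -> str:
    """Label |r| using the thresholds 0.1 (small), 0.3 (medium), 0.5 (large)."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"
    if magnitude >= 0.1:
        return "small"
    return "negligible"  # assumed label; the text names no category below 0.1


# Fall-Spring correlations reported for the 2015-2016 Grade 3 sample.
for name, r in [("tonal", 0.516), ("rhythm", 0.415), ("composite", 0.603)]:
    print(name, effect_size_label(r))
```

Taking the absolute value first means the same labels apply to negative coefficients.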
Table 61 2014–2015 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

Results of a Shapiro-Wilk Test of Normality were statistically significant (p < .05) for Grade 3 Spring tonal scores and Grade 3 Fall composite scores, as displayed in Table 65; however, Fall tonal, Spring composite, and all rhythm scores were not statistically significant (p > .05). Thus, a violation of the normality assumption for some tests was suggested from these results and the use of paired t-tests was deemed inappropriate.

Table 62 2014–2015 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

Table 63 2015–2016 Grade 3 Descriptive Statistics (Pooled)

Therefore, a nonparametric Wilcoxon Signed Rank test was conducted for corresponding pairs of subtest scores. Results are presented below in Tables 66 and 67. Although tonal and rhythm medians were similar (Mdn = 32.00 and 28.00, respectively), Spring tonal subtest means (31.18) were apt to be lower than Fall tonal subtest means (31.26), Z = -.796, r = 0.010, and Spring rhythm means (27.71) higher than Fall rhythm means (27.62), Z = -.689, r = 0.009. Spring composite ranks (Mdn = 60.00) were inclined to be higher than Fall composite ranks (Mdn = 59.00), Z = -.889, r = 0.011. The mean difference was not statistically significant (p > .05) for tonal, rhythm, or composite ranks, and no effect of instruction was found for this sample. It was possible students from this sample might have remained in preparatory audiation; if so, the formal rhythm instruction offered likely was inappropriate for their musical needs. Gordon (1986c) asserted complementary or compensatory instruction would maintain or increase scores; therefore, it was possible the lack of complementary or compensatory instruction resulted in the tonal score decrease observed in this sample.
Although type and quality of instruction were not investigated in the current study, it is suggested this topic be examined in future studies.

Table 64 2015–2016 Grade 3 Correlation Matrix (Pooled)

Table 65 2015–2016 Grade 3 Shapiro-Wilk Test of Normality Results

2016–2017 Grade 3 Scores. As with tests from previous years, sample size differed by subtest due to student absences on dates of test administration, as evidenced in Table 68. Predictive mean matching was used to impute missing data; the imputed data set was used to conduct all statistical tests. Corresponding Fall and Spring means were comparable. An approximate linear relationship between Fall and Spring scores was suggested by a visual examination of subtest scatterplots.

Table 66 2015–2016 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)

A Pearson’s Product-Moment correlation test was conducted for all subtests; results are presented as a correlation matrix in Table 69. Statistically significant (p < .01) correlations were found for tonal (r = .571), rhythm (r = .507), and composite (r = .575) scores. Tonal–rhythm intercorrelation coefficients ranged from .413 to .554. Composite correlations were similarly strong: tonal–composite correlation coefficients ranged from .542 to .887; rhythm–composite correlation coefficients ranged from .437 to .818. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5). A violation of the assumption of normality necessary for conducting paired t-tests was concluded from statistically significant Shapiro-Wilk Test of Normality results (p < .05) for tonal and composite scores. Rhythm results were not statistically significant (p > .05). Results are illustrated in Table 70.
Table 67 2015–2016 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)

Table 68 2016–2017 Grade 3 Descriptive Statistics (Pooled)

Table 69 2016–2017 Grade 3 Correlation Matrix (Pooled)

Table 70 2016–2017 Grade 3 Shapiro-Wilk Test of Normality Results

The contention the assumption of normality had been violated was supported by a visual examination of Q-Q plots and boxplots of corresponding Fall and Spring scores. Consequently, a nonparametric Wilcoxon Signed Rank test was conducted for all scores to investigate the difference in scores of paired samples. Results are featured in Tables 71 and 72. Spring tonal subtest ranks (Mdn = 33.00) tended to be higher than Fall tonal subtest ranks (Mdn = 32.00), Z = -.595, r = 0.007. Although rhythm and composite medians were similar (Mdn = 29.00 and 60.00, respectively), Spring rhythm means (27.89) were inclined to be slightly higher than Fall rhythm means (27.87), Z = -.255, r = 0.003, and Spring composite means (58.69) slightly higher than Fall composite means (58.63), Z = -.416, r = 0.005. The difference was not statistically significant (p > .05) for any set of ranks. It appeared there was no effect of instruction for 2016–2017 Grade 3 scores. However, an investigation of the effect of type and quality of instruction is recommended for future research, as it was possible the formal rhythm instruction offered in the school environment might not have been appropriate for students who had not yet transitioned from preparatory audiation, which in turn affected composite scores.

2017–2018 Grade 3 Scores. As in all previous years, Grade 3 sample sizes in 2017–2018 differed according to student attendance on test administration days, as displayed in Table 73. Missing values were imputed using predictive mean matching and the imputed data set was used to conduct all statistical tests.
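Predictive mean matching is applied to every Grade 3 sample in this chapter, so a brief illustration of the idea may be useful. The Python sketch below is a deliberately simplified, single-imputation version: it regresses Spring scores on Fall scores among complete cases, then fills each missing Spring score with the observed score of a donor whose predicted value is close to that of the incomplete case. Everything here is an assumption for illustration, including the function name `pmm_impute`, the choice of Fall score as the only predictor, the donor pool size `k`, and the data; it omits the parameter draws and the multiple imputed data sets on which pooled results depend.

```python
import random


def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by simplified predictive mean matching.

    x: fully observed predictor (e.g., Fall scores)
    y: target with None marking missing values (e.g., Spring scores)
    """
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(observed)
    if n < 2:
        raise ValueError("need at least two complete cases")
    mean_x = sum(xi for xi, _ in observed) / n
    mean_y = sum(yi for _, yi in observed) / n
    sxx = sum((xi - mean_x) ** 2 for xi, _ in observed)
    if sxx == 0:
        raise ValueError("predictor has no variance among complete cases")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in observed)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(value):
        return intercept + slope * value

    completed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            completed.append(yi)  # observed values are kept unchanged
            continue
        target = predict(xi)
        # The k complete cases whose predicted means are nearest the target.
        donors = sorted(observed, key=lambda pair: abs(predict(pair[0]) - target))[:k]
        completed.append(rng.choice(donors)[1])  # borrow an observed score
    return completed


# Hypothetical scores: two students were absent for the Spring administration.
fall_scores = [30, 32, 28, 35, 31, 29]
spring_scores = [31, None, 29, 36, None, 28]
print(pmm_impute(fall_scores, spring_scores))
```

Because every imputed value is borrowed from an observed score, imputed scores always fall within the range of scores actually obtained, which is one reason the method suits bounded test scores.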
From a visual examination of tonal, rhythm, and composite score scatterplots, a rough approximation of linearity for corresponding Fall and Spring subtest scores was conjectured. Therefore, a Pearson’s Product-Moment correlation test was conducted; results are displayed in the correlation matrix in Table 74. Correlations for tonal (r = .743), rhythm (r = .381), and composite (r = .410) scores were statistically significant (p < .05). Tonal–rhythm intercorrelations ranged from .315 to .444. Tonal–composite intercorrelations were strong, ranging from .627 to .861; rhythm–composite intercorrelations were more widely dispersed, from .404 to .858. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5).
Table 71 2016–2017 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)
Table 72 2016–2017 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)
Table 73 2017–2018 Grade 3 Descriptive Statistics (Pooled)
A violation of the assumption of normality necessary to conduct a paired t-test was revealed by a significant result for Spring composite scores (p < .05) on the Shapiro-Wilk Test of Normality (see Table 75). However, no other Shapiro-Wilk test results were statistically significant at p < .05. Additional evidence to support the assumption of normality for most subtests was provided through a visual examination of Q-Q plots and boxplots for corresponding Fall and Spring tonal, rhythm, and composite scores. Paired t-tests were conducted to examine mean differences in IMMA tonal, rhythm, and composite scores of corresponding Fall and Spring test administrations. Results are displayed in Table 76. No significant difference in mean Fall and Spring tonal scores (t(3876) = -.914, p > .05), rhythm scores (t(4084) = 1.874, p > .05), or composite scores (t(4240) = 1.874, p > .05) was found for this sample.
2018–2019 Grade 3 Scores. Inconsistent student attendance affected subtest sample size once again. Predictive mean matching was used to impute missing values and all statistical tests were based on the imputed data set. Mean scores for corresponding tonal, rhythm, and composite scores were comparable, as in previous years. Details are displayed in Table 77.
Table 74 2017–2018 Grade 3 Correlation Matrix (Pooled)
Table 75 2017–2018 Grade 3 Shapiro-Wilk Test of Normality Results
A rough approximation of linearity for Fall and Spring scores was suggested by a visual examination of scatterplots of tonal scores, rhythm scores, and composite scores. Therefore, a Pearson’s Product-Moment Correlation test was conducted to estimate the relationships between all subtest scores. Results are illustrated in the correlation matrix in Table 78. Statistically significant (p < .01) correlations were found for tonal (r = .752), rhythm (r = .603), and composite (r = .776) scores. The intercorrelation of tonal and rhythm scores was also significant (p < .01) and ranged from .444 to .662. Intercorrelations with composite scores were statistically significant (p < .01): tonal–composite correlation coefficients ranged from .718 to .913, rhythm–composite correlation coefficients ranged from .595 to .887. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5).
Table 76 2017–2018 Grade 3 Paired t-Test Results (Pooled)
Table 77 2018–2019 Grade 3 Descriptive Statistics (Pooled)
Results of a Shapiro-Wilk Test of Normality, presented in Table 79, were significant (p < .05) for half of the subtests (Grade 3 Fall tonal, Grade 3 Spring tonal, and Grade 3 Spring composite), indicating a violation of the assumption of normality.
However, Shapiro-Wilk results for Grade 3 Fall rhythm, Grade 3 Fall composite, and Grade 3 Spring rhythm subtest scores were not statistically significant (p > .05), suggesting an approximation of a normal distribution. The contention the assumption of normality had been violated was supported by a visual examination of Normal Q-Q plots and boxplots of Fall and Spring tonal, rhythm, and composite scores. A linear relationship was approximated in the Normal Q-Q plots of Fall and Spring rhythm scores before and after imputation, with no outliers indicated in SPSS-generated boxplots.
Table 78 2018–2019 Grade 3 Correlation Matrix (Pooled)
As a result of the normality assumption violation, paired t-tests were deemed inappropriate. Instead, a Wilcoxon Signed Rank test was conducted to examine differences in mean scores for corresponding Fall and Spring subtests; the results are featured in Tables 80 and 81. Spring tonal subtest ranks (Mdn = 33.00) were inclined to be higher than Fall tonal subtest ranks (Mdn = 32.00), Z = -1.815, r = 0.024, and Spring composite ranks (Mdn = 61.00) higher than Fall composite ranks (Mdn = 60.00), Z = -1.425, r = 0.019. Although rhythm medians were similar (Mdn = 28.00), Spring rhythm subtest means (28.17) tended to be higher than Fall rhythm subtest means (27.79), Z = -.994, r = 0.013. No significant difference was estimated for any set of scores (p > .05); therefore, no effect of instruction was concluded for this sample. It was possible the tonal, rhythm, and composite score increases were the result of complementary or compensatory instruction for this sample of students (Gordon, 1986c, p. 76); however, the type and quality of instruction were not examined in the current study. Nevertheless, an examination of the effect of type and quality of instruction was recommended for future study.
Table 79 2018–2019 Grade 3 Shapiro-Wilk Test of Normality Results
2019–2020 Grade 3 Scores. Student absences on test administration dates produced unequal sample sizes, which were mitigated through imputation of missing values using predictive mean matching. In addition, remote online instruction due to the COVID-19 pandemic precluded a Spring 2020 IMMA test administration; these missing scores were also imputed. The resulting data set was used for all subsequent statistical tests. Mean tonal, rhythm, and composite scores on corresponding Fall and Spring tests were comparable, as exhibited in Table 82. A Pearson’s Product-Moment correlation test was conducted on all scores and results presented in Table 83. The tonal correlation (r = .779), rhythm correlation (r = .677), and composite correlation (r = .820) were statistically significant (p < .01), as were all intercorrelations (p < .05): tonal–rhythm intercorrelations ranged from .379 to .716, tonal–composite intercorrelations from .704 to .922, and rhythm–composite intercorrelations from .657 to .918. Composite intercorrelations were expected to be moderate-to-strong, as composite scores are composed of tonal scores and rhythm scores and are thus related. The absolute value of r was interpreted as a small effect size (0.1), medium effect size (0.3), or large effect size (0.5).
Table 80 2018–2019 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)
With the exception of Grade 3 Fall rhythm scores, results of a Shapiro-Wilk Test of Normality (see Table 84) were statistically significant (p < .05). Thus, a violation of the normality assumption was inferred and paired t-tests deemed inappropriate. Instead, the nonparametric Wilcoxon Signed Rank test was conducted for estimation of mean differences of matched pairs. Results are featured in Tables 85 and 86.
Although Fall and Spring tonal medians were similar (Mdn = 33.00 and 32.50, respectively) and Fall and Spring composite medians were identical (Mdn = 60.00), Spring tonal means (31.63) were inclined to exceed Fall tonal means (31.30), Z = -1.280, r = .018, and Spring composite means (58.62) to exceed Fall composite means (58.36), Z = -.082, r = .001. Mean Spring rhythm ranks (Mdn = 28.00) tended to be higher than Fall rhythm ranks (Mdn = 27.00), Z = -.433, r = .006. However, no mean rank difference was statistically significant (p > .05). Although no effect of instruction was hypothesized for students in the stabilized music aptitude stage (Gordon, 2013, p. 15), a score increase was perhaps indicative of instruction well-suited to the needs of this student sample (Gordon, 1986c, p. 76). Examination of the type and quality of instruction was beyond the parameters of the current study; it is recommended they be considered in future studies.
Table 81 2018–2019 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)
Synopsis
An examination of Wilcoxon Signed Rank test results yielded a longitudinal perspective of Grade 3 music aptitude through scrutiny of Grade 3 IMMA scores by grade level spanning a period of 13 years. No effect of instruction on music aptitude was concluded from the minimal score fluctuation in the current study, and achievement of the stabilized music aptitude stage prior to Grade 3 was implied. Wilcoxon Signed Rank tests were used to compare IMMA Fall and Spring scores; summaries of Wilcoxon Signed Rank test results by subtest are represented in Figures 9, 10, and 11.
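For readers unfamiliar with the Z statistic reported for each Wilcoxon Signed Rank test, the following is a minimal sketch of the large-sample normal approximation. It is not the procedure used to produce the pooled results in this study, which were computed in statistical software on multiply imputed data. The function names are hypothetical, the variance omits the tie correction, and the sign convention (positive when Spring exceeds Fall) may differ from that of a given software package.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_z(fall, spring):
    """Normal-approximation Z for paired scores (no tie correction)."""
    # Pairs with a zero difference are dropped before ranking.
    diffs = [s - f for f, s in zip(fall, spring) if s != f]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


# Hypothetical scores for four students, not data from the study.
print(round(wilcoxon_z([10, 12, 14, 16], [11, 14, 17, 20]), 4))
```

With so few pairs the normal approximation is crude; the example is intended only to show how the rank sums enter the statistic.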
Table 82 2019–2020 Grade 3 Descriptive Statistics (Pooled)
Table 83 2019–2020 Grade 3 Correlation Matrix (Pooled)
Table 84 2019–2020 Grade 3 Shapiro-Wilk Test of Normality Results
There was a tendency for Spring tonal ranks to be higher than Fall tonal ranks for all academic years except 2009–2010; Fall rhythm ranks were inclined to exceed Spring rhythm ranks in approximately 60% of the years examined. Spring composite ranks were apt to surpass Fall composite ranks for the majority of cases: it appeared tonal scores had a stronger influence on composite scores for this data set. Nevertheless, significant mean differences were found only for 2011–2012 and 2013–2014 tonal ranks and a medium effect size estimated; mean score differences were not significantly different for most cases. Although tonal scores and composite scores tended to increase from Fall to Spring, mean score differences were generally less than 1 point and non-significant. Gordon (2002) had noted an average yearly developmental music aptitude score increase of 2 points with traditional instruction and optimal score increases when instruction emphasized audiation; score differences in the current study were considerably smaller. Thus, the mean score differences of the current study seemed to indicate either an attainment of the stabilized music aptitude stage prior to Grade 3/age 8 or instruction that was ill-suited to the musical needs of the students. It was concluded the relative constancy of tonal scores, rhythm scores, and composite scores did not support an effect of instruction on music aptitude.
Table 85 2019–2020 Grade 3 Wilcoxon Signed Rank Test Results (Pooled)
Table 86 2019–2020 Grade 3 Wilcoxon Signed Rank Test Statistics (Pooled)
With appropriate informal guidance and formal instruction, children’s birth level of developmental music aptitude might be aspired to; without it, developmental music aptitude would likely decrease (Gordon, 2001b, p. 83).
Figure 9 Wilcoxon Signed Rank Test Results (Tonal)
It is critical that guidance and instruction of students still in the developmental music aptitude stage is compensatory as well as complementary to the students’ level of music aptitude in order to maintain or increase student music aptitude toward birth levels (Gordon, 1986c, p. 76). Therefore, it must be considered the formal instruction provided to the current study’s sample, with the exception of tonal instruction for 2011–2012 and 2013–2014 students, was not sufficient for students’ musical needs, as no effect of instruction was found. Grade 3 students were purported to be in the developmental music aptitude stage and it was expected their scores would continue to fluctuate in response to influence of the musical environment. Lack of significant score change could have indicated instruction was inadequate. Although the effect of type of instruction was beyond the parameters of the current study, further research on this topic is recommended.
Figure 10 Wilcoxon Signed Rank Test Results (Rhythm)
Figure 11 Wilcoxon Signed Rank Test Results (Composite)
Research Question 3
Is there evidence to substantiate the transition between the developmental music aptitude stage and stabilized aptitude stage at age 9/Grade 4?
One-Way Repeated Measures ANOVA Results
In order to examine the longitudinal change in tonal scores, rhythm scores, and composite scores to consider a period of transition as queried in Research Question 3, a series of one-way repeated measures ANOVA tests was conducted using scores from all combinations of available Fall and Spring scores of consecutive third-, fourth-, and fifth-grade years. A limitation of the current study was the decision to examine only scores of students who had been administered IMMA for three consecutive years rather than all available data for Grades 3, 4, and 5. Missing scores were imputed using predictive mean matching; the imputed data set was used to conduct all statistical tests. Results of tonal, rhythm, and composite findings are combined, summarized, and presented in Tables 87, 88, and 89.
Table 87 Repeated Measures ANOVA Combined Results (Tonal)
Significant mean increases in tonal scores from Grade 3 to Grade 5 were found for Groups C and D. For these two cases, tonal scores tended to increase throughout the 3-year period, as illustrated in Table 87. However, mean differences from adjacent grade levels were not significant. A significant increase in tonal scores also was found from Grade 3 to Grade 5 for Group A; in addition, tonal scores increased significantly from Grade 3 to Grade 4 for this sample. However, no significant score difference was found for Grade 4 to Grade 5. For this case, tonal score increase seemed to taper after Grade 4. No significant mean differences were found for Groups B, E, and F. From these results, a period of transition was speculated for tonal music aptitude; nevertheless, the findings were not conclusive.
As evident from the results exhibited in Table 88, a similar trend of overall growth in rhythm scores from Grade 3 to Grade 5 without significant increase in adjacent grade levels was seen for Group E. No significant score difference was found for Groups A, C, D, or F. An unusual pattern of score fluctuation was interpreted for Group B, in which rhythm scores decreased from Grade 3 to Grade 4, then increased from Grade 4 to Grade 5. From the findings of Group B, it could be speculated rhythm scores fluctuated between the developmental and stabilized music aptitude stage during the 3-year period in question, as students transitioned from one stage into the next. Nevertheless, a period of transition was not substantiated from the majority of rhythm results.
Table 88 Repeated Measures ANOVA Combined Results (Rhythm)
Likely due to the influence of its component tonal and rhythm parts, significant mean differences were noted for composite scores for most grade level groupings, as displayed in Table 89. For Groups A, C, and D, significant score increases greater than 2 points were found from Grade 3 to Grade 4 and from Grade 3 to Grade 5. No significant mean difference was noted from Grade 4 to Grade 5; perhaps score gains tapered as chronological age increased, as Gordon (1981) had asserted. A broad period of transition between the developmental and stabilized music aptitude stages might also account for the inconsistency of score fluctuation. It was conjectured tonal music aptitude had more influence on composite music aptitude for Groups A, C, and D, as rhythm findings for those groups were not significant. A significant score increase was noted from Grade 3 to Grade 5 for Group E; however, no significant mean differences were found for adjacent grades. Again, a period of transition might account for this inconsistency. No significant mean score difference was found for composite scores of Group F.
An atypical score trend similar to that found for rhythm scores was observed for Group B: composite scores decreased from Grade 3 to Grade 4, then increased from Grade 4 to Grade 5. It was unlikely students were in the stabilized music aptitude stage in Grade 3, as a decrease in score fluctuation might suggest, yet returned to the developmental music aptitude stage in Grade 4, as an increase in score fluctuation might suggest. Therefore, a period of transition was conjectured that might account for this discrepancy in direction of score change and accommodate an ebb and flow between stages of music aptitude.
Table 89 Repeated Measures ANOVA Combined Results (Composite)
In the following section, results were categorized, presented, and investigated by 3-year grade level groupings (Grade 3/Grade 4/Grade 5), labeled A–F, and subtest. Descriptive statistics, repeated measures ANOVA findings, multivariate confirmatory test results, and Bonferroni post hoc findings were interpreted to provide a more in-depth examination.
Group A: Fall 2016 (Grade 3), Fall 2017 (Grade 4), Fall 2018 (Grade 5)
Tonal. Tonal scores were available for students who were administered IMMA in consecutive years during the Fall of their third-, fourth-, and fifth-grade years. Mean tonal scores are exhibited in Table 90.
Table 90 Group A: Fall 2016/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Tonal)
The results of Mauchly’s Test of Sphericity are displayed in Table 91. It was concluded variances of differences between levels of tonal subtest results for Fall test administrations in Grades 3–5 were significantly different. The assumption of sphericity had been violated (χ²(2) = 15.820, p < .05), resulting in an inaccurate F-test (Field, 2009, p. 476).
However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.62, 99.04) = 9.136, p = .001), as did the Huynh-Feldt correction (F(1.67, 101.37) = 9.136, p = .001). The result was statistically significant and the effect size moderate (ω² = .115) (Hatcher, 2013, p. 370), as evidenced in Table 92. This conclusion was supported by statistically significant multivariate test results, displayed in Table 93: Pillai’s Trace = .193, F(2, 60) = 7.197, p = .002, ηp² = .193. There were significant differences in mean tonal scores of students in Grades 3, 4, and 5 (Field, 2009, p. 477). The significant results of the Bonferroni post hoc test, displayed as pairwise comparisons in Table 94, might be interpreted as follows: on average, Grade 4 tonal scores were 2.149 points higher than Grade 3 tonal scores (p = .003, 95% CI [-3.644, -.653]) and Grade 5 tonal scores were 2.335 points higher than Grade 3 tonal scores (p < .001, 95% CI [-3.534, -1.136]). The mean difference in Grade 4 and Grade 5 tonal scores was not significant.
Grade 4 and Grade 5 tonal scores were significantly greater than Grade 3 tonal scores for this sample, which created the appearance that musical environment continued to influence tonal music aptitude. However, the mean difference in Grade 4 and Grade 5 tonal scores was not significant; the interpretation that musical environment had ceased to influence music aptitude after Grade 4 was supported by this lack of significant score change.
Table 91 Group A: Fall 2016/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Tonal)
Table 92 Group A: Fall 2016/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Tonal)
Table 93 Group A: Fall 2016/Fall 2017/Fall 2018 Multivariate Test Results (Tonal)
Table 94 Group A: Fall 2016/Fall 2017/Fall 2018 Pairwise Comparisons Pooled Results (Tonal)
It was possible students remained in the developmental tonal aptitude stage through Grade 5 and score difference tapered between Grades 4 and 5; Gordon (1981) asserted environmental influence decreased as the student’s chronological age increased. However, it also was possible the type of instruction in Grades 4 and 5 had a limiting effect on growth of developmental tonal aptitude for the current sample. While examination of type and quality of instruction was beyond the parameters of the current study, the effect of type and quality of instruction on tonal aptitude should be investigated in future studies. It was speculated a broad period of transition from Grade 3 through Grade 5 could explain the discrepancy in tonal score fluctuation.
Rhythm. Mean rhythm scores for Fall 2016 (Grade 3), Fall 2017 (Grade 4), and Fall 2018 (Grade 5) are featured in Table 95. From the results of Mauchly’s Test of Sphericity, presented as Table 96, it was concluded variances of differences between levels of rhythm subtest results for Fall test administrations in Grades 3–5 were not significantly different and the condition of sphericity had been met (χ²(2) = 2.162, p > .05).
Table 95 Group A: Fall 2016/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Rhythm)
The results of a one-way repeated-measures ANOVA are presented in Table 97. The differences in rhythm scores from Grades 3–5 were not statistically significant, F(2, 130) = 1.947, p > .05; this conclusion was supported by multivariate test results displayed in Table 98: Pillai’s Trace = .057, F(2, 64) = 1.940, p > .05, ηp² = .057.
It was concluded the relative stability of IMMA rhythm scores from Grade 3 to Grade 5 implied students previously had attained the stabilized rhythm aptitude stage. Thus, a period of transition was not supported for rhythm aptitude for this sample. This finding contrasted with that for tonal aptitude for Group A, in which significant score fluctuation seemed to indicate the stabilized music aptitude stage had not yet been reached. It was conceivable tonal aptitude and rhythm aptitude operated separately in the transition between the developmental and stabilized aptitude stages.
Table 96 Group A: Fall 2016/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Rhythm)
Table 97 Group A: Fall 2016/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Rhythm)
Table 98 Group A: Fall 2016/Fall 2017/Fall 2018 Multivariate Test Results (Rhythm)
Composite. Mean Fall composite scores of students in Grades 3–5 from 2016–2018 are presented in Table 99. From the results of Mauchly’s Test of Sphericity, presented as Table 100, it was concluded variances of differences between levels of composite test results for Fall test administrations in Grades 3–5 were significantly different and the condition of sphericity had not been met (χ²(2) = 11.371, p < .05). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.69, 94.37) = 6.487, p = .002); the effect size was moderate (ω² = .087) (Hatcher, 2013, p. 370). Results of the Huynh-Feldt correction were similar (F(1.732, 96.98) = 6.487, p = .002). These statistically significant results are displayed in Table 101.
An examination of the statistically significant multivariate test results, presented as Table 102, confirmed this finding: Pillai’s Trace = .139, F(2, 55) = 4.449, p = .016, ηp² = .139.
Table 99 Group A: Fall 2016/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Composite)
Table 100 Group A: Fall 2016/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Composite)
Significant mean differences in composite scores favoring the older grade level were suggested from results of a one-way repeated measures ANOVA. The results of the Bonferroni post hoc test, displayed as pairwise comparisons in Table 103, were interpreted as follows: on average, Grade 4 composite scores were 2.887 points higher than Grade 3 composite scores (p = .007, 95% CI [-5.074, -.701]) and Grade 5 composite scores 3.009 points higher than Grade 3 composite scores (p = .003, 95% CI [-5.074, .944]). However, there was no significant difference in mean scores of Grades 4 and 5. It was concluded IMMA composite scores continued to be influenced by musical environment through Grade 5/age 10, thus supporting the contention students remained in the developmental music aptitude stage and had not yet achieved the stabilized music aptitude stage. This conclusion disputed Gordon’s findings that music aptitude stabilized at age 9. The mean score difference was moderate, an indication score fluctuation had not abated. Gordon (2002) had suggested a score increase averaging 2 points could be expected with use of traditional instruction; score growth in the current study exceeded this average.
Table 101 Group A: Fall 2016/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Composite)
Table 102 Group A: Fall 2016/Fall 2017/Fall 2018 Multivariate Test Results (Composite)
Table 103 Group A: Fall 2016/Fall 2017/Fall 2018 Pairwise Comparisons Pooled Results (Composite)
The lack of significant mean composite score difference between Grades 4 and 5 could be explained as the natural waning of influence of musical environment with the increase of chronological age (Gordon, 1981), instruction ill-suited to students’ current level of audiation, or a period of transition between the developmental and stabilized composite aptitude stages.
Group B: Fall 2017 (Grade 3), Fall 2018 (Grade 4), Fall 2019 (Grade 5)
Tonal. Mean tonal scores of students from Grades 3–5 are displayed in Table 104. From the results of Mauchly’s Test of Sphericity, presented as Table 105, it was concluded variances of differences between levels of tonal test results for Fall test administrations in Grades 3–5 were not significantly different and the condition of sphericity had been met (χ²(2) = 5.209, p > .05).
Table 104 Group B: Fall 2017/Fall 2018/Fall 2019 Descriptive Statistics Pooled Results (Tonal)
Table 105 Group B: Fall 2017/Fall 2018/Fall 2019 Mauchly’s Test of Sphericity Results (Tonal)
The difference in tonal scores from Grades 3–5, displayed as Table 106, was statistically significant, F(2, 102) = 3.599, p = .031; the effect size was small (ω² = .047) (Hatcher, 2013, p. 370). This conclusion was disputed by the multivariate test results displayed in Table 107, which were not statistically significant: Pillai’s Trace = .109, F(2, 50) = 3.045, p > .05, ηp² = .109. A Bonferroni post hoc test was conducted; results are exhibited as pairwise comparisons in Table 108. No set of tonal score differences was found significantly different from another (p > .05).
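A significant omnibus test alongside nonsignificant pairwise comparisons, as reported above, can arise because the Bonferroni adjustment is conservative. The sketch below illustrates the adjustment in its common form, in which each unadjusted p-value is multiplied by the number of comparisons and capped at 1. The function name and the p-values are hypothetical and are not taken from the study’s tables.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise grade comparisons
# (Grade 3 vs. 4, Grade 3 vs. 5, Grade 4 vs. 5); not values from the study.
unadjusted = [0.020, 0.045, 0.600]
for p, p_adj in zip(unadjusted, bonferroni_adjust(unadjusted)):
    print(f"unadjusted p = {p:.3f}, adjusted p = {p_adj:.3f}")
```

In this invented example, two comparisons that would fall below .05 before adjustment no longer do so once the three comparisons are taken into account.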
Table 106 Group B: Fall 2017/Fall 2018/Fall 2019 Tests of Within-Subjects Effects Results (Tonal)
Table 107 Group B: Fall 2017/Fall 2018/Fall 2019 Multivariate Test Results (Tonal)
A statistically significant mean score difference was estimated for tonal scores, as indicated from the omnibus results of a repeated measures ANOVA. However, no pair of tonal scores was found significantly different from another (p > .05) in post hoc tests using the Bonferroni correction. Therefore, it was concluded IMMA tonal scores remained relatively stable from Grade 3 to Grade 5 in this sample and a period of transition was unsubstantiated. Nevertheless, the effect of formal instruction lacking the informal guidance component necessary to support students’ preparatory audiation needs must be considered as a possible influence on these findings.
Rhythm. Mean rhythm scores for students from Grades 3–5 are displayed in Table 109. From the results of Mauchly’s Test of Sphericity, presented as Table 110, it was concluded variances of differences between levels of rhythm test results for Fall test administrations in Grades 3–5 were not significantly different and the condition of sphericity had been met (χ²(2) = 2.089, p > .05).
Table 108 Group B: Fall 2017/Fall 2018/Fall 2019 Pairwise Comparison Pooled Results (Tonal)
Table 109 Group B: Fall 2017/Fall 2018/Fall 2019 Descriptive Statistics Pooled Results (Rhythm)
Table 110 Group B: Fall 2017/Fall 2018/Fall 2019 Mauchly’s Test of Sphericity Results (Rhythm)
A one-way repeated-measures ANOVA was conducted using Grade 3, Grade 4, and Grade 5 scores from rhythm subtests. The results of the ANOVA are presented in Table 111. The difference in rhythm scores from Grades 3–5 was statistically significant, F(2, 86) = 4.201, p = .018, and the effect size moderate (ω² = .067) (Hatcher, 2013, p. 370).
This conclusion was supported by the multivariate test results displayed in Table 112, which were statistically significant: Pillai’s Trace = .141, F(2, 42) = 3.439, p = .041, ηp² = .141. Both Grade 3 (1.971 points, p = .014) and Grade 5 (2.046 points, p = .006) mean rhythm scores were significantly higher than Grade 4 rhythm scores, as indicated by the results of the Bonferroni post hoc test, displayed as pairwise comparisons in Table 113.
Table 111 Group B: Fall 2017/Fall 2018/Fall 2019 Tests of Within-Subjects Effects Results (Rhythm)
Table 112 Group B: Fall 2017/Fall 2018/Fall 2019 Multivariate Test Results (Rhythm)
Table 113 Group B: Fall 2017/Fall 2018/Fall 2019 Pairwise Comparison Pooled Results (Rhythm)
Mean rhythm scores differed significantly between grade levels, as indicated by omnibus results of a repeated measures ANOVA. Results of a Bonferroni post hoc test revealed a significant difference between Grade 4 rhythm scores and scores of adjacent grade levels. However, scores did not fluctuate in a consistent direction, as mean scores decreased from Grade 3 to Grade 4 but increased from Grade 4 to Grade 5. In addition, the mean difference between rhythm scores of Grades 3 and 5 was not significant. The discrepancies of rhythm score fluctuation were not consistent with previous patterns describing rhythm aptitude in the developmental stage, defined by continual score fluctuation, or the stabilized stage, characterized by the relative constancy of scores, as referenced in extant literature (Gordon, 1998, p. 10). Instead, Gordon (1986a, 2002) outlined decreasing gain scores and diminished influence of instruction as indicators of a transition period, which did not characterize adequately the findings of the current study.
Perhaps a more expansive definition of a period of transition, during which identified traits of developmental and stabilized rhythm aptitude vacillate and waver from their established patterns as music aptitude shifts between stages, could be used to describe this previously unaddressed pattern of rhythm score fluctuation. This finding was in contrast to that of tonal scores for Group B, in which score fluctuation was relatively static. It was possible tonal aptitude and rhythm aptitude functioned independently as the transition to the stabilized music aptitude stage occurred.
Composite. Mean composite scores for students from Grades 3–5 (N = 65) are displayed in Table 114. From the results of Mauchly’s Test of Sphericity, presented as Table 115, it was concluded variances of differences between levels of composite test results for Fall test administrations in Grades 3–5 were not significantly different and the condition of sphericity had been met (χ²(2) = 1.509, p > .05).
Table 114 Group B: Fall 2017/Fall 2018/Fall 2019 Descriptive Statistics Pooled Results (Composite)
Therefore, a one-way repeated measures ANOVA was conducted to examine the longitudinal change in composite scores. The results are presented in Table 116. The mean difference in composite scores of students in Grades 3–5 was statistically significant, F(2, 80) = 4.913, p = .010. The effect size, calculated as omega squared (ω² = .086), was estimated as moderate (Hatcher, 2013, p. 370). This omnibus conclusion was supported by the set of multivariate test results displayed in Table 117, which were also statistically significant: Pillai’s Trace = .187, F(2, 39) = 4.496, p = .017, ηp² = .187.
Table 115 Group B: Fall 2017/Fall 2018/Fall 2019 Mauchly’s Test of Sphericity Results (Composite)

Table 116 Group B: Fall 2017/Fall 2018/Fall 2019 Tests of Within-Subjects Effects Results (Composite)

The results of the Bonferroni post hoc test, displayed as pairwise comparisons in Table 118, were interpreted as follows: on average, Grade 3 composite scores were 2.358 points higher than Grade 4 composite scores (p = .024, 95% CI [.255, 4.462]) and Grade 5 composite scores were 3.149 points higher than Grade 4 composite scores (p = .002, 95% CI [1.067, 5.232]). However, no significant difference was found for Grade 3 and Grade 5 composite scores (p > .05).

Table 117 Group B: Fall 2017/Fall 2018/Fall 2019 Multivariate Test Results (Composite)

Composite score results mirrored those of rhythm scores: mean composite scores differed significantly between Grades 3 and 4 and Grades 4 and 5. However, the pattern of score fluctuation throughout this 3-year period was unusual: composite scores decreased from Grade 3 to Grade 4, but increased from Grade 4 to Grade 5. No significant difference was found for Grade 3 and Grade 5 composite scores. The developmental music aptitude stage had been characterized as continually fluctuating (Moore, 1990), which differed from these results. The stabilized music aptitude stage, defined by the relative constancy of scores (Gordon, 2005), also differed from these results. Perhaps a broad period of transition might describe this distinctive pattern of composite score fluctuation.

Table 118 Group B: Fall 2017/Fall 2018/Fall 2019 Pairwise Comparison Pooled Results (Composite)

Group C: Spring 2017 (Grade 3), Fall 2017 (Grade 4), Fall 2018 (Grade 5)

Tonal. Mean tonal scores for students from Grades 3–5 are displayed in Table 119.
From the results of Mauchly’s Test of Sphericity, presented as Table 120, it was concluded variances of differences between levels of tonal test results were significantly different and the condition of sphericity had not been met (χ²(2) = 13.702, p = .001). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded a statistically significant adjusted F-value and degrees of freedom (F(1.67, 107.07) = 6.518, p = .004); the effect size was moderate (ω² = .078) (Hatcher, 2013, p. 370). Results of the Huynh-Feldt correction were similar (F(1.71, 109.61) = 6.518, p = .003). These results are displayed in Table 121. This finding was supported by significant multivariate test results, presented as Table 122: Pillai’s Trace = .146, F(2, 63) = 5.384, p = .007, ηp² = .146.

Table 119 Group C: Spring 2017/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Tonal)

Table 120 Group C: Spring 2017/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Tonal)

Bonferroni post hoc test results are displayed as pairwise comparisons in Table 123: Grade 5 tonal scores averaged 1.623 points higher than Grade 3 scores (p = .011, 95% CI [.381, 2.865]). No significant difference was found between Grades 3 and 4 or Grades 4 and 5 (p > .05).

From the results of a repeated measures ANOVA with the Greenhouse-Geisser and Huynh-Feldt corrections, it was concluded a period of transition was possible between Grades 3, 4, and 5. An approach to the stabilized music aptitude stage was suggested by tapering of score fluctuation of adjacent grades (Gordon, 1986a).
However, mean tonal scores of Grade 3 and Grade 5 were significantly different, which supported the contention students had not yet fully achieved the stabilized tonal aptitude stage, as scores continued to increase due to influence of the musical environment. A transition period between the developmental music aptitude stage, marked by score fluctuation, and the stabilized music aptitude stage, marked by relative score constancy, might explain the lack of significant difference in scores of adjacent grades as well as significant score difference in scores of Grades 3 and 5. The proposed transition period was characterized loosely by Gordon’s (1986a) description of decreasing gain scores and diminishing influence of instruction as indicators of a transition period between stages of music aptitude.

Table 121 Group C: Spring 2017/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Tonal)

Table 122 Group C: Spring 2017/Fall 2017/Fall 2018 Multivariate Test Results (Tonal)

Table 123 Group C: Spring 2017/Fall 2017/Fall 2018 Pairwise Comparison Pooled Results (Tonal)

Rhythm. Mean rhythm scores for students from Grades 3–5 are displayed in Table 124. From the results of Mauchly’s Test of Sphericity, exhibited as Table 125, it was concluded variances of differences between levels of rhythm test results were not significantly different and the condition of sphericity had been met (χ²(2) = .926, p > .05). A one-way repeated-measures ANOVA was conducted using Grade 3, Grade 4, and Grade 5 rhythm scores. The results of the rhythm ANOVA are presented in Table 126; the difference in rhythm scores from Grades 3–5 was not statistically significant, F(2, 126) = .786, p > .05.
This conclusion was supported by the multivariate test results displayed in Table 127, which were not statistically significant: Pillai’s Trace = .022, F(2, 62) = .697, p > .05, ηp² = .022. The lack of equivalent levels of significant fluctuation of tonal scores and rhythm scores might indicate stabilization of rhythm aptitude independent from that of tonal aptitude. It was interpreted that Group C’s rhythm aptitude might have stabilized prior to Grade 3, as mean rhythm score differences were non-significant. On the other hand, it appeared tonal aptitude of Group C remained developmental, as tonal scores continued to fluctuate significantly from Grade 3 through Grade 5.

Table 124 Group C: Spring 2017/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Rhythm)

Table 125 Group C: Spring 2017/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Rhythm)

Table 126 Group C: Spring 2017/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Rhythm)

Table 127 Group C: Spring 2017/Fall 2017/Fall 2018 Multivariate Test Results (Rhythm)

Mean rhythm scores did not differ significantly between grade levels, as indicated by results of a repeated measures ANOVA. This conclusion was supported by Pillai’s Trace test results (p > .05). IMMA rhythm scores were relatively stable from Grade 3 through Grade 5; thus, it was conjectured students were beyond the developmental rhythm aptitude stage, noted for continual score fluctuation (Gordon, 2012, p. 47). Although Gordon (1981) acknowledged a decrease in the effect of environment as children approached age 8, attainment of the stabilized music aptitude stage prior to Grade 3 would have been contrary to the findings of some extant literature (Gordon, 2013, p. 14), but would confirm the findings of researchers such as DeYarman (1972, 1975), Harrington (1969), and Schleuter and DeYarman (1975).
Nevertheless, without an examination of scores preceding Grade 3, it was not possible to ascertain when score fluctuation tapered and the shift to the stabilized rhythm aptitude stage had begun in the current study. Therefore, future studies with an expanded range of grade levels, including those preceding and succeeding those of the current sample, are recommended to further clarify when the effect of the musical environment recedes and the stabilized music aptitude stage is attained.

Composite. Mean composite scores for students from Grades 3–5 are displayed in Table 128. From the results of Mauchly’s Test of Sphericity, presented as Table 129, it was concluded variances of differences between levels of composite test results were significantly different and the condition of sphericity had been violated (χ²(2) = 28.841, p < .001). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded a statistically significant adjusted F-value and degrees of freedom (F(1.42, 77.80) = 6.636, p = .006); the effect size was moderate (ω² = .090) (Hatcher, 2013, p. 370). Results of the Huynh-Feldt correction were similar (F(1.44, 79.26) = 6.636, p = .005). ANOVA results are displayed in Table 130 and supported by statistically significant multivariate test results, presented as Table 131: Pillai’s Trace = .128, F(2, 54) = 3.977, p = .024, ηp² = .128.

The results of the Bonferroni post hoc test are exhibited as pairwise comparisons in Table 132: on average, Grade 4 composite scores were 2.586 points higher than Grade 3 composite scores (p = .05, 95% CI [.024, 5.148]) and Grade 5 composite scores were 2.734 points higher than Grade 3 composite scores (p = .015, 95% CI [.461, 5.008]). However, the difference between Grade 4 and Grade 5 composite scores was not significant (p > .05).
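The Bonferroni adjustment applied to the pairwise comparisons above can be sketched briefly. This is a minimal illustration rather than the procedure used to produce the reported tables: the function name `bonferroni_adjust` and the unadjusted p-values are hypothetical, and the sketch assumes the common convention of multiplying each unadjusted p-value by the number of comparisons (three, for Grades 3, 4, and 5) and capping the result at 1.

```python
def bonferroni_adjust(p_values):
    """Multiply each unadjusted p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values for three pairwise grade-level comparisons
# (Grade 3 vs. 4, Grade 3 vs. 5, Grade 4 vs. 5); not taken from the study.
unadjusted = [0.005, 0.02, 0.5]
adjusted = bonferroni_adjust(unadjusted)
for raw, adj in zip(unadjusted, adjusted):
    print(f"unadjusted p = {raw:.3f}, Bonferroni-adjusted p = {adj:.3f}")
```

Under this convention, a comparison is declared significant only when its adjusted p-value falls below .05, which is equivalent to testing each unadjusted p-value against .05 divided by the number of comparisons.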
Table 128 Group C: Spring 2017/Fall 2017/Fall 2018 Descriptive Statistics Pooled Results (Composite)

Table 129 Group C: Spring 2017/Fall 2017/Fall 2018 Mauchly’s Test of Sphericity Results (Composite)

Mean composite scores differed significantly between Grades 3 and 4 and Grades 3 and 5, as indicated from results of a repeated measures ANOVA; no significant difference was found between Grade 4 and Grade 5 scores. The significant differences in scores were indicative of continued composite score fluctuation consistent with the developmental music aptitude stage.

Table 130 Group C: Spring 2017/Fall 2017/Fall 2018 Tests of Within-Subjects Effects Results (Composite)

Table 131 Group C: Spring 2017/Fall 2017/Fall 2018 Multivariate Test Results (Composite)

However, the score difference (.148 points) between Grade 4 and Grade 5 scores could be explained as the tapering of score difference described by Gordon (1986a). The composite score difference was non-significant, which was suggestive of the constancy that characterized the stabilized music aptitude stage (Gordon, 1998, p. 10). A period of transition in which score activity is inconsistent and fluctuation intermittent might account for the discrepancy of these findings. Nevertheless, separate consideration of tonal and rhythm aptitude when examining the shift from developmental to stabilized music aptitude was theorized from the findings of Group C, as it appeared rhythm aptitude stabilized prior to tonal aptitude for this Group.

Table 132 Group C: Spring 2017/Fall 2017/Fall 2018 Pairwise Comparison Pooled Results (Composite)

Group D: Spring 2017 (Grade 3), Fall 2017 (Grade 4), Spring 2019 (Grade 5)

Tonal. Mean tonal scores for students from Grades 3–5 are depicted in Table 133.
From the results of Mauchly’s Test of Sphericity, presented as Table 134, it was concluded variances of differences between levels of tonal test results were not significantly different and the condition of sphericity had been met (χ²(2) = 3.593, p > .05).

Table 133 Group D: Spring 2017/Fall 2017/Spring 2019 Descriptive Statistics Pooled Results (Tonal)

Table 134 Group D: Spring 2017/Fall 2017/Spring 2019 Mauchly’s Test of Sphericity Results (Tonal)

The results of a one-way repeated measures ANOVA are presented in Table 135: the difference in tonal scores from Grades 3–5 was statistically significant, F(2, 118) = 7.306, p = .001, and the effect size moderate (ω² = .094) (Hatcher, 2013, p. 370). This conclusion was supported by the statistically significant multivariate test results displayed in Table 136: Pillai’s Trace = .174, F(2, 58) = 6.121, p = .004, ηp² = .174. The results of the Bonferroni post hoc test are featured as pairwise comparisons in Table 137: on average, Grade 5 tonal scores were 2.067 points higher than Grade 3 tonal scores (p < .001, 95% CI [.832, 3.301]). The mean tonal score differences for Grades 3–4 and Grades 4–5 were not significant (p > .05).

Table 135 Group D: Spring 2017/Fall 2017/Spring 2019 Tests of Within-Subjects Effects Results (Tonal)

Table 136 Group D: Spring 2017/Fall 2017/Spring 2019 Multivariate Test Results (Tonal)

Table 137 Group D: Spring 2017/Fall 2017/Spring 2019 Pairwise Comparison Pooled Results (Tonal)

The findings for Spring 2017/Fall 2017/Spring 2019 tonal scores were mixed. Significant mean tonal score differences were found for Grades 3 and 5, yet the mean tonal score differences for Grades 3 and 4 and Grades 4 and 5 were not statistically significant. This finding was similar to the Group C tonal findings, from which a period of transition was speculated.
Although the developmental tonal aptitude stage had been described as a period in which tonal aptitude constantly fluctuates (Gordon, 2001a), the level of score fluctuation in the current study was inconsistent. A broad period of transition was theorized to account for this discrepancy in tonal score fluctuation, as conflicting traits of both stages of music aptitude were exhibited simultaneously within an extended time span.

Rhythm. Mean rhythm scores for students from Grades 3–5 are displayed in Table 138. From the results of Mauchly’s Test of Sphericity, presented as Table 139, it was concluded variances of differences between levels of rhythm test results were not significantly different and the condition of sphericity had been met (χ²(2) = .459, p > .05).

Table 138 Group D: Spring 2017/Fall 2017/Spring 2019 Descriptive Statistics Pooled Results (Rhythm)

Table 139 Group D: Spring 2017/Fall 2017/Spring 2019 Mauchly’s Test of Sphericity Results (Rhythm)

A one-way repeated-measures ANOVA was conducted using Grade 3, Grade 4, and Grade 5 rhythm scores. The results of the rhythm ANOVA may be seen in Table 140: the difference in rhythm scores from Grades 3–5 was not statistically significant, F(2, 126) = 1.471, p > .05. This conclusion was supported by the multivariate test results exhibited in Table 141: Pillai’s Trace = .046, F(2, 62) = 1.493, p > .05, ηp² = .046.

Table 140 Group D: Spring 2017/Fall 2017/Spring 2019 Tests of Within-Subjects Effects Results (Rhythm)

Table 141 Group D: Spring 2017/Fall 2017/Spring 2019 Multivariate Test Results (Rhythm)

Mean rhythm scores did not differ significantly between grade levels, as indicated by results of a repeated measures ANOVA. This conclusion was supported by Pillai’s Trace test results (p > .05).
Therefore, it was concluded IMMA rhythm scores were relatively stable from Grade 3 through Grade 5, in opposition to the defined parameters of developmental and stabilized music aptitude reported in extant literature (Gordon, 2006). A period of rhythm transition was unsubstantiated: it appeared rhythm aptitude had stabilized prior to Grade 3, unlike tonal aptitude, which continued to exhibit significant score fluctuation indicative of the developmental tonal aptitude stage. An atomistic perspective of developmental music aptitude, as conjectured by Gordon (1998, p. 71), would suggest separate consideration of tonal aptitude and rhythm aptitude when considering a period of transition.

Composite. Mean composite scores for students from Grades 3–5 are displayed in Table 142. From the results of Mauchly’s Test of Sphericity, presented as Table 143, it was concluded variances of differences between levels of composite test results were significantly different and the condition of sphericity had not been met (χ²(2) = 7.654, p = .022). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.76, 91.28) = 7.607, p = .001); the effect size was moderate (ω² = .110) (Hatcher, 2013, p. 370). Huynh-Feldt correction results were similar (F(1.81, 94.22) = 7.607, p = .001). These statistically significant omnibus results are displayed in Table 144. An examination of the statistically significant multivariate test results, presented as Table 145, confirmed this finding: Pillai’s Trace = .181, F(2, 52) = 5.617, p = .006, ηp² = .181.
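The Greenhouse-Geisser correction referenced above scales both degrees of freedom of the F-test by an estimate of sphericity, epsilon, which ranges from 1/(k − 1) to 1 for k repeated measurements. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the covariance matrices are invented for illustration and are not the study’s data, and the formula used is the standard one based on the double-centered covariance matrix.

```python
def greenhouse_geisser_epsilon(cov):
    """Estimate Greenhouse-Geisser epsilon from a k x k covariance matrix."""
    k = len(cov)
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    col_means = [sum(cov[i][j] for i in range(k)) / k for j in range(k)]
    # Double-center the matrix: remove row and column means, add back the grand mean.
    centered = [
        [cov[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centered[i][i] for i in range(k))
    sum_of_squares = sum(value * value for row in centered for value in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def adjusted_df(k, n, epsilon):
    """Scale the effect and error degrees of freedom by epsilon."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)


# Hypothetical covariance matrices for three test administrations (not study data).
spherical = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]   # compound symmetry
unequal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 10.0]]    # sphericity violated

print(greenhouse_geisser_epsilon(spherical))
print(adjusted_df(3, 56, greenhouse_geisser_epsilon(unequal)))
```

When sphericity holds, epsilon equals 1 and the degrees of freedom are unchanged; the further epsilon falls below 1, the smaller the adjusted degrees of freedom and the more conservative the test, while the F-value itself is unaffected.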
The results of the Bonferroni post hoc test are displayed as pairwise comparisons in Table 146: on average, Grade 4 composite scores were 2.901 points higher than Grade 3 composite scores (p = .033, 95% CI [.307, 5.495]) and Grade 5 composite scores were 4.047 points higher than Grade 3 composite scores (p < .001, 95% CI [1.645, 6.449]). However, the difference between Grade 4 and Grade 5 composite scores was not significant (p > .05).

Table 142 Group D: Spring 2017/Fall 2017/Spring 2019 Descriptive Statistics Pooled Results (Composite)

Table 143 Group D: Spring 2017/Fall 2017/Spring 2019 Mauchly’s Test of Sphericity Results (Composite)

Significant differences in composite scores of Grades 3 and 4 and of Grades 3 and 5 were suggested from results of a one-way repeated measures ANOVA. Thus, continued score fluctuation was implied, suggesting students had not yet achieved the stabilized music aptitude stage and remained in the developmental music aptitude stage. However, the difference in Grade 4 and Grade 5 composite scores was not statistically significant (p > .05), which implied stability of composite scores during that time period. The developmental and stabilized music aptitude stages previously described did not account for this discrepancy in score fluctuation. Thus, a period of transition between the two stages of music aptitude was speculated to explain the variation in predicted score fluctuation described in previous research (Gordon, 2006).

Table 144 Group D: Spring 2017/Fall 2017/Spring 2019 Tests of Within-Subjects Effects (Composite)

Group E: Spring 2018 (Grade 3), Spring 2019 (Grade 4), Fall 2019 (Grade 5)

Tonal. Mean tonal scores for students from Grades 3–5 are displayed in Table 147.
From the results of Mauchly’s Test of Sphericity, presented as Table 148, it was concluded variances of differences between levels of tonal test results were significantly different and the condition of sphericity had not been met (χ²(2) = 8.020, p = .018). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.70, 73.26) = .272, p > .05); results of the Huynh-Feldt correction were similar (F(1.77, 75.98) = .272, p > .05). These non-significant results are displayed in Table 149. Multivariate test results, presented as Table 150, confirmed this finding: Pillai’s Trace = .017, F(2, 42) = .368, p > .05, ηp² = .017.

Table 145 Group D: Spring 2017/Fall 2017/Spring 2019 Multivariate Test Results (Composite)

Mean tonal scores did not differ significantly between grade levels, as indicated by results of a repeated measures ANOVA. The relative stability concluded from the lack of significant tonal score fluctuation was suggestive of the achievement of the stabilized music aptitude stage prior to Grade 3. However, this could not be confirmed without an examination of scores of younger grade levels. The conclusion that the stabilized music aptitude stage had been attained prior to Grade 3 was contrary to that described by Gordon (2013, pp. 11–12), but was supported by the findings of other researchers (DeYarman, 1972; Harrington, 1969; Stevens, 1987; Schleuter & DeYarman, 1977) who asserted music aptitude stabilized as early as age 6. Because tonal aptitude appeared to have stabilized before the grade levels in question, a period of transition was unsubstantiated.
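The p-values accompanying the Mauchly chi-square statistics in this section can be cross-checked by hand. With three test administrations the statistic has 2 degrees of freedom, and for a chi-square distribution with 2 degrees of freedom the upper-tail probability reduces to exp(−x/2). The sketch below is an illustrative check only; the function name `chi_square_p_df2` is hypothetical and is not drawn from the software used in the study.

```python
import math


def chi_square_p_df2(chi_square: float) -> float:
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-chi_square / 2.0)


# Mauchly chi-square statistics reported in this section, each with df = 2
for statistic in (8.020, 13.702, 7.654, 1.509):
    print(f"chi-square(2) = {statistic:.3f}, p = {chi_square_p_df2(statistic):.3f}")
```

The shortcut applies only when the degrees of freedom equal 2; designs with more than three repeated measurements would require the general chi-square distribution.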
Table 146 Group D: Spring 2017/Fall 2017/Spring 2019 Pairwise Comparison Pooled Results (Composite)

Table 147 Group E: Spring 2018/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Tonal)

Rhythm. Mean rhythm scores for students from Grades 3–5 are displayed in Table 151. From the results of Mauchly’s Test of Sphericity, presented as Table 152, it was concluded variances of differences between levels of rhythm test results were not significantly different and the condition of sphericity had been met (χ²(2) = 3.274, p > .05).

Table 148 Group E: Spring 2018/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Tonal)

Table 149 Group E: Spring 2018/Spring 2019/Fall 2019 Tests of Within-Subjects Effects Results (Tonal)

A one-way repeated-measures ANOVA was conducted using Grade 3, Grade 4, and Grade 5 rhythm scores. The results of the rhythm ANOVA are presented in Table 153: the difference in rhythm scores from Grades 3–5 was statistically significant, F(2, 96) = 3.461, p = .035, and the effect size small (ω² = .047) (Hatcher, 2013, p. 370). This conclusion was supported by the multivariate test results displayed in Table 154, which were statistically significant: Pillai’s Trace = .156, F(2, 47) = 4.341, p = .019, ηp² = .156.

Table 150 Group E: Spring 2018/Spring 2019/Fall 2019 Multivariate Test Results (Tonal)

Table 151 Group E: Spring 2018/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Rhythm)

The results of the Bonferroni post hoc test are featured as pairwise comparisons in Table 155. On average, Grade 5 rhythm scores were approximately 1.4 points higher than Grade 3 rhythm scores (p = .027, 95% CI [.195, 2.602]). However, the mean differences between rhythm scores for Grades 3 and 4 and Grades 4 and 5 were not significant (p > .05).
For Group E, the conjectured proposal in favor of atomistic consideration of tonal aptitude and rhythm aptitude also seemed to apply; however, tonal aptitude seemed to have stabilized before rhythm aptitude for this Group.

Table 152 Group E: Spring 2018/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Rhythm)

Table 153 Group E: Spring 2018/Spring 2019/Fall 2019 Tests of Within-Subjects Effects Results (Rhythm)

Table 154 Group E: Spring 2018/Spring 2019/Fall 2019 Multivariate Test Results (Rhythm)

Table 155 Group E: Spring 2018/Spring 2019/Fall 2019 Pairwise Comparison Pooled Results (Rhythm)

A significant difference in mean rhythm scores of Grade 3 and Grade 5 was concluded from the results of a repeated measures ANOVA. However, no significant rhythm score difference was estimated for adjacent grade levels. Thus, IMMA rhythm scores were significantly different over the 3-year period in question, but not from Grade 3 to Grade 4 or Grade 4 to Grade 5. That the influence of the musical environment was significant overall, but not for adjacent grade levels, was puzzling. These results did not align with the findings of previous studies in which score fluctuation was continual throughout the developmental music aptitude stage and gain scores decreased as students approached age 9 (Gordon, 1986a), but might be explained by a period of transition begun in Grade 3, after which rhythm score fluctuation was minimal and non-significant until Grade 5.

Composite. Mean composite scores for students from Grades 3–5 are depicted in Table 156. From the results of Mauchly’s Test of Sphericity, displayed as Table 157, it was concluded variances of differences between levels of composite test results were significantly different and the condition of sphericity had not been met (χ²(2) = 6.534, p = .038).
The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.72, 63.47) = 3.163, p > .05); results of the Huynh-Feldt correction were similar (F(56.91, 66.25) = 3.163, p > .05). These nonsignificant results are displayed in Table 158. This finding was contradicted by statistically significant multivariate test results, presented as Table 159: Pillai’s Trace = .221, F(2, 36) = 5.119, p = .011, ηp² = .221. The results of the Bonferroni post hoc test are displayed as pairwise comparisons in Table 160: Grade 5 composite scores were an average of 1.951 points higher than Grade 3 composite scores (p = .018). However, the differences between composite scores for Grades 3 and 4 and Grades 4 and 5 were not significant (p > .05).

Table 156 Group E: Spring 2018/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Composite)

Table 157 Group E: Spring 2018/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Composite)

Mean composite scores did not differ significantly between grade levels, as indicated by results of a repeated measures ANOVA with the Greenhouse-Geisser correction or Huynh-Feldt correction. This conclusion was not supported by Pillai’s Trace test results, which were statistically significant. A post hoc test using the Bonferroni correction was conducted and yielded a significant difference in Grade 3 and Grade 5 composite scores, but not for adjacent grade levels. Thus, it was concluded IMMA composite scores were influenced minimally by musical environment between adjacent grade levels and significantly over the 3-year period in question.
These results differed from those described in extant research, in which the developmental music aptitude stage was characterized as ever changing in response to students’ interaction with the musical environment (Gordon, 2001b, p. 82), yet could be accounted for by a broad period of transition begun in Grade 3, after which composite score fluctuation was minimal and non-significant until Grade 5.

Table 158 Group E: Spring 2018/Spring 2019/Fall 2019 Tests of Within-Subjects Effects (Composite)

Group F: Fall 2017 (Grade 3), Spring 2019 (Grade 4), Fall 2019 (Grade 5)

Tonal. Mean tonal scores for students from Grades 3–5 are presented in Table 161. From the results of Mauchly’s Test of Sphericity, exhibited as Table 162, it was concluded variances of differences between levels of tonal test results were significantly different and the condition of sphericity had not been met (χ²(2) = 6.158, p = .046). The violation of sphericity resulted in an inaccurate F-test (Field, 2009, p. 476). However, application of the Greenhouse-Geisser correction yielded an adjusted F-value and degrees of freedom (F(1.78, 83.53) = 1.376, p > .05); the Huynh-Feldt correction yielded similar results (F(1.84, 86.59) = 1.376, p > .05). The omnibus ANOVA result was not statistically significant, as displayed in Table 163. This finding was supported by multivariate test results, presented as Table 164: Pillai’s Trace = .065, F(2, 46) = 1.599, p > .05, ηp² = .065.

Table 159 Group E: Spring 2018/Spring 2019/Fall 2019 Multivariate Test Results (Composite)

From the results of a one-way repeated measures ANOVA, it was estimated mean tonal scores did not differ significantly between grade levels. This conclusion was supported by nonsignificant multivariate test results (p > .05).
It was suggested by the relative constancy of tonal scores that students had achieved the stabilized tonal aptitude stage before Grade 3, a conclusion which differed from the findings of previous research (Gordon, 2013, p. 15) and supported the findings of DeYarman (1972, 1975), Harrington (1969), and Schleuter and DeYarman (1977). A period of transition was not substantiated by these findings. However, further research is recommended to examine the effect of instruction with and without informal guidance, as the lack of informal guidance activities necessary to move students beyond tonal and rhythm music babble might have affected students’ ability to benefit from formal instruction, which in turn might have inhibited accurate measurement and interpretation of IMMA scores as applied to a period of transition.

Table 160 Group E: Spring 2018/Spring 2019/Fall 2019 Pairwise Comparison Pooled Results (Composite)

Table 161 Group F: Fall 2017/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Tonal)

Rhythm. Mean rhythm scores for 65 students from Grades 3–5 are displayed in Table 165. From the results of Mauchly’s Test of Sphericity, presented as Table 166, it was concluded variances of differences between levels of rhythm test results were not significantly different and the condition of sphericity had been met (χ²(2) = 1.066, p > .05).

Table 162 Group F: Fall 2017/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Tonal)

Table 163 Group F: Fall 2017/Spring 2019/Fall 2019 Tests of Within-Subjects Effects Results (Tonal)

The results of a repeated-measures ANOVA are presented in Table 167. The difference in rhythm scores from Grades 3–5 was not statistically significant, F(2, 94) = .149, p > .05.
This conclusion was supported by non-significant multivariate test results, displayed in Table 168: Pillai’s Trace = .006, F(2, 46) = .133, p > .05, ηp² = .006.

Table 164 Group F: Fall 2017/Spring 2019/Fall 2019 Multivariate Test Results (Tonal)

Table 165 Group F: Fall 2017/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Rhythm)

From these repeated measures ANOVA results, it was concluded mean rhythm scores did not differ significantly between grade levels. This conclusion was supported by non-significant multivariate Pillai’s Trace test results. It was suggested by the relative constancy of rhythm scores that students had achieved the stabilized rhythm aptitude stage prior to Grade 3, which aligned with findings for tonal aptitude for the same Group. Nonetheless, this contention was in opposition to findings of previous research (Gordon, 2013). A period of transition was not substantiated for rhythm aptitude.

Table 166 Group F: Fall 2017/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Rhythm)

Table 167 Group F: Fall 2017/Spring 2019/Fall 2019 Tests of Within-Subjects Effects Results (Rhythm)

Composite. Mean composite scores for 65 students from Grades 3–5 may be seen in Table 169. From the results of Mauchly’s Test of Sphericity, presented as Table 170, it was concluded variances of differences between levels of composite test results were not significantly different and the condition of sphericity had been met (χ²(2) = 2.114, p > .05).

Table 168 Group F: Fall 2017/Spring 2019/Fall 2019 Multivariate Test Results (Rhythm)

Table 169 Group F: Fall 2017/Spring 2019/Fall 2019 Descriptive Statistics Pooled Results (Composite)

A one-way repeated-measures ANOVA was conducted using Grade 3, Grade 4, and Grade 5 composite scores.
From the results of the composite ANOVA, presented in Table 171, it was concluded the difference in composite scores from Grades 3–5 was not statistically significant, F(2, 80) = .584, p > .05. This conclusion was supported by multivariate test results displayed in Table 172, which were not statistically significant: Pillai’s Trace = .036, F(2, 39) = .729, p > .05, ηp² = .036.

From these results, it was concluded mean composite scores did not differ significantly between grade levels. This conclusion was supported by Pillai’s Trace test results. It was suggested by the relative constancy of composite scores that students had achieved the stabilized music aptitude stage prior to Grade 3, which was in contrast to the findings of previous studies in which music aptitude stabilized at approximately age 9 (Gordon, 2013, pp. 11–12). A period of transition was unsubstantiated for composite music aptitude from the results of the current study.

Table 170 Group F: Fall 2017/Spring 2019/Fall 2019 Mauchly’s Test of Sphericity Results (Composite)

Synopsis

Results of an examination of mean tonal score differences over a 3-year period using one-way repeated measures ANOVA were mixed, as shown in Figure 12. Grade 4 tonal scores were significantly higher than Grade 3 tonal scores for Groups A and D. Similarly, Grade 5 students outscored Grade 3 students on the IMMA tonal subtest for Group A, Group C, and Group D. Nevertheless, no significant score difference was found for Grade 4 and Grade 5 tonal scores, and no significant score differences were found for any grade level combination for Groups B, E, and F.
From these mixed results, no definitive conclusion was deduced: for some grade level combinations, significant tonal score increases from Grade 3 to Grade 4 and Grade 5 were suggestive of the continued influence of instruction characteristic of the developmental music aptitude stage, yet the lack of significant score difference from Grade 4 to Grade 5 implied a lessening of environmental influence corresponding to an increase in chronological age (Gordon, 1986a). For other cases, no significant score differences could be interpreted as student attainment of the stabilized tonal aptitude stage, in which tonal aptitude no longer was susceptible to the influence of the musical environment, prior to Grade 3. This interpretation was counter to descriptions of the stages of music aptitude by some researchers (DeYarman, 1975; Gordon, 1989b; Haroutounian, 2002; Mang, 2013; Moore, 1987; Schleuter & DeYarman, 1977).

Table 171 Group F: Fall 2017/Spring 2019/Fall 2019 Tests of Within-Subjects Effects Results (Composite)

Table 172 Group F: Fall 2017/Spring 2019/Fall 2019 Multivariate Test Results (Composite)

Nevertheless, other researchers concluded the onset of stabilized music aptitude occurred as early as age 6 (DeYarman, 1975; Schleuter & DeYarman, 1977). It was possible the developmental tonal aptitude stage began to evolve as score fluctuation leveled off; it was also conceivable this transformation of the developmental tonal aptitude stage resulted in a period of transition beginning in Grade 3, when significant tonal score fluctuation seemed to suggest a continued influence of musical environment, and extending through Grade 5, when tonal scores appeared to stabilize. An uneven increase in tonal scores and rhythm scores in a sample of minoritized students led Gordon (1980b) to suggest an examination of the transition from the developmental to stabilized music aptitude stages.
The findings of the current study seemed to suggest tonal aptitude and rhythm aptitude might not stabilize at the same time and therefore might transition between music aptitude stages independently of each other. Nevertheless, evidence to suggest a transition period was not definitive for the current sample, and further research is necessary to investigate the stability of tonal scores of an extended range of grade levels, particularly those immediately prior to Grade 3 and beyond Grade 5.

Figure 12 Repeated Measures ANOVA Results (Tonal)

Longitudinal stability of rhythm aptitude was also inconclusive: the results of a series of one-way repeated measures ANOVA, as illustrated in Figure 13, exhibited non-significant mean differences in grade level scores in the majority of cases. Minimal influence of musical environment was noted in two instances: Grade 3 and Grade 5 scores were significantly higher than Grade 4 scores (approximately 2 points) for Group B, and Grade 5 students outscored Grade 3 students by approximately 1.4 points for Group E. There was inadequate evidence from these findings to substantiate a period of transition between the developmental and stabilized music aptitude stages at age 9/Grade 4, as it was possible the lack of significant score fluctuation was due to achievement of the stabilized music aptitude stage prior to Grade 3. Gordon (1986a) suggested the IMMA rhythm subtest might be more characteristic of stabilized rhythm aptitude than of developmental rhythm aptitude, as seemed to be the case with these ANOVA findings.

Figure 13 Repeated Measures ANOVA Results (Rhythm)

IMMA composite scores continued to be influenced nominally by musical environment, as concluded from one-way repeated measures ANOVA findings, featured in Figure 14.
Inconsistent score fluctuation resulted in an atypical pattern of mean composite score difference in which neither continual score fluctuation, which characterized the developmental music aptitude stage, nor immutability of scores to instruction, the defining trait of the stabilized music aptitude stage (Gordon, 1971), was described. This variation from previously described tenets of developmental and stabilized music aptitude is worthy of continued study. A period of transition was speculated to account for this discrepancy in score fluctuation, as Gordon (1980b) noted from his observation of uneven growth of tonal and rhythm aptitude in minoritized students. From the findings of that study, Gordon (1980b) recommended future research considering “theories of instruction” of “culturally homogeneous” groups of students in the developmental music aptitude stage.

Composite scores are the sum of tonal scores and rhythm scores. Therefore, it was reasonable to presume the findings of an examination of mean tonal score difference might influence composite score difference. It was previously concluded tonal results were mixed: significant score increases from Grade 3 to Grades 4 and 5 were suggestive of a continuation of the developmental tonal aptitude stage but were at odds with the lack of significant tonal score difference from Grade 4 to Grade 5. A period of transition was broached as a possible explanation for this score discrepancy that had not been described in previous literature. A similar trend was noted for composite results.
It was anticipated rhythm results would be reflected in composite results as well, and might even impose additional influence to that of tonal scores, as Gordon (1986c) contended students with rhythm and composite scores equivalent to or higher than criterion scores may be considered to have superior overall developmental music aptitude to students whose tonal and composite scores were equivalent to or higher than criterion scores (p. 69):

A child who has received raw scores on only the Rhythm test and the Composite test which are the same as or higher than the criterion raw scores for his grade may be considered superior in overall [developmental] music aptitude to a child who has achieved raw scores on only the Tonal test and the Composite test which are the same as or higher than the criterion raw score for his grade (Gordon, 1986c, p. 69).

Figure 14 Repeated Measures ANOVA Results (Composite)

In contrast, Moore (1987) noted that despite the contribution of rhythm aptitude to developmental composite aptitude, the construct of music aptitude seemed too complex to be influenced disproportionately by one element. Nonetheless, rhythm results were not reflected as strongly in composite results as anticipated, as rhythm gain scores were stagnant or nonsignificant for several years when composite scores increased. Instead, composite results seemed to parallel tonal results more closely in the current study. The lack of significant score difference in rhythm scores implied a relative stability inadequate to substantiate a period of transition when considered in isolation, yet a period of transition might account for the unusual pattern of score fluctuation observed in composite results.
Although a transition period between the developmental and stabilized music aptitude stages was theorized, it could not be substantiated conclusively at a gestalt level from these repeated measures ANOVA findings. Nevertheless, it was conjectured tonal aptitude and rhythm aptitude stabilized separately; rhythm aptitude seemed to have stabilized prior to Grade 3 for many Groups.

Summary

To address Research Question 1, the effect of chronological age on music aptitude was examined through paired t-tests of scores separated by summer months in which students do not attend school, thus controlling for instruction. Fall scores were significantly different than preceding Spring scores for rhythm and composite tests; however, the mean differences were small. Thus, scores remained marginally fluid throughout Grades 3, 4, and 5, with minimal score differences. From these results, it was determined an effect of chronological age was not established conclusively by the findings of this study.

Wilcoxon Signed Rank tests were used to examine Grade 3 Fall and Spring tonal, rhythm, and composite scores by academic year in order to determine the effect of instruction on music aptitude, the focus of Research Question 2. The mean difference was statistically significant for 2011–2012 and 2013–2014 tonal scores; however, the effect sizes were small. In the majority of cases, Spring tonal scores exceeded Fall tonal scores; nevertheless, the mean difference was often nominal and not statistically significant. No effect of instruction was concluded, as Grade 3 tonal, rhythm, and composite scores were relatively stable.

The longitudinal trend of tonal scores provided limited evidence of a transition period between the developmental and stabilized music aptitude stages, which was central to Research Question 3.
Scores of IMMA tests administered in three consecutive years (Grades 3, 4, and 5) were examined in a series of one-way repeated-measures ANOVA. From the findings, an atypical pattern was observed in which the mean tonal score differences of adjacent early grades (Grades 3 and 4) and nonadjacent grades (Grades 3 and 5) were statistically significant, yet scores of adjacent later grades (Grades 4 and 5) were relatively stable. As this trend had not been described in previous literature, a period of transition was proposed to account for the discrepancies in tonal score fluctuation. Relative stability was suggested by the lack of significant difference in rhythm scores; inadequate evidence was found to substantiate a period of transition for rhythm aptitude.

Mixed composite results, similar to tonal results, were concluded from repeated measures ANOVA results: score increases from Grade 3 to Grade 4 and Grade 4 to Grade 5 were suggestive of continued influence of musical environment. Score fluctuation is the hallmark of developmental music aptitude: it has been described as continual but varying by student (Walters, 1991), with decreased volatility as children near age 9 (Levinowitz & Scheetz, 1998). The decrease in environmental effect on music aptitude that accompanied an increase in chronological age might seem to apply to the non-significant score difference from Grade 4 to Grade 5; however, the significant overall composite score increase from Grade 3 to Grade 5 did not align with that theory. The inconsistency of composite score fluctuation, manifested as an atypical pattern not described in previous literature, led to speculation of a period of transition for composite music aptitude.
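The three analyses summarized above can be illustrated with a brief computational sketch. The Python code below is not the procedure used in this study, for which no analysis code is reported. It assumes NumPy and SciPy are available, it uses simulated scores for 41 hypothetical students rather than archived IMMA data, and the function name rm_anova_oneway and all variable names are illustrative. The repeated measures F ratio is computed under the assumption that sphericity has been met.

```python
import numpy as np
from scipy import stats


def rm_anova_oneway(scores):
    """One-way repeated-measures ANOVA on an (n_students, k_levels) array.

    Returns (F, df_effect, df_error, p), assuming sphericity holds.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand_mean = scores.mean()
    # Variation between grade levels (the effect of interest)
    ss_levels = n * np.sum((scores.mean(axis=0) - grand_mean) ** 2)
    # Variation between students, removed from the error term
    ss_subjects = k * np.sum((scores.mean(axis=1) - grand_mean) ** 2)
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_error = ss_total - ss_levels - ss_subjects
    df_effect = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_levels / df_effect) / (ss_error / df_error)
    p = stats.f.sf(f_ratio, df_effect, df_error)
    return f_ratio, df_effect, df_error, p


# Simulated scores for illustration only (NOT the study's data)
rng = np.random.default_rng(0)
n_students = 41
baseline = rng.normal(60.0, 6.0, n_students)
grade3 = baseline + rng.normal(0.0, 3.0, n_students)
grade4 = baseline + rng.normal(0.0, 3.0, n_students)
grade5 = baseline + rng.normal(0.0, 3.0, n_students)

# Two matched administrations: paired t-test, or the Wilcoxon Signed Rank
# test when the normality assumption is in doubt
t_stat, t_p = stats.ttest_rel(grade3, grade4)
w_stat, w_p = stats.wilcoxon(grade3, grade4)

# Three matched administrations: one-way repeated-measures ANOVA
f_value, df_effect, df_error, p_value = rm_anova_oneway(
    np.column_stack([grade3, grade4, grade5])
)
print(f"paired t: t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"Wilcoxon: W = {w_stat:.1f}, p = {w_p:.3f}")
print(f"RM ANOVA: F({df_effect}, {df_error}) = {f_value:.3f}, p = {p_value:.3f}")
```

For k grade levels and n matched students, the degrees of freedom are k − 1 for the effect and (n − 1)(k − 1) for error, so three grade levels and 41 students give a test of the form F(2, 80).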
Chapter 5

Discussion, Recommendations, and Conclusions

Purpose of the Study

Background

The goal of adapting instruction to address individual learning differences is generally accepted within the field of education (Heathers, 1977). By extension, differentiated instruction in music education is advantageous in promoting student mastery. Music aptitude scores give an overview of students’ musical strengths and weaknesses; this data is critical to selection of appropriate tasks and content for each student.

Previous music aptitude tests differed in content and intent (Gordon, 1987). The terms aural imagery, inner hearing, and aural perception were used as if equivalent, and terms such as ability, talent, and achievement were used interchangeably and indiscriminately (Boyle, 1992), thus confounding the construct test developers sought to measure. Some researchers speculated one is born with a certain level of music aptitude (nature); others claimed music aptitude was the result of the musical environment (nurture) (Gordon, 1998). A gestalt perspective yields a comprehensive overview of music aptitude; proponents of the atomistic view measure discrete dimensions of music aptitude (Grashel, 2008). Because of these disparities, an examination of music aptitude test scores based on a unified approach to music aptitude was needed. Gordon analyzed the effectiveness and validity of previous music aptitude measures and adapted select features for use in his own music aptitude test batteries. Consequently, the theoretical framework of this study is based on the research of Edwin Gordon on the construct and measurement of stabilized music aptitude.
Problems

Gordon (2001a) conjectured two stages of music aptitude: the developmental music aptitude stage, characterized by fluctuations in response to instruction and influence of the musical environment, and stabilized music aptitude, established at approximately age 9 and noted for its resistance to instruction and the effects of the musical environment. However, DeYarman (1972, 1975), Harrington (1969), and Schleuter and DeYarman (1977) conducted research using modifications of the Musical Aptitude Profile (MAP) with primary-age children and concluded music aptitude stabilized before age 9, although there was no firm consensus among them on the age of onset (Gordon, 1979a).

Gordon (1982) first designed the Primary Measures of Music Audiation (PMMA) to measure developmental music aptitude and later developed the Intermediate Measures of Music Audiation (IMMA) as a more difficult measure of developmental music aptitude for students who scored in the 80th percentile or higher on PMMA. However, Gordon (1984a) asserted IMMA also could be used to measure stabilized music aptitude in students nine years and older.

The onset and manner of transition from the developmental music aptitude stage to the stabilized music aptitude stage are unclear, as extant research that specifically examined when and how music aptitude becomes stabilized was scarce (Gordon, 1980b). Gordon hypothesized the stabilized music aptitude stage had been reached when, despite score changes related to chronological age, the relative standing of students’ music aptitude levels remained constant (Gordon, 2001b). It should be noted, however, that the constancy of stabilized music aptitude has not been investigated in a sample of musically select students, defined as students who participate in school ensembles (Gordon, 1995), as was the case with numerous students in the current sample.
Although this topic was beyond the parameters of the current study, further research is recommended to investigate the effect of ensemble participation on stabilized music aptitude.

Purpose of the Study

Thus, the purpose of this research study was to investigate the onset of, transition to, and longitudinal constancy of stabilized music aptitude in upper elementary students.

Research Questions

The following questions guided the research:

1. At what grade level does chronological age cease to affect student music aptitude?
2. At what grade level does instruction cease to affect student music aptitude?
3. Is there evidence to substantiate the transition between the developmental music aptitude stage and stabilized music aptitude stage at approximately age 9/Grade 4?

Methodology

Sample

A nonprobability convenience sample of IMMA scores (N = 1,650) was collected from intact classes of students in Grades 3, 4, and 5 who were enrolled in a small, rural, public school in central Pennsylvania where the researcher has been employed as the sole elementary general music teacher for sixteen years. Little transiency is experienced by students in this school district and the student population is quite stable: relatively few students move into or out of the district and most were enrolled in the district throughout their tenure as elementary and secondary students. As a result of the stability of this school population, most scores of students who were administered IMMA in Grades 4 and 5 could be matched to their Grade 3 scores. A large majority of students are White; approximately half live in rural poverty.

Research Instrument

The sole research instrument used for data collection was IMMA. It was presumed IMMA would measure developmental music aptitude of Grade 3 students and stabilized music aptitude of students in Grades 4 and 5.
Data Collection

IMMA was administered routinely to students in Grade 3 (N = 1,035) and intermittently to students in Grade 4 (N = 389) and Grade 5 (N = 226) during the researcher’s tenure as music teacher to provide data for individualized instruction. IMMA scores were collected and archived by the researcher for a period of 13 years (2007–2019). A longitudinal examination of Grade 3–Grade 5 Fall and Spring scores by grade level, between grade levels, and across the 3-year period was conducted to investigate when score fluctuation, attributed to musical environment, diminished in significance, thus indicating the approach of the stabilized music aptitude stage.

Design

In this quantitative study, the mean difference in students’ historical IMMA scores was examined using a variety of statistical tests. Individuals’ matched scores from Spring test administrations and subsequent Fall test administrations were used in paired t-tests to examine the effect of chronological age on music aptitude. Violations of the assumption of normal distribution often precluded use of paired t-tests; therefore, a series of Wilcoxon Signed Rank tests was conducted by academic year using tonal scores, rhythm scores, and composite scores for a grade-by-grade view of the effect of instruction on music aptitude. Scores from consecutive academic years were used in one-way repeated measures ANOVA to highlight longitudinal changes over a 3-year period that might suggest a period of transition between stages of music aptitude. Multivariate test results were used to confirm ANOVA results.

Results

Analyses

Research Question 1. The results of a series of paired t-tests comparing IMMA Spring tonal scores, rhythm scores, and composite scores with subsequent corresponding Fall scores were mixed. Mean tonal scores increased, yet their difference was not statistically significant.
In contrast, mean Grade 4 Spring/Grade 5 Fall rhythm and composite scores decreased; the mean differences also were not significant. A small but significant increase for Grade 3 Spring/Grade 4 Fall rhythm scores and composite scores was found. In all cases the mean score difference was less than one point.

Gordon (1998) observed a tendency for music aptitude scores to increase with chronological age (p. 169); however, he noted scores gradually decreased as students approached the stabilized music aptitude stage (Gordon, 1981). The significant increase of Grade 3 Spring/Grade 4 Fall rhythm scores was suggestive of students’ continued presence in the developmental music aptitude stage, as it seemed the influence of instruction persisted. However, Gordon (2002) had predicted an average increase of approximately 2 points for developmental music aptitude scores of students receiving traditional instruction; the mean rhythm score increase in the current study did not meet that minimal threshold and thus seemed to lack practical significance. Although tonal scores generally continued to increase and Grade 4 Spring/Grade 5 Fall rhythm scores to decrease, their non-significant mean score differences seemed to suggest the static score fluctuation expected of students who previously had attained the stabilized music aptitude stage. Due to discrepancies in score direction and significance, the results of paired t-tests were inconclusive in determining the effect of chronological age on music aptitude.

Research Question 2. Tonal and composite scores tended to increase from the Fall to Spring administrations of the same academic year; however, the mean differences were generally modest, as interpreted from Wilcoxon Signed Rank test results. Only tonal score differences for 2011–2012 and 2013–2014 were statistically significant.
Although rhythm scores tended to decrease from Fall to Spring administrations, the mean differences were small and not significant. Gordon (2002) had asserted a score increase of approximately 2 points when traditional instruction was offered. The mean score increases of the current study were well below that threshold; therefore, no effect of instruction was concluded. Nevertheless, it must be considered the type and quality of instruction might have had a detrimental effect on student music aptitude, as the formal instruction offered might have been ill-suited to the students’ current level of music aptitude, particularly for those who remained in preparatory audiation.

Research Question 3. Repeated measures ANOVA was used to examine longitudinal score change of IMMA tonal scores, rhythm scores, and composite scores to consider a period of transition between the developmental and stabilized music aptitude stages. ANOVA results were confirmed by multivariate test results. An atypical pattern of tonal score fluctuation was observed in which significant score increases occurred from Grade 3 to Grade 4 and Grade 3 to Grade 5, as if students remained in the developmental music aptitude stage, but the mean tonal score difference from Grade 4 to Grade 5 was not significant, as if the stabilized music aptitude stage had already been achieved. Gordon (1981) had asserted score gains decreased as chronological age increased: the inconsistency of score fluctuation from Grade 4 to Grade 5 seemed to exemplify this assertion. A limiting effect of instruction on continued score gain of developmental music aptitude must also be considered. Thus, a period of transition for tonal aptitude was asserted to account for atypical tonal score results, but could not be concluded definitively.
No significant mean rhythm score differences were estimated for this sample; consequently, a period of transition was unsubstantiated for rhythm music aptitude. An atypical pattern similar to that of tonal scores also was observed for composite score difference: the score fluctuation typically associated with the developmental music aptitude stage and the cessation of score change characterized by the stabilized music aptitude stage (Gordon, 1971) were observed simultaneously. Discrepancies in score direction deviated from findings of previous research and warranted additional study. A period of transition for composite aptitude was speculated.

Discussion

Period of Transition

A period of transition for rhythm aptitude was unsubstantiated from the findings of a series of repeated measures ANOVA. From the unusual pattern of rhythm score fluctuation and non-significant findings for most Groups, it is conceivable students had already progressed to the stabilized music aptitude stage. In contrast, a period of transition for tonal aptitude was suggested. The general and often significant increase in tonal scores found from results of a series of repeated measures ANOVA was suggestive of the continuation of the developmental tonal aptitude stage with a tapering of scores as students transitioned to the stabilized tonal aptitude stage. Significant composite score growth was interpreted for most Groups, which seemed to be indicative of a continuation of the developmental music aptitude stage.

From the compilation of tonal, rhythm, and composite findings, one could interpret students had entered a period of transition to the stabilized music aptitude stage, as score increase seemed to gradually decline. However, if tonal aptitude and rhythm aptitude were examined individually, it was conjectured the two constructs might have stabilized independently of one another.
The Grade 3–Grade 4 tonal score increase was significant, as was that for Grade 3–Grade 5; thus, it seemed students remained in the developmental music aptitude stage. The mean difference in Grade 4–Grade 5 tonal scores and for rhythm scores was not significant, however, which was suggestive of a previous transition to the stabilized music aptitude stage. Although a period of transition could not be substantiated conclusively from these mixed results, it was speculated rhythm aptitude had stabilized independently of tonal aptitude. Additional research focused on the independent onset of stabilized tonal aptitude and rhythm aptitude is recommended.

Gordon (2013) awarded more weight to difference between developmental and stabilized music aptitudes than to similarity (p. 15). He noted numerous differences in how developmental and stabilized music aptitudes were manifested in student musical behaviors: students in the stabilized music aptitude stage preferred to hear tonal and rhythm dimensions simultaneously and were able to attend to one or the other, showed consistent preference for phrasings, and reliably perceived even relatively small differences in dynamics, timbres, and tonal ranges (Gordon, 2013, pp. 15–16). It seems unlikely students would begin to exhibit stabilized music aptitude traits wholly and simultaneously; rather, it is probable students begin to exhibit traits of stabilized music aptitude by degrees.

Gordon (1984b) developed the Instrument Timbre Preference Test (ITPT) for a 2-year investigation to determine if students demonstrated more success when they played an instrument for which they demonstrated a timbre preference and if success was more accurately predicted by MAP scores when students played an instrument whose timbre they preferred (Gordon, 1989c).
Although Gordon (1984b) initially stated ITPT could be administered to students entering Grades 4, 5, or 6, he later clarified ITPT should be administered to students prior to or in the grade in which beginning band instruction is offered. As many students begin instrumental music instruction in Grade 4, ITPT might be administered in Grade 3 Spring or Grade 4 Fall. Thus, Gordon seemed to believe students could discern differences in instrumental timbre as early as age 8, before music aptitude was purported to stabilize, and that difference in timbre might be perceived before difference in dynamics or other expressive elements. It was speculated the transition from the developmental to stabilized music aptitude stage was also of a gradual nature, in which changes to students’ perception occur differently for each student, as students similarly transition from preparatory audiation to audiation at different paces, according to their individual levels of developmental music aptitude. Nevertheless, it was conjectured from repeated measures ANOVA findings the transition to stabilized rhythm aptitude occurred before that of tonal aptitude in the current study.

Gordon (2012) acknowledged it would be unusual for both developmental tonal and rhythm aptitude to be very high or very low for any individual student (p. 50). When interpreting music aptitude test scores, an attempt to raise the lower subtest score should be undertaken immediately (Gordon, 1998, p. 149). Thus, Gordon recognized tonal aptitude and rhythm aptitude both manifest and progress at different rates. It was evident from a comparison of the repeated measures ANOVA findings for tonal scores and corresponding rhythm scores of the current study that tonal scores seemed to exhibit significant growth more frequently and for different grade level groupings than did rhythm scores.
The stages of developmental music aptitude and stabilized music aptitude each had been viewed by the researcher as a single gestalt construct: students either functioned musically in one stage or the other, as defined by their composite IMMA scores. However, the findings of the current study have prompted a consideration of developmental tonal aptitude and developmental rhythm aptitude from an atomistic perspective instead: students might audiate tonally in one stage of music aptitude and rhythmically in another. Norton (1980) cautioned children must progress sequentially through a series of activities ordered by the level of abstract thinking required in order to develop musical understanding. Students in Norton’s study conserved tonal elements more successfully than rhythm patterns.

The findings of previous studies in which researchers concluded music aptitude stabilized prior to age 9 (DeYarman, 1972; Harrington, 1969; Schleuter & DeYarman, 1977) seemed to be based on composite results: tonal aptitude and rhythm aptitude were not interpreted separately. Yet Gordon (1998) suggested the finding that children in the developmental music aptitude stage seemed to focus more on the musical instrument used to record music aptitude test items than on the content of the test items supported the assertion that developmental music aptitude was more closely related to atomistic than gestalt perspective (p. 71). If tonal aptitude and rhythm aptitude operate as independent constructs, it is possible students may transition between stages of music aptitude at different rates for each construct.

Another indication Gordon regarded rhythm and tonal aptitudes as separate constructs was his emphasis on movement and its component parts (time, space, weight, and flow), which interacted to create rhythm (Gordon, 2012, p. 188). Although Gordon (2012) stated movement was foundational to rhythm (p.
190) and the best means through which students understand rhythm (p. 74), he posited the construct of space audiation, “a silent auditory response rather than a physical response” (Gordon, 2015) in his later writing. Gordon (2012) asserted the importance of an audiation breath, during which a pause is inserted between the teacher’s performance and the students’ performance of each tonal pattern to encourage audiation over imitation (p. 102), and alleged tonal audiation occurred during that breath (Gordon, 2013, p. 94). In addition, Gordon (1998) noted rhythm, particularly meter and tempo, was foundational to musical style and expression (p. 60), and concluded tempo was the most fundamental of all rhythm aptitudes: tempo was basic to meter and meter to rhythm (p. 104). Thus, movement was perceived as foundational to rhythm, and rhythm in turn to tonal aptitude and musical style. Nonetheless, Gordon theorized the expressive dimension of stabilized music aptitude joined the tonal and rhythm dimensions, resulting in comprehensive music aptitude (Gordon, 1998, p. 60).

Gordon (1998) described rhythm aptitude as foundational and basic. He maintained students with high IMMA rhythm and composite scores had overall music aptitude superior to those with high IMMA tonal and composite scores (Gordon, 1986c, p. 69), and noted knowledge of the occurrence of chord changes in syntactic time might be essential to the process of audiation (Gordon, 1998, p. 172). This notion of the primacy of rhythm was supported by empirical evidence, such as the findings of a factor analysis of MAP, PMMA, and IMMA, in which Gordon concluded a factorial relationship between the IMMA rhythm subtest and the MAP meter subtest and not to the PMMA rhythm subtest, as might be expected. Thus, Gordon (1986a) speculated the IMMA rhythm subtest might be more indicative of stabilized music aptitude than of developmental music aptitude.
In contrast, Moore (1987) concluded music aptitude appeared too complex a concept to be wholly affected by rhythm aptitude, despite the contribution of rhythm aptitude to developmental music aptitude as a whole.

Although IMMA tonal patterns and rhythm patterns were both comprised of the difficult patterns identified in Gordon’s taxonomic research (Gordon, 1986c, p. 22), to achieve the same percentile rank, students must receive a higher raw score on the IMMA tonal subtest than on the IMMA rhythm subtest. Thus, a difference in the level of difficulty of tonal patterns and rhythm patterns was evident from reported IMMA percentile ranks (Gordon, 1986c, p. 64), with the implication that difficult rhythm patterns were more complex than difficult tonal patterns.

Gordon attributed higher rhythm aptitude than tonal to an environment more favorable to rhythmic development (Zimmerman, 1986) in his 1967a study of “educationally disadvantaged” students. Whether Gordon considered this environment to be a function of the school culture or the students’ extracurricular musical environment was unknown. However, the focus on rhythm activities to the detriment of tonal development was not uncommon in school music instruction. Talley (2005) noted the content areas and skills most frequently assessed by elementary general music teachers for students in kindergarten, first-, second-, and third-grades (singing voice development, rhythm, matching pitch, and beat competency) included tonal instruction only peripherally, and Young (1976) observed teachers considered rhythmic ability to be more essential to student performance than music reading. Moore (1987) concluded focus on rhythm aptitude might yield improvement.
Thus, it was possible teachers’ concentration on rhythm instruction might have resulted in higher rhythm achievement than tonal achievement, which in turn might have accelerated growth of rhythm aptitude in the developmental music aptitude stage. In addition, it was possible students’ acculturation was stronger for rhythm than for tonal due to their musical experiences outside of school instructional time. Students might have had difficulty accessing their head register or matching pitch, which might have led to frustration, fear of failure, and reluctance to participate in tonal activities; thus, their level of tonal achievement lagged behind that of rhythm. Gordon (1986c) advocated for teachers to use their knowledge of student music aptitude scores to diagnose musical strengths and weaknesses for individualizing instruction (p. 76). Although Gordon recommended teachers immediately attempt to raise each student’s lower subtest score, it was possible tonal and rhythm dimensions were emphasized equally for students whose rhythm scores already exceeded their tonal scores, resulting in reinforcement and improvement of rhythm achievement to the detriment of tonal achievement. Nevertheless, Gordon (1998) observed young children develop the second stage of audiation more quickly for tonal patterns than rhythm patterns, even in those who have attained the stabilized rhythm aptitude stage (p. 70). Gordon found it necessary to include audible clicks with the recorded rhythm patterns in PMMA and IMMA to provide the context of tempo for students in the developmental music aptitude stage and accents for students in the stabilized music aptitude stage. Thus, Gordon’s assertion of rhythm aptitude as foundational to music aptitude, a gestalt view, was reframed by the need to provide contextual support to establish tempo.
Perhaps stabilization of characteristics of the rhythm dimension such as tempo also occurs over time rather than concurrently. Conclusions regarding a period of transition differed by Group for tonal and rhythm scores in the current study. As such, it was difficult to substantiate or reject a general transition between stages of music aptitude for the entire sample of students, as the atomistic parts (tonal and rhythm) contributed to the gestalt whole (composite) in differing ways. Not only did tonal aptitude and rhythm aptitude appear to function as separate constructs, but rhythm aptitude also seemed to stabilize before tonal aptitude. Significant tonal score differences seemed to suggest score change consistent with continued presence in the developmental music aptitude stage, yet the lack of significant difference in rhythm scores seemed to suggest attainment of the stabilized music aptitude stage prior to Grade 3. Thus, it was speculated students had transitioned to the stabilized stage of rhythm aptitude before their transition to stabilized tonal aptitude, and this transition occurred prior to Grade 3. Gordon (1980b) had observed inconsistent growth of tonal and rhythm aptitude in minoritized students and posited a period of transition to account for that discrepancy. The uneven increase in tonal and rhythm scores in the current study confirmed Gordon’s findings and established a foundation from which to conjecture attainment of stabilized tonal aptitude separate from that of stabilized rhythm aptitude.

Effect of Instruction

No effect of instruction for music aptitude was concluded in the current study. Although a significant increase in tonal scores was found, the mean difference was modest. A nonsignificant decrease in rhythm scores was found from Fall to Spring of the same academic year; the mean score difference also was small.
In no case did the mean difference exceed the 2-point threshold asserted by Gordon (2002) for students participating in traditional instruction; thus, practical significance of mean score increase or decrease was questionable. DeYarman (1975) similarly questioned the practical significance of statistically significant results found in his investigation of MAP use with primary students, noting a minimal effect of different amounts and types of formal music instruction on music aptitude before Grade 4. The score fluctuation of the developmental music aptitude stage, tapering of score change as students’ age increased, and cessation of environmental influence on the stabilized music aptitude stage were described in previous research. Gordon (2013) noted the continual fluctuation of music aptitude before age 9, as it interacted with the environment (p. 13), yet environment had very little effect on music aptitude after age 9 (pp. 11–12). Between the developmental and stabilized music aptitude stages, however, Gordon (1986c) described a decline in score fluctuation due to decreasing influence of the musical environment as students’ chronological age increased (p. 103). In the current study, tonal scores increased significantly; however, mean differences were small. Thus, students’ tonal music aptitude seemed to continue in the developmental music aptitude stage, with score differences decreasing as students transitioned to the stabilized music aptitude stage. This observation seemed to align with the findings of Phillips et al. (2002), who asserted aural skills developed before and during Grade 3, after which aural acuity no longer hampered pitch matching. In the current study, it seemed students’ rhythm aptitude had stabilized prior to Grade 3 (age 8 or 9), as rhythm score difference was not statistically significant.
Gordon (2005) contended scores on developmental and stabilized music aptitude tests would increase with chronological age; however, students’ relative position in score distributions would remain constant when in the stabilized music aptitude stage. Gordon (2012) stated improvement of instruction was the primary objective of a music aptitude test (p. 51), through identification of students with high music aptitude to encourage participation in music activities and diagnosis of each student’s musical strengths and weaknesses to individualize instruction (Gordon, 1995, p. 9). Gordon (2001b) noted the biological limitation of low music aptitude can be lessened with differentiated instruction (p. 86), yet in order to individualize instruction appropriately, one must first ascertain students’ level of music aptitude. In combination with knowledge of a student’s chronological age, music aptitude test scores are suggestive of a student’s level of music aptitude (developmental or stabilized). Nevertheless, Gordon (1998) noted it seemed possible students in either stage of music aptitude could engage in preparatory audiation or audiation, regardless of chronological age (p. 72). Thus, inclusion of informal guidance is critical to establish the foundation necessary from which students may benefit most from formal instruction, regardless of the presumed stage of music aptitude of school-age students. Gordon (2012) asserted audiation of context and content of music was foundational to music meaning (p. 11), unlike other approaches to music learning, and sought to develop a music learning theory to describe how we learn music (p. 25). Through sequential stages of audiation, students enjoy music through understanding (p. 28). Gordon advocated for knowledge of stages of music aptitude to design sequential instruction of audiation skills.
Students in the preparatory audiation stage (music babble) need unstructured and structured informal guidance to allow them to progress through acculturation, imitation, and assimilation organically and to establish readiness for formal instruction. With appropriate guidance and instruction, children typically emerge from tonal and rhythm babble between ages 5 and 9 (Gordon, 2012, p. 251). Evidence of initial emergence from music babble is the ability to distinguish between major and minor tonalities and usual duple and usual triple meters (Gordon, 2012, p. 251). Students have passed through tonal babble when they are able to sing in major and minor tonalities relatively in tune, using continuous flow of the breath (Gordon, 2012, p. 252) and through rhythm babble when they are able to chant alternately in usual duple meter and usual triple meter with a consistent tempo and chant a series of rhythm patterns in the same tempo without intervening beats (Gordon, 2012, p. 253). Without achievement of these audiational skills, the foundation on which to introduce formal instruction would be inadequate: Gordon (1987) emphasized students would learn less from formal instruction without the readiness provided from informal instruction (p. 9). Formal instruction, which includes use of tonal patterns and rhythm patterns within sequenced tonal and rhythm learning activities and in combination with classroom music activities, is most appropriate for students who have passed out of the preparatory audiation stage and therefore have the necessary foundation on which to continue to build audiation capacity. Thus, type and quality of instruction likely affected the results of the current study.
Instruction by the researcher included creative movement activities based on Laban themes, vocal exploration, folk dance, singing games, tonal activities focused on presentation of materials in a variety of tonalities, and rhythm activities focused on presentation of materials in a variety of meters. Learning Sequence Activities using Gordon’s tonal (1990a) and rhythm register books (1990b) were also included to individualize instruction according to music aptitude level, as determined through scores of bi-annual administrations of IMMA. However, little effort was made to address students remaining in preparatory audiation through incorporation of informal guidance within the context of the formal instruction offered. The cumulative effect of instruction misaligned with individual student musical age over a period of several years might have skewed the findings, particularly in a longitudinal examination, as students would not have had the appropriate level of readiness for each sequenced level of skills in turn. Taggart (1989) noted Gordon’s contention that establishment of musical context is necessary for accurate measurement of stabilized music aptitude. Non-compensatory and noncomplementary instruction could have resulted in the lack of musical context needed to accurately measure stabilized music aptitude in the current study. A detailed description of the type of instruction provided, especially in its function as compensatory or complementary to students’ musical needs, might have allowed a more precise interpretation of longitudinal IMMA score change to address the question of effect of instruction in a more focused and direct manner.
Thus, despite the researcher’s best efforts to adapt instruction according to PMMA and IMMA test results and students’ implied level of music aptitude, the possibility some students remained in preparatory audiation for tonal or rhythm dimensions was not accounted for within the context of school music instruction. Gordon (2013) acknowledged the benefit of informal guidance and formal instruction, structured and unstructured, only when undertaken with knowledge of music aptitudes (p. 17), and posited the influence of early guidance and instruction would be greater on young children’s achievement than that of formal music instruction in later years (Gordon, 2012, p. 47). Thus, an effect of instruction on music aptitude was unsubstantiated within the context of the current study. Nonetheless, the lack of informal guidance opportunities offered to address preparatory audiation deficits and establish a critical foundation for future audiation of students in the current study likely affected the future cultivation of developmental and stabilized music aptitude adversely.

Effect of Chronological Age

The findings of the current study did not suggest a significant increase in music aptitude test scores due to chronological age. The results of paired t-tests of Spring tonal, rhythm, and composite scores and corresponding Fall scores of the following grade level were mixed. Tonal scores increased nominally, but not significantly. Grade 3 Spring/Grade 4 Fall rhythm and composite scores increased significantly; however, the decrease in Grade 4 Spring/Grade 5 Fall rhythm and composite scores was not significant. Thus, no conclusive evidence of an effect of chronological age on music aptitude was interpreted. Gordon (1998) suggested a general score increase due to chronological age was typical (p.
169) and specified an approximate average increase of 2 points on developmental music aptitude tests for students participating in traditional instruction (2002), yet the results of the current study did not reach that threshold. Thus, the practical significance of score increase or decrease should be considered with caution. The finding of no effect of chronological age was not surprising, as similar results were reported in extant literature. Gordon (2013) noted the ability to generalize and infer was the basis of both music aptitude and general intelligence (p. 13); thus, student ability to synthesize information seemed more likely to affect music aptitude than did chronological age. Similarly, Gordon (1995) noted the role of neural maturation in MAP score increase with age, as scores of tests requiring continuous concentration likely increase in part because ability to concentrate is a feature of maturity (p. 86). Perhaps skills associated with neural maturation were more responsible for music aptitude than was chronological age itself. Although it might seem Gordon (2013) was advocating for an association between music aptitude and academic intelligence, indeed he was forthright in his opposition to such a conclusion (p. 19). He did, however, acknowledge the characteristic skills inherent in measurement by a standardized test such as IMMA or MAP: students’ ability to generalize and infer musical content and context was analogous to their ability to audiate, and their effectiveness at maintaining concentration during test-taking was an asset. Nevertheless, Gordon (1986c) distinguished chronological age (years of age) from musical age (developmental age specific to music), noting the latter was more important in determining when to begin instrumental instruction (Gordon, 2013, p.
149), formal instruction, or individualized instruction: musical age as measured by PMMA or IMMA, rather than chronological age, is the critical factor when adapting instruction to support individual learning differences (Gordon, 1986c, p. 75). In fact, Gordon (2013) asserted adequate readiness from appropriate preparatory audiation experiences, regardless of chronological age, was necessary for students to audiate well (p. 131), and cautioned quality and quantity of learning in music babble superseded the chronological age of students when they emerged from music babble (Gordon, 2012, p. 251). Moore (1987) observed lesson design that stimulates and challenges through research-based approaches could have a lifelong impact on students’ tonal aptitude and future music comprehension. It appears then that investment in informal guidance for students of all ages may be pivotal to a lifetime of audiation.

Limitations of the Study

Limitations of the study included the composition, homogeneity, and aggregate size of the sample, need for and possible effect of multiple imputation of missing values, internal validity concerns of testing, use of nonparametric statistical testing due to violations of parametric statistical assumptions, use of raw scores as the data collection unit, and use of scores of students who had been administered IMMA for 3 consecutive years exclusively for repeated measures ANOVA. It is probable the lack of diversity of the convenience sample of students in the current study reflected less variation than that which might be found in the general population. In addition, the stability of the student population contributed to a more consistent music education than students in the general population might have experienced. Thus, it was possible this sample was not representative of the general population and caution must be exercised in generalizing the results of this study.
Scores from students in Grades 3, 4, and 5 were included in this study. However, inconsistency of IMMA test administration by academic year and grade level resulted in Grade 4 and Grade 5 sample sizes that were considerably smaller than the Grade 3 sample. Use of this intact sample likely would have had a detrimental effect on statistical power. Student absences on the dates of test administration, for which make-up tests were not administered, resulted in missing data. Consequently, multiple imputation using predictive mean matching was used to complete the data set, which also increased the aggregate sample size of Grade 4 and Grade 5 scores. It was expected Fall and Spring scores of corresponding tests would be loosely related such that a difference in score values might reflect a meaningful change in student music aptitude. All missing tonal, rhythm, and composite values were imputed simultaneously in the current study. Thus, it was possible an imputed Fall or Spring score could have been generated in accordance with the parameters of predictive mean matching, yet did not comply with the premise of score association described above. It was possible this manner of imputation could have skewed composite scores, which, in turn, might have affected the mean composite score difference. For example, an imputed Spring score that was dramatically different in value (e.g., 15 points) from its corresponding Fall score likely would be interpreted quite differently than Fall and Spring scores with a more modest score difference. An uncharacteristic increase or decline in scores might have obscured a relationship between Fall and Spring scores that would have been revealed had only observed values been included. In addition, artificially high or low scores might have affected test validity, as test scores might not have accurately represented the construct intended to be measured.
Further consideration of the application of multiple imputation in the research design is recommended in replication studies. IMMA composite scores were intended to be the sum of IMMA tonal scores and rhythm scores; however, predictive mean matching, when conducted for all missing values simultaneously, did not accommodate this assumption. Thus, it was possible the combination of imputed and observed tonal, rhythm, and composite scores would not result in an accurate “tonal score plus rhythm score equals composite score” equation. Simultaneous multiple imputation using predictive mean matching was a limitation of the research design that might have affected the results of the current study. An adaptation of the research design to accommodate stratified imputation of tonal scores, rhythm scores, and composite scores to ensure appropriate relationships of all scores is suggested in future studies. No outliers were excluded from the sample. However, this limitation may have affected the dispersion of scores as well as the skewness and kurtosis of the distribution curve. Due to violations of the assumption of normality, the nonparametric Wilcoxon Signed Rank test was conducted in lieu of paired t-tests for comparison of matched pairs of scores of the same academic year. Limitations of the measurement instrument may have affected the results of the study: a defining attribute of stabilized music aptitude testing was the need for context in measurement (Gordon, 2005; Taggart, 1989). However, it was possible the context provided in IMMA was inadequate to measure stabilized music aptitude. Limited evidence in extant literature of the use of IMMA as a test of stabilized music aptitude may have been a limitation on the instrument to measure the construct of interest, although Gordon (1986c) reported practice effects of PMMA and IMMA test taking were negligible and not a threat to validity (p. 109).
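The composite-score identity noted in the discussion of simultaneous imputation (tonal score plus rhythm score equals composite score) can be checked mechanically. The following is a minimal sketch only, not the procedure used in the current study: the record layout, the function names, and the score values are hypothetical and are included solely to illustrate how records that violate the identity might be located and recomputed.

```python
from dataclasses import dataclass


@dataclass
class ScoreRecord:
    """One student's scores for one test administration (hypothetical layout)."""
    tonal: float
    rhythm: float
    composite: float


def composite_mismatches(records, tolerance=1e-9):
    """Return the indices of records whose composite differs from tonal + rhythm."""
    return [
        index
        for index, record in enumerate(records)
        if abs(record.composite - (record.tonal + record.rhythm)) > tolerance
    ]


def recompute_composites(records):
    """Return new records in which each composite is the sum of tonal and rhythm."""
    return [
        ScoreRecord(record.tonal, record.rhythm, record.tonal + record.rhythm)
        for record in records
    ]


# Illustrative values only; these are not scores from the study.
records = [
    ScoreRecord(tonal=34, rhythm=30, composite=64),
    ScoreRecord(tonal=31, rhythm=28, composite=62),  # violates the identity
    ScoreRecord(tonal=36, rhythm=33, composite=69),
]

flagged = composite_mismatches(records)
repaired = recompute_composites(records)
```

In this sketch, recomputation overwrites every composite rather than only the flagged ones, which is harmless for records that already satisfy the identity.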
An effect of testing resulting from administration of the same test each semester for 3 academic years could create a limitation of internal validity, as outcomes may have differed due to repeated testing with the same instrument. Continued study of the efficacy of IMMA as a measure of stabilized music aptitude is recommended. An additional limitation of this study was the use of raw or observed tonal scores, rhythm scores, and composite scores as the unit of data collection, rather than percentile ranks. It was determined raw scores were best suited for comparison across grade levels and different statistical tests, as percentile ranks were a function of students’ relative standing in comparison to their peers and would thus vary in accordance with grade level. However, apt comparisons of raw scores to IMMA standardization results could not be made, as Gordon (1986c) reported percentile ranks only. A final limitation of this study was the decision to use observed scores of only the students who had been administered IMMA for 3 consecutive years (in Grade 3, Grade 4, and Grade 5) to address Research Question 3, rather than all available IMMA scores. A close and in-depth examination of score change was desired, in order to observe differences by grade level grouping. To supplement the interpretation of the detailed findings of the present study, a MANCOVA using all available IMMA scores is recommended to increase insight into longitudinal score change and allow interpretation of interactions between dependent variables.

Implications

No compelling evidence was found from the results of a series of paired t-tests to support an effect of chronological age on music aptitude. Mean score differences, although statistically significant, were small for Grade 3 Spring/Grade 4 Fall composite scores.
The finding of no statistical significance for most scores supported an interpretation that students had progressed to the stabilized music aptitude stage. A gradual decrease in effect of environment due to age (Gordon, 1981), measured as IMMA scores, could have been interpreted from the findings of the current study; however, it appeared the decline in scores had begun as early as Grade 3. In contrast, continued score increases could have been interpreted as maintenance of the developmental music aptitude stage, in which scores would continue to fluctuate as chronological age increased. Nevertheless, most mean differences were not significant and were less than one-half point, well below the average yearly increase of 2 points expected with traditional instruction during the developmental music aptitude stage (Gordon, 2002). Developmental music aptitude was not depicted as typically described (Taggart, 1989), nor was stabilized music aptitude characterized as in extant literature (Gordon, 2004) from the results of these paired t-tests. It appeared music aptitude was not sensitive to an effect of chronological age and consequently IMMA was an appropriate measure for students in Grades 3, 4, and 5 at varying places along the music aptitude continuum: those who remained fully in the developmental music aptitude stage, were transitioning to the stabilized music aptitude stage, or had attained the stabilized music aptitude stage wholly, regardless of chronological age. Whether IMMA was as accurate a measure of stabilized music aptitude as of developmental music aptitude cannot be verified from the results of the current study. Nevertheless, the usefulness of IMMA as a measure of music aptitude for students in this period of transition between music aptitude stages was not negated.
IMMA may continue to be administered as a measure of developmental music aptitude and stabilized music aptitude, as no effect of chronological age was concluded for music aptitude in the current study. Additional research including students younger and older than the sample group of the current study is recommended to clarify further the effect of chronological age on music aptitude. No overall effect of instruction was concluded from the results of a series of Wilcoxon Signed Rank tests conducted by academic year for IMMA tonal, rhythm, and composite scores. It was anticipated a comparison of pre- and post-instruction music aptitude scores would help identify the grade level at which music aptitude appeared indifferent to the effect of instruction and consequently the age of onset of the stabilized music aptitude stage would be disclosed. Although tonal and composite score increases from Fall to Spring were observed, mean differences were small and statistically significant only for 2011–2012 and 2013–2014 tonal scores. Gordon (2005) asserted scores on a stabilized music aptitude test do not increase as a result of practice or training; therefore, previous student attainment of the stabilized music aptitude stage was implied by the lack of significant score increase. Nevertheless, Reese and Shouldice (2019) warned of potential reduction of score gain if instruction ceased or effectiveness of teaching decreased (p. 478). Gordon (2001b) specified the need for informal guidance using age-appropriate techniques and materials to move students of all ages through music babble before formal instruction commenced. Without unstructured and structured guidance, students’ ability to develop audiation would be limited (p. 87). Thus, it seemed plausible instruction was inadequate for student needs and had a detrimental effect on expected score gains.
The finding of no effect of instruction on music aptitude was specific to the parameters of instruction as applied in the current study. As an examination of the type and quality of instruction was beyond the parameters of the current study, further investigation is required to understand more fully the effect of compensatory and complementary instruction on music aptitude. The subsequent action for a finding of no significance due to prior achievement of the stabilized music aptitude stage differs greatly from that due to inappropriate instruction. To the practitioner, attainment of the stabilized music aptitude stage prior to age 9/Grade 4 might not affect a music educator’s choice of music aptitude test, as IMMA seemed to measure developmental music aptitude and stabilized music aptitude equally well. On the other hand, Gordon (2013) contended a year of exposure to preparatory audiation might be needed to acquire readiness for formal instruction (p. 134); therefore, use of informal guidance in lieu of or as a supplement to formal instruction is recommended to support students’ musical age, regardless of their chronological age (p. 30). To the researcher, continued scrutiny of IMMA scores of students younger and older than the participants in the current sample might help clarify the grade level at which music aptitude stabilizes. However, further investigation of type and quality of instruction would necessitate a change in research design, likely to a quasi-experimental study in which pre- and post-treatment IMMA scores are examined after a specified period of well-defined instruction. The results of a longitudinal examination of tonal, rhythm, and composite scores over a 3-year period, conducted using a series of repeated measures ANOVA to investigate a period of transition between the developmental and stabilized music aptitude stages, were mixed.
It appeared students might have remained in the developmental music aptitude stage through Grade 5, as significant mean increases from Grade 3 to Grade 4 composite scores as well as Grade 3 to Grade 5 composite scores seemed to indicate continued score fluctuation throughout the period in question. However, no significant differences were found for any rhythm scores except those of Groups B and E: rhythm scores had ceased to fluctuate meaningfully, which implied students might have progressed to the stabilized music aptitude stage prior to Grade 3. Tonal scores mimicked the composite trend of significant score increase from Grade 3 to Grade 4 and Grade 3 to Grade 5 for only 50% of the Groups. Group B results were an anomaly: Grade 4 rhythm scores were significantly lower than Grade 3 rhythm scores, but Grade 5 rhythm scores were significantly higher than Grade 4 rhythm scores for the same 3-year period. It was speculated a broad period of transition could account for the discrepancies found in score fluctuation, as well as this atypical trend not previously described in the research literature. For music educators, the implication of a transition period would be continued use of IMMA to measure music aptitude of students still in the developmental music aptitude stage, those transitioning between music aptitude stages, and those who had attained the stabilized music aptitude stage. Nevertheless, the longitudinal effect of instruction must be considered in regard to these findings. A lack of compensatory informal music guidance for students beyond preschool age might have hindered acquisition of higher-level music skills (Gordon, 2013, p. 9). A deficit of foundational audiation skills for students in the current study would not have been mitigated by formal instruction (Gordon, 2012, p.
263) in Grades 3, 4, and 5, and might have affected score changes in a manner that did not conform to that previously described for the developmental music aptitude stage or stabilized music aptitude stage. For the music educator, an implication of no effect due to inappropriate type of instruction was dire: an immediate shift in type of instruction to include informal guidance is urged, lest students’ development of audiation becomes inhibited.

Recommendations

Numerous recommendations for future research resulted from the current study. These are organized into two sections and outlined below to guide the reader.

Figure 15
Adaptations and Extensions to the Current Study

Adaptations to the Current Study

Expanded and More Diverse Sample. A limitation of the current study was the use of a convenience sample of students from the school district in which the researcher was employed. This sample had limited diversity: a large majority of students were White, lived in poverty, attended the same school district throughout their primary and secondary levels of education, and resided in a rural area. Therefore, replication of this study with a sample more representative of the socioeconomic and cultural diversity of the general population of American elementary school students is suggested in order to generalize findings to a larger and more heterogeneous population. Similarly, the stability of student enrollment within the current sample’s school system was atypical of the educational experience of students in many American schools. Students who transfer to other school systems might experience less consistency in type and frequency of instruction, and school systems with a more transient student population might struggle to provide individualized instruction simultaneously to students from a variety of musical backgrounds.
Both of these instructional situations might influence the findings of a similar investigation of effect of instruction. Thus, a study including a sample more representative of the transiency rate of the general elementary school population is suggested in order to generalize findings to a more typical population. Numerous test scores were missing due to student absence during test administration and inconsistent IMMA administration to students in Grades 4 and 5. It is recommended IMMA test make-up sessions be offered to students who were absent in order to lessen the need for imputation of a large number of missing values. In addition, annual IMMA administration to students in Grades 3, 4, and 5 would yield a larger and more balanced sample of observed scores with which to conduct a longitudinal examination of music aptitude. Although multiple imputation is a reputable method of managing missing values, a research design that resulted in fewer missing values would yield a data set of observed values that most accurately represented the current music aptitude level of that specific sample of students.

Consistency of Multiple Imputation Implementation. Multiple imputation using predictive mean matching with 10 imputations was conducted in the current study to increase statistical power due to the sizable quantity of missing data, as previously described. However, the premise that Fall and Spring scores would be loosely related was not accommodated by the multiple imputation procedure; therefore, it was possible imputed Spring scores could have appeared to increase or decline markedly from preceding Fall scores in an improbable manner not in keeping with previously reported score trends. It is recommended imputed data be examined for values inconsistent with accepted parameters of score relationships and those affected be replaced randomly with imputed values more closely aligned with those parameters.
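The recommended screening of imputed values can be illustrated with a minimal Python sketch. It assumes each record carries a Fall score, a Spring score, and a flag marking whether the Spring score was imputed; the field names, the example records, and the `max_change` threshold are hypothetical and are not drawn from the current study or from IMMA documentation. An acceptable Fall-to-Spring change would need to be justified from previously reported score trends.

```python
def flag_implausible_changes(records, max_change):
    """Return the ids of records whose imputed Spring score differs from
    the preceding Fall score by more than `max_change` raw-score points.

    Observed (non-imputed) Spring scores are never flagged.
    """
    flagged = []
    for rec in records:
        if not rec["spring_imputed"]:
            continue  # only imputed values are screened
        if abs(rec["spring"] - rec["fall"]) > max_change:
            flagged.append(rec["student_id"])
    return flagged


# Made-up records for illustration only (not study data).
example_records = [
    {"student_id": "S01", "fall": 70, "spring": 72, "spring_imputed": True},
    {"student_id": "S02", "fall": 70, "spring": 52, "spring_imputed": True},
    {"student_id": "S03", "fall": 60, "spring": 45, "spring_imputed": False},
]

# Hypothetical threshold: flag imputed changes larger than 8 raw-score points.
flagged_ids = flag_implausible_changes(example_records, max_change=8)
print(flagged_ids)
```

Flagged cases would then be candidates for replacement with an imputed value more consistent with the chosen parameters.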
In addition, a slight adaptation of the research design to abstain from imputation of composite scores and impute only tonal scores and rhythm scores is recommended. Composite scores are defined as the sum of tonal scores and rhythm scores, yet imputed composite scores may not have adhered to this standard. Manual calculation of composite scores is recommended for instances in which a tonal score or rhythm score has been imputed, in order that the sum of an imputed score and observed score will equal the composite score.

Mitigation of Outliers. No outliers were omitted from the observed or imputed data set. Nevertheless, the presence of outliers seemed to affect the score distribution, often resulting in a violation of the assumption of normality. Nonparametric methods, which have less statistical power than parametric methods (Russell, 2018, p. 23), were required to mitigate the effect of outliers. Instead, an examination of the data set before and after the multiple imputation procedure is recommended and management of outliers considered in advance of statistical testing. Field (2009) proposed three options for dealing with outliers: removal, if there is substantial reason to believe the outlier is not representative of the population to be sampled; transformation of data, if it seems likely the statistical models perform better using transformed data than using data that violate the assumption the transformation corrects (Field, 2009, p. 155); and score changing, if the score is highly unrepresentative and biases the statistical model. It is possible exclusion of occasional outliers might have reduced the impact of outliers on the line of best fit and consequently affected the findings of the statistical tests (Russell, 2018, p. 240).

Mitigation of Practice Effects. Gordon (1986c) reported no practice effects for repeated administrations of IMMA.
Nevertheless, continued examination of IMMA as a measure of stabilized music aptitude with a larger and more diverse sample of participants with ages and grade levels similar to and different from participants in the current study is recommended to corroborate or refute Gordon’s findings and the findings of the current study.

Extensions to the Current Study

Recommended extensions to the current study include studies in which the research design is expanded, such as a parallel examination of percentile ranks for comparison with raw scores and use of alternate statistical procedures. In addition, future research on the effect of instruction pertaining to type and quality of instruction, an expansion of grade levels within the sample, and an investigation of the effect of instruction using frequent music aptitude testing to continually adapt instruction are suggested. Attendance at professional development opportunities focused on preparatory audiation, an examination of the effect of music preference on music aptitude, mitigation of cultural bias in standardized testing, an investigation of effect of ensemble participation on music aptitude, and an examination of independent stabilization of tonal and rhythm aptitudes are also proposed to clarify the effect of instruction in future studies.

Parallel Examination of Percentile Ranks. A limitation of the current study was the use of raw scores (number of correct answers) as the unit of comparison. In contrast, IMMA standardization results were reported as percentile ranks (Gordon, 1986c, pp. 64–65), standard units that situate students’ scores within the context of their grade level peers’ performance as a form of normative analysis. It was not possible to compare the results of the current study to those of the IMMA standardization group without use of a common unit of measure.
Thus, an adaptation to the current research design in which raw scores are converted to local percentile ranks using frequency distributions of scores (Gordon, 2012, pp. 351–353) is proposed. In this way, findings based on comparisons made using raw scores would be enhanced by normative comparisons with local and IMMA standardization percentile ranks.

Alternate Statistical Procedures. A limitation of the current study was the decision to use only observed scores from students who had been administered IMMA for 3 consecutive years (Grade 3, Grade 4, and Grade 5) in an attempt to garner a deep understanding of longitudinal score change of specific groups of students. Therefore, a recommendation for future research is to use all available scores for Grades 3, 4, and 5 for a more comprehensive investigation of a period of transition and to compare those results to the results of the more limited sample in the current study. In addition, a series of repeated measures ANOVA was used in the current study to probe longitudinal score change at a basic level, and multivariate results were used to confirm ANOVA findings. It is possible a simultaneous examination of the dependent variables (IMMA tonal score, IMMA rhythm score, and IMMA composite score) and independent variables (Grade 3, Grade 4, and Grade 5) would reveal relationships between all categories of independent and dependent variables most effectively. Therefore, a MANCOVA including three continuous dependent variables (Test Score: IMMA tonal score, IMMA rhythm score, and IMMA composite score), one categorical independent variable with 3 levels (Grade Level: Grade 3, Grade 4, and Grade 5), and one covariate with 2 levels (Type of Instruction: with informal guidance and without informal guidance), is recommended to determine whether there is a relationship between grade level and IMMA score after controlling for type of instruction (Hatcher, 2013, p. 374).
An advantage of MANCOVA over ANOVA is its ability to detect relationships between dependent variables, increase power by reducing the size of the error term, and adjust mean scores on the independent variable for the covariate (Hatcher, 2013, pp. 375–376). Therefore, MANCOVA results would build on the foundational understanding gleaned from the detailed analysis of scores of grade level groupings from the current study in investigating a period of transition between music aptitude stages.

Full Information Maximum Likelihood (FIML) is a method to estimate parameters in a variety of statistical models such as structural equation modeling (SEM) and is a popular model-based procedure for handling missing data (McKnight et al., 2007, p. 163). Instead of imputing missing values, FIML uses the likelihood function to estimate the probability of the data as a function of the observed data and unknown parameters. Maximum Likelihood procedures produce unbiased estimates in large samples, approximate a normal distribution in repeated samplings (McKnight et al., 2007, pp. 160–164), and provide reproducible estimates with smaller standard errors than multiple imputation (MI) (Ghisletta & Aichele, 2017). With large amounts of missing information, MI can require 200–300 imputations of the data set to estimate a standard error similar to that of FIML; therefore, FIML is more efficient than MI (Von Hippel, 2016). Although MI is a more flexible procedure, bias can be introduced if there are conflicts between the assumptions of the imputation model and the analytic model. When used with SEM statistical software, FIML can be employed in a single step, in contrast with the multiple-step process of MI (data set imputation, statistical analysis, and pooling). Therefore, a future study using FIML to handle missing data is recommended.
Structural equation modeling (SEM) is a flexible set of procedures used to estimate and test models that hypothesize causal relationships between unobserved (latent) and observed (manifest) variables (Hatcher, 2013, p. 478). The focus of SEM is on observation and indirect measurement of latent variables in order to theorize causal connections among them (Huck, 2012, pp. 504–505). The researcher’s knowledge of theory and previous research is used initially to define the latent variables, after which measurable variables are selected to illuminate the latent variables’ qualities (pp. 507–508). Therefore, SEM is not exploratory; instead, SEM compares actual relationships among variables to the theoretical relationships previously hypothesized and evaluates the fit of the model to explain observed data (Huck, 2012, pp. 504–505). Diagrams which depict types, associations, and causal relationships of and between variables are used to present SEM results (Huck, 2012, p. 506). An advantage of SEM is that inferences may be drawn from large data sets (Leech et al., 2015, p. 90). Thus, the use of SEM in future studies is recommended, in order that relationships between the variables in the current study may be elucidated more clearly and completely.

Investigation of Type and Quality of Instruction on Music Aptitude. The limitation of type and quality of instruction, although not a predetermined focus, was critical to the findings of the current study.
No effect of instruction on music aptitude was found in the current study; however, a consideration of the type of instruction offered for all primary grades as well as each intermediate grade level considered in the study might have clarified its function as traditional, compensatory, or complementary, as it was possible the necessary audiational skills required to establish readiness for the formal instruction offered in the school environment were insufficient for the participants in the current sample. This might have affected the results of the study adversely, as the effect of the type of instruction being examined might have differed from the type of instruction most appropriate for the participants. Gordon (1986c) emphasized the import of appropriate informal and formal music experiences (p. 103), and recommended longitudinal studies similar to those examining music achievement of culturally homogeneous students with differing levels of stabilized music aptitude should be undertaken for culturally homogeneous students with differing levels of developmental music aptitude, with special consideration of theories of instruction (Gordon, 1980b). In addition, Gordon noted any significant difference in developmental music aptitude between students from differing backgrounds would indicate the need for diverse (and culturally responsive) instruction to address those score differences. Therefore, type of instruction as it relates to compensatory and complementary instruction should be included as a variable in future studies of effect of instruction on music aptitude.

Expansion of Sample Grade Levels. The current examination focused on IMMA scores of students in Grades 3, 4, and 5 in order to include the grade levels prior to, including, and following age 9, the target age at which the shift to stabilized music aptitude stage had been purported in previous research.
As the results of the current study were inconclusive for effects of chronological age and instruction on music aptitude and a period of transition, the inclusion of Grade 2 and Grade 6 IMMA scores in future studies is recommended to further investigate the onset of stabilized music aptitude; Gordon (2013) reported the difference between highest and lowest scoring Grade 2 students was greater than that of average scoring students in Grades 2 and 6 (p. 16). Gordon (2002) noted the use of the same test allowed easily explained and understood comparisons for students in different grades; thus, use of IMMA, which is standardized for all students in the proposed sample, is optimal and warranted. Results of such a study could be compared to a similar study conducted by Gordon (2002) in which PMMA and MAP non-preference subtests were administered to students in Grade 2 and Grade 6 and resulting correlations used to consider differences in developmental and stabilized music aptitudes, as further evidence of the dichotomy of music aptitude stages.

Continual Adaptation of Instruction Based on Frequent Music Aptitude Testing. To increase understanding of how instruction can be adapted through close monitoring of developmental music aptitude, a study is recommended in which the scores of more frequent aptitude testing (perhaps every two months) are used to guide instruction. A sample size of at least N = 30 is small enough to provide the flexibility necessary for such frequent adaptation of instruction, yet large enough to satisfy the central limit theorem, which states the sampling distribution of the mean will be approximately normal regardless of the shape of the population distribution when samples are larger than about 30 (Field, 2009, p. 782).
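The behavior described by the central limit theorem can be illustrated with a brief simulation. The following Python sketch is illustrative only: the exponential population, the random seed, and the number of replications are assumptions chosen for demonstration and do not represent IMMA scores. It compares the skewness of a strongly skewed population with the skewness of means computed from repeated samples of n = 30.

```python
import random
import statistics


def skewness(values):
    """Return the population skewness (third standardized moment)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def draw_skewed(rng):
    """Draw one value from a right-skewed (exponential) population."""
    return rng.expovariate(1.0)


def sample_means(n, replications, rng):
    """Return the means of `replications` independent samples of size `n`."""
    return [
        statistics.fmean(draw_skewed(rng) for _ in range(n))
        for _ in range(replications)
    ]


rng = random.Random(2021)  # fixed seed so the run is repeatable

raw_values = [draw_skewed(rng) for _ in range(20000)]
means_n30 = sample_means(n=30, replications=2000, rng=rng)

raw_skew = skewness(raw_values)
means_skew = skewness(means_n30)

print(f"Skewness of individual values: {raw_skew:.2f}")
print(f"Skewness of sample means (n = 30): {means_skew:.2f}")
```

The distribution of sample means should be markedly less skewed than the population itself, although some skew typically remains at n = 30 for a population this asymmetric; the threshold of 30 is a rule of thumb rather than a guarantee of normality.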
Gordon (1998) proposed periodic administration of music aptitude testing, particularly for students still in the developmental music aptitude stage, in order to diagnose students’ musical strengths and weaknesses for individualized instruction and to identify students with high music aptitude to provide the opportunities necessary to maintain that level of aptitude (p. 119). Reese and Shouldice (2019) described the process of adapting instruction based on knowledge of music aptitude scores. Scores of multiple test administrations may be compared to assess effect of instruction. Stable scores indicate instruction supports the tested level of aptitude; score increase suggests instruction has been compensatory. A decrease in scores signifies the need for an adjustment of instruction to provide additional support (p. 482). Moore (1987) concurred, noting primary music educators were able to monitor the effects of classroom instruction on developmental music aptitude to good effect. Thus, findings of extant literature establish a foundation for further study in which instruction is adapted consistently and frequently based on music aptitude test scores.

Professional Development Focused on Preparatory Audiation. A pragmatic suggestion for all music educators interested in understanding the construct of preparatory audiation, building foundational preparatory audiation skills for use of formal instruction, and applying Music Learning Theory is to participate in the Gordon Institute for Music Learning (GIML) Professional Development Levels course in Early Childhood before or concurrently with their principal GIML course of interest (elementary general music, instrumental, or piano).
Gordon (2012) noted students moved progressively through music babble when in the developmental or stabilized music aptitude stage and should not be hurried into audiation and formal instruction as they matriculate into school at age 5 or older until they have phased through preparatory audiation by sufficient participation in informal guidance activities (pp. 260–261). The need to address preparatory audiation deficiencies applies to educators of all instructional levels and areas of concentration. Gordon (1986c) noted informal instruction was organized according to tonal and rhythm concepts only (pp. 70–71); therefore, participation in professional development workshops or courses focused on preparatory audiation may provide needed clarification for inclusion of informal instruction beyond the early childhood years. Implementation of the concepts presented within those preparatory audiation workshops or courses, in an age-appropriate manner for school-age students, may help move students who remain in preparatory audiation toward a level of audiation from which they may benefit wholly from formal instruction.

Inclusion of Informal Guidance at All Levels of Instruction. As is evident from the findings of the present study, this researcher, despite reasonable and concerted efforts to adapt instruction to students’ musical needs according to their level of developmental or stabilized music aptitude, did not possess the necessary understanding of preparatory audiation to include sufficient informal guidance for students older than Grade 1. As previously recommended, practicing music educators should receive professional development focused on preparatory audiation, including how to identify each student’s phase of preparatory audiation for tonal and rhythm dimensions and implement informal guidance to support all students as they transition out of music babble, regardless of their chronological age or grade level.
Ideally, preparatory audiation as a construct would be presented to preservice teachers in undergraduate methods courses, along with opportunities to observe and work with young children still in music babble. The National Association for Music Education (NAfME) published a position statement on early childhood music education (National Association for Music Education, 2021) and identified PreK–8 music standards (National Association for Music Education, n.d.), and organizations such as the Gordon Institute for Music Learning (GIML) and the Early Childhood Music & Movement Association (ECMMA) actively promote music education focused on early childhood, yet many music educators, including the researcher, were not made aware of best practice for implementation of those standards for school-age students remaining in preparatory audiation. Little information regarding early childhood music instruction or preparatory audiation was available in the researcher’s preservice training, and professional development opportunities were focused primarily on techniques and materials applicable for formal instruction. Offering learning opportunities for groups of upper elementary students, some of whose members likely remained in preparatory audiation for tonal or rhythm dimensions, others who continued to function tonally or rhythmically in the developmental music aptitude stage, and still more who have transitioned to the stabilized music aptitude stage for tonal or rhythm dimensions is akin to simultaneously spinning multiple plates. Just as the topic of preparatory audiation is often absent from music teacher preparation, so too has the complexity of instructional design combining formal instruction with simultaneous mitigation of preparatory audiation deficits been overlooked. This lesson design skill set is specific and its implementation regrettably infrequent.
In the current study, PMMA and IMMA were administered routinely by the researcher in Fall and Spring to all students in kindergarten through Grade 3. Pre- and post-instruction scores were scrutinized and instruction adapted accordingly by the researcher; instruction was individualized based on bi-annual PMMA and IMMA scores and student performance. However, adaptations to instruction were limited to content and skills within the context of formal instruction; little attempt was made to modify instruction to accommodate students who had not yet passed out of preparatory audiation. In contrast, Gordon (1986c) recommended students in Grades K–3 participate in formal and informal instruction, as concurrent experience in both strengthens the outcomes of formal instruction (p. 70). Thus, it is imperative music educators of students at all levels and in all areas of music study recognize the need to include aspects of informal guidance, as there are likely students in their classes or ensembles who remain in preparatory audiation tonally or rhythmically and will not benefit from formal instruction until they have passed out of music babble (students whose musical age is delayed in comparison to their chronological age). Informal guidance activities such as singing tonal patterns, chanting rhythm patterns, and exposure to songs and chants in a variety of tonalities and meters, in conjunction with formal instruction, are suggested to strengthen audiation and thus increase music aptitude scores (Gordon, 2005).

Effect of Music Preference Testing on Music Aptitude. An investigation of the effect of music preference on determination of stabilized music aptitude is suggested as an extension of the current study. Boyle (1992) noted general acceptance of music preference as a construct (p. 251), yet the means of measurement of this construct were not universally accepted.
Gordon (1986a) reported music preference measures were within the purview of stabilized music aptitude and noted young children in the developmental music aptitude stage were unable to make reliable judgements about music preference, regardless of the content or framing of test items (Gordon, 1998, p. 70). The dependence of future musical success on the MAP musical sensitivity total test score (Gordon, 1995, p. 55), and the rhythm imagery total test score in particular, was noted (Gordon, 1998, p. 141). However, IMMA, purported to function as a measure of stabilized music aptitude for students age 9 and older (Gordon, 1989d), contains no preference subtests due to its primary function as a measure of developmental music aptitude more discriminating than PMMA (Walters, 1991). Gordon (2005) described the following indirect findings associated with preference tests:

1. Successful music students score high on preference measures.
2. Students who score high on preference measures demonstrate higher levels of expression and overall sensitivity in their performances.
3. Preference scores are highly correlated with potential to create and improvise music.
4. Preference scores are highly intercorrelated with the MAP meter subtest.
5. Preference scores are highly correlated with ability to recall and make musical inferences, whereas non-preference subtest scores are more highly correlated with ability to memorize and imitate music (p. 16).

Much may be gained from analysis of students’ scores on preference measures, yet only MAP provides the opportunity to glean this information.
Therefore, an examination of concurrent IMMA and MAP scores of intermediate students would offer the ability to correlate IMMA scores with those of a valid test of stabilized music aptitude, with and without preference subtest scores, to confirm independently the assertion that IMMA functions as a test of stabilized music aptitude for students age 9 and older. The longitudinal predictive validity of the MAP battery was due in part to its three preference subtests (Gordon, 1998, p. 61); MAP’s ability to diagnose musical strengths and weaknesses in order to individualize instruction was notable (Gordon, 2001c) and superior to that of IMMA (Geissel, 1985, p. 32). Therefore, the potential for MAP preference subtest scores to boost the diagnostic capabilities of IMMA should be examined, as administration time of the two IMMA subtests is markedly less than that of the full MAP battery. This investigation could lead in turn to an examination of MAP preference subtest use to augment the ability of the Advanced Measures of Music Audiation (AMMA), a measure of stabilized music aptitude standardized for students in junior high through university, to describe stabilized music aptitude more fully, as preference tests, despite a likely increase in test validity, were excluded from AMMA to minimize test length (Gordon, 1998, p. 111).

Mitigation of Cultural Bias in Standardized Testing. Even as of this writing, avoidance of talk about race was typical, as “White people have been socialized not to talk about race” (Bradley, 2007, p. 152). Thus, the discourse necessary to address the power differential resulting from racism was frequently absent in American society and schools: Bradley, citing Pollock (2004), noted those involved in education in all roles “lacked the language” to discuss race.
Nevertheless, to avoid talking about race was to risk perpetuation of racism by not disputing whiteness as the cultural norm and to marginalize students who experience racism. Therefore, Gillborn (2006) advocated using knowledge gleaned from past errors to adapt to the challenges of the present: an understanding of Gordon’s (1980b) intent in conducting a study of “inner city” students must be tempered by the anti-racism and social justice aims of Critical Race Theory (CRT), a theoretical and analytical framework for educational research (DeCuir & Dixson, 2004), that reject the placement of White culture as advantageous, despite the pervasiveness of this mindset in the social fabric of the American culture (Gillborn, 2006). Because no description of the racial, ethnic, or socioeconomic makeup of the IMMA norms sample was given, it was inferred the sample was relatively homogeneous in composition. Gordon (1998) stated definitively that MAP scores, including preference test scores, were normally distributed (p. 100) and all students audiated similarly, regardless of cultural background (Gordon, 1981). Gordon’s attempt to mitigate the effect of cultural difference was apparent in his use of object identifiers to eliminate the need for reading, writing, and English language skills in PMMA and IMMA (Gordon, 1986c, p. 33). He also endeavored to eliminate cultural bias in the design of MAP (Gordon, 1967a) and as a factor in student achievement (Gordon, 1980b) through selection of samples of diverse students. Gordon’s (1987) use of a variety of modes and meters was an attempt to ensure MAP would generalize to “occidental culture” (p. 69); results of research studies have confirmed the validity of MAP for use in East Asian cultures as well (South Korea: Reynolds & Hyun, 1994; Taiwan: Chuang, 1997). 
Yet STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 281 Gordon interpreted potential differences in music aptitude by race were based on limited environmental music opportunities and speculated students’ motivation to learn might have affected their performance on PMMA. Today, we must question this deficit perspective and instead suggest that the measure may not be sensitive to the environmental opportunities available or music intellect possessed by minoritized students in the United States as well as nonWestern cultures in other countries. Typically, norms of standardized tests are based on the scores of dominant groups, which can result in bias against minoritized students (Kim & Zabelina, 2015). This seemed to be the case for the small norms sample used for IMMA, which was selected from a limited geographic region (Gordon, 1986c, p. 85). Yet historically, standardized tests have been found to reproduce racial and economic inequalities that correlate with societal inequities (Au, 2008). Variance among test scores can be explained by noninstructional factors such as poverty rate, language barriers, and racism (Kohn, 2000). Knoester and Au (2017) noted correlations of structural inequalities associated with racism and poverty with K-12 standardized testing were stronger than with any other factor. Further examination of IMMA test validity with minoritized students is advised, as continued reliance on findings from limited studies including homogeneous samples may affect generalization of those findings to minoritized student populations. In addition, creation of local norms is recommended for a more accurate comparison of findings of diverse populations. Gordon (1986c) suggested the development of local norms was an outgrowth of frequent test administrations and might be superior for comparing relative standing (p. 86). Holahan and Thomson (1981) concurred, and proposed construction of local norms as standard practice for all tests. 
Potential bias inherent in standardized testing must be considered. Standardized test content frequently requires foundational knowledge and skills disproportionately possessed by students from more economically privileged backgrounds (Kohn, 2000). Interpretation of test scores and the resulting instructional decisions can be inequitable as well (Hood, 1998): low scores may highlight a mismatch between the test creator’s frame of reference and the student’s cultural frame of reference (Bond, 2017; Koelsch et al., 1995), rather than indicate low aptitude. It behooves us to examine whether the inequities of standardized achievement tests also apply to standardized music aptitude tests. If music aptitude scores are to represent students’ potential to learn in music, it is critical we acknowledge the extent to which factors such as racism, poverty, and foundational knowledge may influence scores. Future research is recommended in which the impact of these extra-musical factors on music aptitude is investigated. Gordon (1986c) recognized extra-musical factors also must be considered in designing appropriate instruction and the expertise of the music teacher taken into account (p. 76). He suggested differences in aptitude scores of students of diverse backgrounds indicated a need for changes in instruction to mitigate those differences (Gordon, 1980b). Culturally- and musically-responsive instruction is needed to mitigate cultural bias in testing (Kim & Zabelina, 2015), and such instruction is assessed more effectively through use of culturally responsive measures (Hood, 1998). Inclusion of creativity as an additional criterion might be considered to supplement the information provided from a standardized measure of music aptitude; performance-based assessment has been found to measure higher order thinking skills and provide a fairer assessment of minoritized students (Hood, 1998).
One such assessment tool is the Torrance Tests of Creative Thinking (TTCT), which measures creative strengths such as fluency, originality, and elaboration. The addition of music preference tests to IMMA tests also might provide a more complete snapshot of students’ music potential, as the TTCT tasks were not applicable directly to standardized tests of music aptitude. Nonetheless, a study in which musical tasks similar to those included in TTCT are examined for reduction or elimination of cultural bias might be useful. In addition, future studies including more diverse samples are needed in order that generalizations to more diverse populations can be drawn and appropriate, culturally-responsive changes to instruction made. The primary goals of music aptitude test administration are to improve instruction and identify students with high music aptitude in order to encourage participation in school music instruction: both are intended to help students fulfill their musical potential. This is an ideal that recognizes and embraces innate music aptitude as an asset and a fund of knowledge (Moll, 1992) worth cultivating.

Effect of Ensemble Participation on Music Aptitude. Elementary students in Grades 4 and 5 often have the opportunity to participate in school performance ensembles such as band, chorus, or orchestra. Approximately 88.5% of the students in the current study participated in performance ensembles as elementary students. However, few research studies were found in which the effect of ensemble participation on music aptitude was examined. As part of his 1967 standardization of the MAP battery, Gordon (1998) investigated the relationship of stabilized music aptitude scores to instrumental music instruction (p. 79). Instrumental ensemble members scored only slightly higher than choral ensemble members, who in turn scored higher than nonparticipants (Gordon, 1998, pp.
79–80), although similar score distributions were found for students who participated in band, orchestra, and chorus at each grade level, as well as for instrument type (Gordon, 1995, p. 91). No findings of further statistical testing were reported; however, separate MAP norms for musically select ensemble participants at the elementary, junior high, and senior high school levels were published (Gordon, 1995, p. 92).

Regrettably, the discrepancies in MAP scores between participants and non-participants in school music ensembles might be attributable to the selectivity of music performance groups. Gordon (1995) found approximately half the students who scored in the upper 20% on MAP did not participate in school ensembles or receive special instruction (p. 9); consequently, MAP use was promoted in order to identify high-aptitude students to encourage participation in music activities (Gordon, 1995, p. 9). Nevertheless, Gordon (1998) noted non-participation in ensembles did not prevent students from scoring high on music aptitude measures, nor did ensemble participation guarantee high music aptitude scores (p. 80). In fact, Gordon found MAP scores of students with instrumental training were only negligibly higher than scores of those who did not participate in the school instrumental program and concluded no effect of training on MAP scores (p. 106). Although instrumental instruction did not seem to have an effect on stabilized music aptitude as measured by MAP, students’ instrumental achievement was greater when teachers used their knowledge of student MAP scores to adapt instruction (Gordon, 1998, p. 106). No extant research on the effect of ensemble participation on developmental music aptitude was found.
Therefore, initial investigation of the effect of ensemble participation on developmental music aptitude and further investigation to confirm or refute Gordon’s findings on its effect on stabilized music aptitude are recommended.

Atomistic Examination of Tonal Aptitude and Rhythm Aptitude Stabilization. Independent transition to stabilization of tonal and rhythm aptitude was conjectured from the findings of the current study. Of the tonal and rhythm paired t-test findings, only those for Grade 3 Spring/Grade 4 Fall rhythm scores were significant: rhythm aptitude seemed to remain in the developmental music aptitude stage. Because no significant tonal score changes were found, there was no evidence tonal aptitude continued to fluctuate. By definition, music aptitude that was no longer influenced by the musical environment was considered stabilized. Therefore, it was concluded the transition to stabilized music aptitude seemed complete prior to Grade 3 for the tonal dimension but was just beginning in Grade 4 for the rhythm dimension.

In contrast, significant mean tonal score differences were noted for Groups A, C, and D in an examination of repeated measures ANOVA results; continued score fluctuation, a feature of developmental music aptitude, was suggested from these findings. However, significant mean rhythm score differences were noted only for Groups B and E: score change remained relatively static for the majority of Groups, which suggested attainment of the stabilized rhythm aptitude stage prior to Grade 3.

Gordon (2013) noted student engagement in tonal and rhythm preparatory audiation may differ by type and stage (p. 30). However, discussion of tonal aptitude and rhythm aptitude as independent constructs in the transition to the stabilized music aptitude stage was not found in the extant literature reviewed for the current study.
Gordon (1981) observed the interaction between music aptitude and the musical environment likely occurred from birth to age 8, although the effect of environment decreased as chronological age increased, until approximately age 9, when music aptitude stabilized. Despite frequent descriptions of music aptitude stabilizing at age 9 in extant literature, it was inferred this conclusion was drawn from the absence of significant fluctuation of composite scores, as separate consideration of tonal score or rhythm score change was not stipulated. Thus, a gestalt perspective of music aptitude stabilization was concluded from composite results, while implications drawn from atomistic results seem to indicate the possibility of independent stabilization of tonal and rhythm dimensions.

Although equating of the paired t-test and repeated measures ANOVA findings is not recommended, their difference is notable: tonal aptitude seemed to stabilize before rhythm aptitude when scores from adjacent test administrations were examined, yet rhythm aptitude appeared to stabilize first when longitudinal scores were considered. Perhaps the former occurrence is only a preliminary, short-term effect and the long-term effect of initial stabilization of rhythm aptitude is conclusive. Regardless, it appears tonal and rhythm aptitude stabilize independently and the temporal aspect of test administration (based on scores from consecutive administrations or those collected over a prolonged period of time) should be considered when interpreting results of future studies. Specifically, an investigation expressly focused on the concurrent or independent stabilization of tonal aptitude and rhythm aptitude is recommended to confirm or refute the findings of the current study.
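The contrast drawn above between scores from consecutive administrations and scores collected over a prolonged period can be made concrete with a small descriptive sketch. It is offered only as an illustration under stated assumptions: the score values are hypothetical, the function names `adjacent_mean_changes` and `overall_mean_change` are invented for this example, and the computation is purely descriptive. It does not reproduce the paired t-tests or repeated measures ANOVA used in the current study.

```python
from statistics import mean

# Hypothetical raw scores for four students across four administrations
# (for example: Grade 3 Fall, Grade 3 Spring, Grade 4 Fall, Grade 4 Spring).
# These values are invented for illustration; they are not study data.
scores = [
    [30, 32, 31, 33],
    [28, 29, 29, 30],
    [35, 36, 34, 36],
    [31, 33, 32, 34],
]


def adjacent_mean_changes(rows):
    """Mean score change between each pair of consecutive administrations."""
    n_admin = len(rows[0])
    return [
        mean([row[i + 1] - row[i] for row in rows])
        for i in range(n_admin - 1)
    ]


def overall_mean_change(rows):
    """Mean score change from the first to the last administration."""
    return mean([row[-1] - row[0] for row in rows])


short_term = adjacent_mean_changes(scores)
long_term = overall_mean_change(scores)
print("Adjacent administrations:", short_term)
print("First to last administration:", long_term)
```

In this invented data set the short-term view shows a dip between the second and third administrations, while the long-term view shows a net gain. Divergence of this kind is why the time frame of comparison bears on interpretation.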
Conclusions

The objective of the current study was to investigate the onset of, transition to, and longitudinal constancy of stabilized music aptitude in upper elementary students. It was predicted no effect of chronological age (Gordon, 1989b, 2005) or instruction (DeYarman, 1975; Fosha, 1964; Gordon, 1981; Mang, 2013) would be found, based on findings of previous research. In contrast, evidence of a period of transition between the developmental and stabilized music aptitude stages at approximately age 9/Grade 4 was anticipated, based on observations by Gordon (1989b, 2006).

As expected, no effect of chronological age on music aptitude was concluded. Significant results were found only for Grade 3 Spring/Grade 4 Fall rhythm scores and composite scores. Nevertheless, this close examination of tonal, rhythm, and composite scores confirmed several of Gordon’s assertions: scores tended to increase with chronological age (Gordon, 1998, p. 169), score gains began to decrease as students transitioned to the stabilized music aptitude stage (Gordon, 1981), and an average annual increase of two points for developmental music aptitude scores for students receiving traditional instruction (Gordon, 2002) served as a useful threshold for determination of practical significance of score change. Replication of this study with a more culturally and socio-economically diverse sample is recommended in order to better generalize the results of the current study to a broader population. In addition, a comparable study including students younger and older than those in the current sample is suggested to further define the onset of stabilized music aptitude.

No effect of instruction on music aptitude was found for the current study. A small but significant increase in tonal and composite scores was found; the direction of rhythm score change was inconsistent, and mean rhythm score differences were not significant.
Nonetheless, the lack of informal guidance activities necessary to address deficits in students’ preparatory audiation might have affected the longitudinal influence of formal instruction for this group of students, as reflected in the IMMA scores under consideration. Thus, the corroboration of Gordon’s (2013) assertion of the necessity of inclusion of informal guidance to enhance readiness for formal instruction for students regardless of chronological age was an important finding of this study. Future research focused on the effect of type and quality of instruction on music aptitude is recommended.

A period of transition could not be substantiated conclusively from the results of the current study. An atypical pattern of tonal score change seemed to support a period of transition; however, non-significant mean rhythm score differences seemed to indicate students had already attained the stabilized music aptitude stage. Thus, it was conjectured tonal aptitude and rhythm aptitude stabilized independently. For the current study’s subjects, tonal aptitude seemed to have stabilized prior to Grade 3, according to paired t-test results of IMMA scores of consecutive test administrations of adjacent grade levels. However, it appeared rhythm aptitude stabilized before tonal aptitude, as concluded from an examination of longitudinal data. Investigation of independent stabilization of tonal aptitude and rhythm aptitude is recommended for future study. In addition, an effect of instruction due to an insufficient level of preparatory audiation readiness could have affected the findings for a period of transition; therefore, further research is necessary to continue exploration of this topic.

An important premise of the current study was that recognition of the onset of stabilized music aptitude would be more practical and unequivocal than identification of the culmination of developmental music aptitude.
In practice, this proved less straightforward than predicted, as evaluation of the direction and size of score change was necessary to interpret students’ current music aptitude stage. It was anticipated a significant score increase would be indicative of continuation in the developmental music aptitude stage. A significant score decrease was interpreted as the active decline in influence of the musical environment (Gordon, 1986a) experienced by students in a period of transition from the developmental to the stabilized music aptitude stage. A lack of significant score fluctuation, whether as an increase or decrease in scores, signified static score change: students likely had already attained the stabilized music aptitude stage.

An a priori interpretation of scores from which the onset of stabilized music aptitude would be determined was not defined in the research design of the current study. Although not specified in extant literature, it was implied composite scores were used to determine the approximate age at which the stabilized music aptitude stage was reached. This was reflective of a gestalt perspective of onset of stabilized music aptitude. In contrast, an atomistic viewpoint was represented in the conjecture that tonal aptitude and rhythm aptitude stabilized independently of one another. Therefore, a period of transition could not be concluded definitively without a clear understanding of the influence of the atomistic position in defining stabilized music aptitude.

Research is deemed significant if it contributes uniquely to the theory or knowledge base of its field of study. Because the published research was sparse, an investigation of the onset of and transition to stabilized music aptitude of upper elementary students was warranted.
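The interpretive scheme described earlier in this section (a significant increase read as continued developmental aptitude, a significant decrease as transition, and no significant change as stabilization) can be summarized as a simple decision rule. The following is a minimal sketch rather than the procedure used in the current study: the function name `interpret_stage`, the 0.05 significance level, and the example values are assumptions introduced for illustration.

```python
def interpret_stage(mean_change, p_value, alpha=0.05):
    """Map a mean score change and its p-value to a stage label.

    alpha=0.05 is an assumed significance level, not one taken from the text.
    """
    if p_value >= alpha:
        # No significant fluctuation: score change is static.
        return "stabilized"
    if mean_change > 0:
        # Significant increase: continuation in the developmental stage.
        return "developmental"
    # Significant decrease: declining influence of the musical environment.
    return "transition"


# Hypothetical (mean change, p-value) pairs, one per score type.
examples = {
    "tonal": (0.4, 0.31),
    "rhythm": (1.8, 0.01),
    "composite": (-1.2, 0.03),
}

for dimension, (change, p) in examples.items():
    print(dimension, "->", interpret_stage(change, p))
```

Applying the rule separately to tonal and rhythm scores corresponds to the atomistic reading discussed in this section; applying it to composite scores alone corresponds to the gestalt reading.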
Although a period of transition was not substantiated conclusively from the results of the current study, it was conjectured tonal aptitude and rhythm aptitude stabilized independently of one another. These findings yielded a new conceptualization of tonal aptitude and rhythm aptitude as separate constructs whose independent stabilization will need to be confirmed or refuted through continued research.

As students’ transition from preparatory audiation may differ for tonal and rhythm dimensions, so tonal and rhythm aptitude also may stabilize at different rates. Music educators have been encouraged to provide, through inclusion of informal guidance with formal instruction, compensatory instruction to raise the tonal or rhythm dimension identified through music aptitude testing as each student’s weakness, as well as complementary instruction to maintain or increase the tonal or rhythm dimension identified as each student’s strength (Gordon, 1986c). Continued compensatory and complementary instruction is encouraged throughout the intermediate grade levels, as the simultaneous shift from developmental to stabilized music aptitude for tonal and rhythm dimensions cannot be presumed if the independent stabilization of tonal and rhythm aptitudes is accepted.

Because it is conjectured tonal and rhythm aptitudes stabilize at different times and different rates for individual students, it is critical informal guidance activities are offered to students of all ages and levels of experience. Singing tonal patterns, chanting rhythm patterns, and participating in musical interactions with songs and chants in a variety of tonalities and meters are additional tools to individualize instruction for students who have not emerged from music babble.
Students may remain in or be transitioning from the developmental music aptitude stage for one dimension (tonal or rhythm) and thus need continued compensatory or complementary instruction, yet have already transitioned to the stabilized music aptitude stage for the other dimension, for which instruction no longer has influence on music aptitude. Not only is the premise that all students attain the stabilized music aptitude level at approximately age 9/Grade 4 in question, but a presumption that all students attain the stabilized music aptitude level for tonal and rhythm dimensions simultaneously is also at issue.

Thus, future research is recommended to replicate this study with a more diverse sample, adapt the multiple implication method, conduct a parallel examination using percentile ranks, examine the data using different statistical testing, and expand the study to include students at younger and older grade levels. Suggested extensions to the current study include investigations of type and quality of instruction on music aptitude, including one in which instruction is continually adapted based on frequent music aptitude testing. Recommendations for practical application by music educators include professional development focused on addressing preparatory audiation needs of all students, as well as inclusion of informal guidance at all grade levels. Mitigation of cultural bias in standardized testing must be considered as more minoritized students become included in heterogeneous study samples. Examinations of the effect of music preference testing and ensemble participation on music aptitude would supplement the current knowledge base as well. Finally, an investigation of the stabilization of tonal and rhythm aptitudes as independent constructs is advocated to further understand and extend the findings of the current study.

References

Acock, A. C. (2005). Working with missing values.
Journal of Marriage and Family, 67, 1012–1028. https://doi.org/10.1111/j.1741-3737.2005.00191.x
Allen, B. (1981). Student dropout in orchestra programs in three school systems in the state of Arkansas (Publication No. 8201181) [Doctoral dissertation, Northeast Louisiana University]. ProQuest Dissertations and Theses Global.
Allison, P. (2009). Missing data. In R. E. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 72–89). Sage Publications Ltd. http://dx.doi.org/10.4135/9780857020994.n4
Allison, P. (2015, March 5). Imputation by predictive mean matching: Promise & peril. Statistical Horizons. https://statisticalhorizons.com/predictive-mean-matching
Amchin, R. (1995). Creative musical response: The effects of student–teacher interaction on the improvisation abilities of fourth- and fifth-grade students (Publication No. 9542792) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global.
Arms Gilbert, L. (1997). The effects of computer-assisted keyboard instruction on meter discrimination and rhythm discrimination of general music education students in the elementary school (Publication No. 9806336) [Doctoral dissertation, Tennessee State University]. ProQuest Dissertations and Theses Global.
Atterbury, B. W., & Silcox, L. (1993). A comparison of home musical environment and musical aptitude in kindergarten students. Update: Application of Research in Music Education, 11(2), 18–22. https://doi.org/10.1177/875512339301100205
Au, W. (2008). Devising inequality: A Bernsteinian analysis of high-stakes testing and social reproduction in education. British Journal of Sociology of Education, 29(6), 639–651. https://doi.org/10.1080/01425690802423312
Auh, M. (1995). Prediction of musical creativity in composition among selected variables for upper elementary students (Publication No.
9604632) [Doctoral dissertation, Case Western Reserve University]. ProQuest Dissertations and Theses Global.
Azzara, C. (1992). The effect of audiation-based improvisation techniques on the music achievement of elementary instrumental students (Publication No. 9223853) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global.
Baer, D. E. (1987). Motor skill proficiency: Its relationship to instrumental music performance achievement and music aptitude (Publication No. 8720238) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global.
Bailey, J. (1975). The relationships between the Colwell music achievement tests I and II, the SRA achievement series, intelligence quotient, and success in instrumental music in the sixth grade of the public schools of Prince William county, Virginia (Publication No. 7606685) [Doctoral dissertation, University of Illinois at Urbana–Champaign]. ProQuest Dissertations and Theses Global.
Bash, L. (1983). The effectiveness of three instructional methods on the acquisition of jazz improvisation skills (Publication No. 8325043) [Doctoral dissertation, The State University of New York at Buffalo]. ProQuest Dissertations and Theses Global.
Belczyk, M. E. (1992). Using music aptitude and timbre preference test results to predict performance achievement among beginning band students (Publication No. 9227434) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Bell, W. A. (1981). An investigation of the validity of the “primary measures of music audiation” for use with learning disabled children (Publication No. 8124579) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Bentley, A. (1966). Measures of musical abilities. Harrap Audio–Visual.
Bergonzi, L. S. (1991).
The effects of finger placement markers and harmonic context on the development of intonation performance skills and other aspects of the musical achievement of sixth-grade beginning string students (Publication No. 9208492) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global.
Bernhard, H. C. (2003). The effects of tonal training on the melodic ear playing and sight reading achievement of beginning wind instrumentalists (Publication No. 3093857) [Doctoral dissertation, University of North Carolina at Greensboro]. ProQuest Dissertations and Theses Global.
Bixler, J. (1968). Musical aptitude in the educable mentally retarded child. Journal of Music Therapy, 5(2), 41–43. https://doi.org/10.1093/jmt/5.2.41
Bluestine, E. M. (2007). A comparative study of four approaches to teaching tonal music reading to a select group of students in third, fourth, and fifth grade (Publication No. 3268133) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Bolton, B. M. (1995). An investigation of same and different as manifested in the developmental music aptitudes of students in first, second, and third grades (Publication No. 9535717) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Bond, V. (2017). Culturally responsive education in music education: A literature review. Contributions to Music Education, 42, 153–180. https://www.jstor.org/stable/26367441
Boyle, J. D. (1982). A study of the comparative validity of three published, standardized measures of music preference. Psychology of Music, 10(1), 11–16. https://doi.org/10.1177/0305735682101002
Boyle, J. D. (1992). Evaluation of music ability. In R. Colwell (Ed.), Handbook of research on music teaching and learning: A project of the music educators national conference (pp. 246–265). Schirmer Books.
Boyle, J. D., & Radocy, R. E. (1987).
Measurement and evaluation of musical experiences. Schirmer Books.
Bradley, D. (2007). The sounds of silence: Talking race in music education. Action, Criticism, and Theory for Music Education, 6(4), 132–162. http://act.maydaygroup.org/articles/Bradley6_4.pdf
Briscuso, J. J. (1972). A study of ability in spontaneous and prepared jazz improvisation among students who possess different levels of musical aptitude (Publication No. 7226656) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
Brokaw, J. P. (1983). The extent to which parental supervision and other selected factors are related to achievement of musical and technical–physical characteristics by beginning instrumental music students (Publication No. 8304452) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global.
Brown, M. (1969). The optimum length of the musical aptitude profile subtests. Journal of Research in Music Education, 17(2), 240–247. https://doi.org/10.2307/3344329
Bugos, J., Heller, J., & Batcheller, D. (2014). Musical nuance task shows reliable differences between musicians and nonmusicians. Psychomusicology: Music, Mind, and Brain, 24(3), 207–213. https://doi.org/10.1037/pmu0000051
Carroll, J. B. (1978). How shall we study individual differences in cognitive abilities?–Methodological and theoretical perspectives. Intelligence, 2, 87–115. https://doi.org/10.1016/0160-2896(78)90002-8
Carroll, J. G. (1983). The use of musical verbal stimuli in teaching low-functioning autistic children (Publication No. 8404269) [Doctoral dissertation, University of Mississippi]. ProQuest Dissertations and Theses Global.
Carson, A. D. (1998). Why has musical aptitude assessment fallen flat? And what can we do about it? Journal of Career Assessment, 6(3), 311–328. https://doi.org/10.1177/106907279800600303
Cary, S. (1981).
Individualized music instruction–Traditional music instruction: Relationships of music achievement, music performance, music attitude, music aptitude, and reading classes of fifth grade students (Publication No. 8201812) [Doctoral dissertation, University of Oregon]. ProQuest Dissertations and Theses Global.
Choi, E. (1996). The development and implementation of interactive multimedia instrumental discrimination skills training courseware for beginning clarinet students (Publication No. 9635496) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global.
Chuang, W. J. (1997). An investigation of the use of musical aptitude profile with Taiwanese students in grades four to twelve (Publication No. 9734114) [Doctoral dissertation, Michigan State University]. ProQuest Dissertations and Theses Global.
Ciepluch, G. M. (1988). Sightreading achievement in instrumental music performance, learning gifts, and academic achievement: A correlation study (Publication No. 8810008) [Doctoral dissertation, University of Wisconsin–Madison]. ProQuest Dissertations and Theses Global.
Ciorba, C. R. (2006). The creation of a model to predict jazz improvisation achievement (Publication No. 3243107) [Doctoral dissertation, University of Miami]. ProQuest Dissertations and Theses Global.
Clark, B. J. (2005). The equity and effectiveness of policies and procedures instrumental music instructors deem essential to program development for beginning percussionists (Publication No. 3182240) [Doctoral dissertation, University of Illinois at Urbana–Champaign]. ProQuest Dissertations and Theses Global.
Conkling, S. W. (1994). A comparison of the effects of learning sequence activities and vocal development exercises on the vocal music achievement of middle level students (Publication No. 9503122) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global.
Cook, R. M. (2020). Addressing missing data in quantitative counseling research. Counseling Outcome Research and Evaluation, 1, 1–11. https://doi.org/10.1080/21501378.2019.1711037
Cooper, H. M. (1989). Integrating research: A guide for literature review. Sage.
Crawford, L. A. (2016). Composing in groups: Creative processes of third and fifth grade students (Publication No. 10195571) [Doctoral dissertation, University of Southern California, Los Angeles]. ProQuest Dissertations and Theses Global.
Creswell, J. W. (2012). Educational research: Planning, conducting, and evaluating quantitative and qualitative research (4th ed.). Pearson Education, Inc.
Cribari, P. B. (2014). A comparison of aural and aural–visual modeling on the development of executive and performance skills of beginning recorder students (Publication No. 3662713) [Doctoral dissertation, Boston University]. ProQuest Dissertations and Theses Global.
Culp, M. E. (2017). The relationship between phonological awareness and music aptitude. Journal of Research in Music Education, 65(3), 328–346. https://doi.org/10.1177/0022429417729655
Curtis, C. (1981). A comparative analysis of the musical aptitude of normal children and mildly handicapped children mainstreamed into regular classrooms (Publication No. 8121544) [Doctoral dissertation, Vanderbilt University]. ProQuest Dissertations and Theses Global.
Davis, L. M. (1981). The effects of structured singing activities and self-evaluation practice on elementary band students’ instrumental music performance, melodic tonal imagery, self-evaluation and attitude (Publication No. 8128981) [Doctoral dissertation, The Ohio State University]. ProQuest Dissertations and Theses Global.
DeCuir, J. T., & Dixson, A. D. (2004). “So when it comes out, they aren’t that surprised that it is there”: Using critical race theory as a tool of analysis of race and racism in education. Educational Researcher, 33(5), 26–31.
https://doi.org/10.3102/0013189X033005026
Degé, F., Patscheke, H., & Schwarzer, G. (2017). Associations between two measures of music aptitude: Are the IMMA and the AMMA significantly correlated in a sample of 9- to 13-year-old children? Musicae Scientiae, 21(4), 465–478. https://doi.org/10.1177/1029864916670205
Dell, C. E. (2003). Singing and tonal pattern instruction effects on beginning string students’ intonation skills (Publication No. 3084778) [Doctoral dissertation, University of South Carolina]. ProQuest Dissertations and Theses Global.
Della Pietra, C. J. (1997). The effects of a three-phase constructivist instructional model for improvisation on high school students’ perception and reproduction of musical rhythm (Publication No. 9736258) [Doctoral dissertation, University of Washington]. ProQuest Dissertations and Theses Global.
Deutsch, D. (Ed.). (1982). The psychology of music. Academic Press.
DeYarman, R. (1972). An experimental analysis of the development of rhythmic and tonal capabilities of kindergarten and first grade children. Experimental Research in the Psychology of Music, Studies in the Psychology of Music, Volume 8. University of Iowa Press.
DeYarman, R. M. (1975). An investigation of the stability of musical aptitude among primary-age children. In Edwin Gordon (Ed.), Experimental Research in the Psychology of Music: 10, 1–23. University of Iowa Press.
Drennan, C. B. (1984). The relationship of musical aptitude, academic achievement and intelligence in merit (gifted) students of Murfreesboro city schools (Tennessee) (Publication No. 8529568) [Doctoral dissertation, Tennessee State University]. ProQuest Dissertations and Theses Global.
Edmund, D. C. (2009). The effect of articulation study on stylistic expression in high school musicians’ jazz performance (Publication No. 3385921) [Doctoral dissertation, University of Florida].
ProQuest Dissertations and Theses Global.
Etzel, M. (1979). The effect of training upon children’s ability in grades one through six to perform selected musical listening tasks (Publication No. 7915345) [Doctoral dissertation, University of Illinois at Urbana–Champaign]. ProQuest Dissertations and Theses Global.
Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.
Field, A. (2009). Discovering statistics using SPSS (3rd ed.). SAGE.
Flohr, J. W. (1981). Short-term music instruction and young children’s developmental music aptitude. Journal of Research in Music Education, 29(3), 219–223. https://doi.org/10.2307/3344995
Forsythe, R. (1984). The development and implementation of a computerized preschool measure of musical audiation (Publication No. 8425572) [Doctoral dissertation, Case Western Reserve University]. ProQuest Dissertations and Theses Global.
Fosha, R. L. (1964). A study of the concurrent validity of the musical aptitude profile (Publication No. 6500455) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
Frierson-Campbell, C. (2001). The effects of audiation-based enrichment activities on second-year wind and percussion instrumental music achievement (Publication No. 9965251) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global.
Froseth, J. (1968). An investigation of the use of musical aptitude profile scores in the instruction of beginning students in instrumental music (Publication No. 6816800) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
Froseth, J. (1971). Using MAP scores in the instruction of beginning students in instrumental music. Journal of Research in Music Education, 19, 98–105. https://doi.org/10.2307/3344119
Fullen, D. L. (1993).
An investigation of the validity of the advanced measures of music audiation with junior high and senior high school students (Publication No. 9316479) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Gamble, D. K. (1989). A study of the effects of two types of tonal pattern instruction on the audiational and performance skills of first-year clarinet students (Publication No. 8912430) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. Basic Books.
Geake, J. G. (1996). Why Mozart? Information processing abilities of gifted young musicians. Research Studies in Music Education, 7(1), 28–45. https://doi.org/10.1177/1321103X9600700103
Geake, J. G. (1999). An information processing account of audiational abilities. Research Studies in Music Education, 12, 10–23. https://doi.org/10.1177/1321103X9901200102
Geissel, L. S., Jr. (1985). An investigation of the comparative effectiveness of the musical aptitude profile, the intermediate measures of music audiation, and the primary measures of music audiation with fourth grade students (Publication No. 8521082) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Gerhardstein, R. C. (2001). Edwin E. Gordon: A biographical and historical account of an American music educator and researcher (Publication No. 3014435) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Ghisletta, P., & Aichele, S. (2017). Quantitative methods in psychological aging research: A mini-review. Gerontology, 63(6), 529–537. https://doi.org/10.1159/000477582
Gillborn, D. (2006). Critical race theory and education: Racism and anti-racism in educational theory and praxis. Discourse: Studies in the Cultural Politics of Education, 27(1), 11–32. https://doi.org/10.1080/01596300500510229
Gordon, E. E.
(1965). The musical aptitude profile: A new and unique musical aptitude test battery. Bulletin of the Council for Research in Music Education, 6, 12–16. https://www.jstor.org/stable/40316898 Gordon, E. E. (1967a). A comparison of the performance of culturally disadvantaged students with that of culturally heterogeneous students on the musical aptitude profile. Psychology in the Schools, 4(3), 260–268. Gordon, E. E. (1967b). A three-year longitudinal predictive validity study of the musical aptitude profile. Experimental research in the psychology of music, studies in the psychology of music, Volume 5. University of Iowa Press. Gordon, E. E. (1968). The contribution of each musical aptitude profile subtest to the overall validity of the battery. Bulletin of the Council for Research in Music Education, 12, 32– 36. https://www.jstor.org/stable/40316956 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 302 Gordon, E. E. (1969). Intercorrelations among musical aptitude profile and Seashore measures of musical talents subtests. Journal of Research in Music Education, 17(3), 263–271. https://doi.org/10.2307/3343874 Gordon, E. E. (1970). Taking into account musical aptitude differences among beginning instrumental students. American Educational Research Journal, 7(1), 41–53. https://doi.org/10.3102/00028312007001041 Gordon, E. E. (1971). The psychology of music teaching. Prentice-Hall, Inc. Gordon, E. E. (1976). Tonal and rhythm patterns: An objective analysis. SUNY Press. Gordon, E. E. (1979a). Developmental music aptitude as measured by the primary measures of music audiation. Psychology of Music 7(1), 42–49. https://doi.org/10.1177/030573567971005 Gordon, E. E. (1979b). Primary measures of music audiation. GIA Publications. Gordon, E. E. (1980a). The assessment of music aptitudes of very young children. Gifted Child Quarterly, 24(3), 107–111. https://doi.org/10.1177/001698628002400303 Gordon, E. E. (1980b). Developmental music aptitudes among inner-city primary children. 
Bulletin of the Council for Research in Music Education, 63, 25–30. https://www.jstor.org/stable/40317605 Gordon, E. E. (1981). The manifestation of developmental music aptitude in the audiation of “same” and “different” as sound in music. GIA Publications. Gordon, E. E. (1982). Intermediate measures of music audiation: A music aptitude test for first, second, third, and fourth grade children. GIA Publications. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 303 Gordon, E. E. (1984a). A longitudinal predictive validity study of the intermediate measures of music audiation. Bulletin of the Council for Research in Music Education, 78, 1–23. https://www.jstor.org/stable/40317839 Gordon, E. E. (1984b). Manual for the instrument timbre preference test. GIA Publications. Gordon, E. E. (1986a). A factor analysis of the musical aptitude profile, the primary measures of music audiation, and the intermediate measures of music audiation. Bulletin of the Council for Research in Music Education, 87, 17–25. https://www.jstor.org/stable/40317975 Gordon, E. E. (1986b). Final results of a two-year longitudinal predictive validity study of the instrument timbre preference test and the musical aptitude profile. Bulletin of the Council for Research in Music Education, 89, 8–17. https://www.jstor.org/stable/40318138 Gordon, E. E. (1986c). Manual: Primary measures of music audiation and intermediate measures of music audiation. GIA Publications. Gordon, E. E. (1987). The nature, description, measurement, and evaluation of music aptitudes. GIA Publications. Gordon, E. E. (1989a). Audie: A game for understanding and analyzing your child’s music potential. GIA Publications. Gordon, E. E. (1989b). Manual for the advanced measures of music audiation. GIA Publications. Gordon, E. (1989c). Predictive validity studies of IMMA and ITPT. GIA Publications. Gordon, E. E. (1989d). 
A two-year longitudinal predictive validity study of the instrument timbre preference test and the intermediate measures of music audiation. GIA Publications. Gordon, E. E. (1990a). Jump right in: Rhythm register book 1. GIA Publications. Gordon, E. E. (1990b). Jump right in: Tonal register book 1. GIA Publications. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 304 Gordon, E. E. (1990c). A one-year longitudinal predictive validity study of the advanced measures of music audiation. GIA Publications. Gordon, E. E. (1991). Taking another look at scoring the advanced measures of music audiation: The German study. In The advanced measures of music audiation and the instrument timbre preference test: Three research studies (pp. 1–21). GIA Publications. Gordon, E. E. (1993). Learning sequences in music: Skill, content and patterns. GIA. Publications. Gordon, E. E. (1995). Manual: Musical aptitude profile. Chicago, IL: GIA Publications. Gordon, E. E. (1998). Introduction to research and the psychology of music. GIA Publications. Gordon, E. E. (1999). All about audiation and music aptitudes: Edwin E. Gordon discusses using audiation and music aptitudes as teaching tools to allow students to reach their full music potential. Music Educators Journal, 86(2), 41–44. https://doi.org/10.2307/3399589 Gordon, E. E. (2001a). Music aptitude and related tests: An introduction. GIA Publications. Gordon, E. E. (2001b). Preparatory audiation, audiation, and music learning theory: A handbook of a comprehensive music learning sequence. GIA Publications. Gordon, E. E. (2001c). A three-year study of the musical aptitude profile. GIA Publications. (First printing The University of Iowa Press, Iowa City, 1967). Gordon, E. E. (2002). Developmental and stabilized music aptitudes: Further evidence of the duality. GIA Publications. Gordon, E. E. (2004). Continuing studies in music aptitudes. GIA Publications. Gordon, E. E. (2005). 
Vectors in my research: Reflections on the development of music learning theory. In M. Runfola & C. C. Taggart (Eds.), The development and practical application of music learning theory (3–50). GIA Publications. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 305 Gordon, E. E. (2006). Nature, source, evaluation, and measurement of music aptitudes. Polskie Forum Psychologiczne, 11(2), 227–237. http://repozytorium.ukw.edu.pl/handle/item/903 Gordon, E. E. (2010). The crucial role of music aptitudes in music instruction. In T. S. Brophy (Ed.), The Practice of Assessment in Music Education: Frameworks, Models, and Designs: Proceedings of the 2009 Florida Symposium on Assessment in Music Education (pp. 211–215). GIA Publications. Gordon, E. E. (2011). Untying Gordian knots. GIA Publications. Gordon, E. E. (2012). Learning sequences in music: Skill, content, and patterns (2012 ed.). GIA Publications. Gordon, E. E. (2013). Music learning theory for newborn and young children. GIA Publications. Gordon, E. E. (2015). Space audiation. GIA Publications. Gouzouasis, P. (1990). An investigation of the comparative effects of two tonal pattern systems and two rhythm pattern systems for learning to play guitar (Publication No. 9100281) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global. Graham, J. W., Olchowski, A. E., & Gilreath, T. D. (2007). How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science, 8, 206–213. https://doi.org/10.1007/s11121-007-0070-9 Grashel, J. (2008). The measurement of musical aptitude in 20th century United States: A brief history. Bulletin of the Council for Research in Music Education, 176, 45–49. https://www.jstor.org/stable/40319432 Green, B. R. (2003). The comparative effects of computer-mediated interactive instruction and traditional instruction on music achievement in guitar performance (Publication No. 
STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 306 NQ86051) [Doctoral dissertation, The University of British Columbia (Canada), Vancouver]. ProQuest Dissertations and Theses Global. Groeling, C. R. (1975). A comparison of two methods of teaching instrumental music to fourthgrade beginners (Publication No. 7529644) [Doctoral dissertation, Northwestern University]. ProQuest Dissertations and Theses Global. Gromko, J. E., & Russell, C. (2002). Relationships among young children’s aural perception, listening condition, and accurate reading of graphic listening maps. Journal of Research in Music Education, 50(4), 333–342. https://doi.org/10.2307/3345359 Gromko, J. E., & Walters, K. (1999). The development of musical pattern perception in schoolaged children. Research Studies in Music Education, 12, 24–29. https://doi.org/10.1177/1321103X9901200103 Grutzmacher, P. A. (1985). The effect of tonal pattern training on the aural perception, reading recognition and melodic sight reading achievement of first year instrumental music students (Publication No. 8514172) [Doctoral dissertation, Kent State University]. ProQuest Dissertations and Theses Global. Guderian, L. V. (2008). Effects of applied music composition and improvisation assignments on sight-reading ability, learning in music theory and quality in soprano recorder playing (Publication No. 3331120) [Doctoral dissertation, Northwestern University]. ProQuest Dissertations and Theses Global. Guerrini, S. C. (2002). The acquisition and assessment of the developing singing voice among elementary students (Publication No. 3040318) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 307 Guerrini, S. C. (2004). The relationship of vocal accuracy, gender, and music aptitude among elementary students. Visions of Research in Music Education, 4, 1–14. 
http://wwwusr.rider.edu/~vrme/v4n1/visions/Guerrini%20The%20Relationship%20of%20Vocal%20 Accuracy.pdf Guilbault, D. M. (2004). The effect of harmonic accompaniment on the tonal achievement and tonal improvisations of children in kindergarten and first grade. Journal of Research in Music Education, 52(1), 64–76. https://doi.org/10.2307/3345525 Hansen, D. A. (1991). The effect of prerequisite skill mastery on higher order skill attainment and motivation in music learning (Publication No. 9207597) [Doctoral dissertation, University of Missouri–Kansas City]. ProQuest Dissertations and Theses Global. Haroutounian, J. (2002). Kindling the spark: Recognizing and developing musical talent. Oxford University Press. Harrington, C. J. (1969). An investigation of the primary level musical aptitude profile for use with second and third grade students. Journal of Research in Music Education, 17(4), 359–368. https://doi.org/10.2307/3344164 Haston, W. A. (2004). Comparison of a visual and an aural approach to beginning wind instrument instruction (Publication No. 3132535) [Doctoral dissertation, Northwestern University]. ProQuest Dissertations and Theses Global. Hasty, J. G. J. (1992). The influence of selected music teaching strategies upon aesthetic responses to phrasing, balance, and style among middle school band students (Publication No. 9316347) [Doctoral dissertation, University of Georgia]. ProQuest Dissertations and Theses Global. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 308 Hatcher, L. (2013). Advanced statistics in research: Reading, understanding, and writing up data analysis results. Shadow Finch Media. Heathers, G. (1977). A working definition of individualized instruction. Educational Leadership, 34(5), 342–345. Henry, W. H. (1995). The effects of pattern instruction, repeated composing opportunities, and musical aptitudes on the compositional processes and products of fourth-grade students (Publication No. 
9537223) [Doctoral dissertation, Michigan State University]. ProQuest Dissertations and Theses Global. Henry, W. H. (2002). The effects of pattern instruction, repeated composing opportunities, and musical aptitude on the compositional process and products of fourth-grade student. Contributions to Music Education, 29(1), 9–28. https://www.jstor.org/stable/24126972 Hess, J. (2015). Decolonizing music education: Moving beyond tokenism. International Journal of Music Education, 33(3), 336–347. https://doi.org/10.1177/0255761415581283 Heymans, M. W., & Eekhout, I. (2019). Pooling means and standard deviations in spss. Applied missing data analysis with spss and (r)studio (First draft). Amsterdam. https://bookdown.org/mwheymans/bookmi/ Hobbs, C. (1985). A comparison of the music aptitude, scholastic aptitude, and academic achievement of young children. Psychology of Music, 13(2), 93–98. https://doi.org/10.1177/0305735685132003 Holahan, J. M., & Thomson, S. W. (1981). An investigation of the suitability of the primary measures of music audiation for use in England. Psychology of Music, 9(2), 63–68. https://doi.org/10.1177/030573568192006 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 309 Hood, S. (1998). Culturally responsive performance-based assessment: Conceptual and psychometric considerations. The Journal of Negro Education, 67(3), 187–196. https://doi.org/10.2307/2668188 Hornbach, C. M., & Taggart. C. C. (2005). The relationship between developmental tonal aptitude and singing achievement among kindergarten, first-, second-, and third-grade students. Journal of Research in Music Education, 53(4), 322–331. https://doi.org/10.1177/002242940505300404 Horton, N. J., & Lipsitz, S. R. (2001). Multiple imputation in practice: Comparison of software packages for regression models with missing variables. The American Statistician, 55(3), 244–254. https://www.jstor.org/stable/2685809 Huck, S. W. (2012). Reading statistics and research (6th ed.). Allyn & Bacon. Hufstader, R. 
A. (1974). Predicting success in beginning instrumental music through use of selected tests. Journal of Research in Music Education, 22(1), 52–57. https://doi.org/10.2307/3344618 IBM Corp. (2019a). IBM SPSS missing values 26. Retrieved December 24, 2020, from ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/26.0/en/client/ Manuals/IBM_SPSS_Missing_Values.pdf IBM Corp. (2019b). IBM SPSS statistics for Macintosh, Version 26.0. IBM Corp. Jaffurs, S. E. (2000). The relationship between singing achievement and tonal music aptitude. (Publication No. 1399634) [Master’s thesis, Michigan State University]. ProQuest Dissertations and Theses Global. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 310 Jarvis, W. C. (1981). The effectiveness of verbalization upon the recognition and performance of instrumental music notation (Publication No. 8120827) [Doctoral dissertation, Rutgers, The State University of New Jersey]. ProQuest Dissertations and Theses Global. Jin, H. L., & Huber, J. Jr. (2011). Multiple imputation with large proportions of missing data: How much is too much? United Kingdom Stata Users’ Group Meetings 2001 (No. 23). Stata Users Group. http://repec.org/usug2011/UK11_Lee.pptx Johnson, D. A. (2000). The development of music aptitude and effects on scholastic achievement of 8 to 12 year olds (Publication No. 9983062) [Doctoral dissertation, University of Louisville]. ProQuest Dissertations and Theses Global. Josuweit, D. (1991). The effects of an audiation-based instrumental music curriculum upon beginning band students’ achievement in music (Publication No. 9207869) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global. Karas, J. B. (2005). The effect of aural and improvisatory instruction on fifth-grade band students’ sight reading ability (Publication No. 3199697) [Doctoral dissertation, University of Nebraska, Lincoln]. ProQuest Dissertations and Theses Global. Karma, K. (1982). 
Validating tests of musical aptitude. Psychology of Music, 10(1), 33–36. https://doi.org/10.1177/0305735682101004 Karma, K. (1984). Musical aptitude as the ability to structure acoustic material. International Journal of Music Education, 3(1), 19–30. https://doi.org/10.1177/025576148400300104 Karma, K. (1994). Auditory and visual temporal structuring: How important is sound to musical thinking? Psychology of Music, 22, 20–30. https://doi.org/10.1177/0305735694221002 Karma, K. (2007). Musical aptitude definition and measure validation: Ecological validity can endanger the construct of musical aptitude tests. Psychomusicology: A Journal of STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 311 Research in Music Cognition, 19(2), 79–90. http://dx.doi.org.gate.lib.buffalo.edu/10.1037/h0094033 Kendall, M. J. (1986). The effects of visual interventions on the development of aural and instrumental performance skills in beginning fifth-grade instrumental students: A comparison of two instruction approaches (reading, kinesthetic, musical technique). (Publication No. 8612553) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global. Kim, K. H., & Zabelina, D. (2015). Cultural bias in assessment: Can creativity assessment help? International Journal of Critical Pedagogy, 6(2), 129–147. http://libjournal.uncg.edu/ijcp/article/view/301/856 Kimble, E. P. (1983). The effect of various factors on the ability of children to sing an added part (Publication No. 8326407) [Doctoral dissertation, University of Georgia]. ProQuest Dissertations and Theses Global. Kleinke, K. (2018). Multiple imputation by predictive mean matching when sample size is small. Methodology, 14(1), 3–15. https://doi.org/10.1027/1614-2241/a000141 Klinedinst, R. E. (1989). The ability of selected factors to predict performance achievement and retention of fifth-grade instrumental music students (Publication No. 9006131) [Doctoral dissertation, Kent State University]. 
ProQuest Dissertations and Theses Global. Klinedinst, R. E. (1991). Predicting performance achievement and retention of fifth-grade instrumental students. Journal of Research in Music Education, 39(3), 225–238. https://doi.org/10.2307/3344722 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 312 Kluth, B. L. (1986). A procedure to teach rhythm reading: Development, implementation, and effectiveness in urban junior high school music classes (Publication No. 8617078) [Doctoral dissertation Kent State University]. ProQuest Dissertations and Theses Global. Knoester, M., and Au, W. (2017). Standardized testing and school segregation: Like tinder for fire? Race Ethnicity and Education, 20(1), 1–14. https://doi.org/10.1080/13613324.2015.1121474 Koelsch, N., Estrin, E., & Farr, B. (1995). Guide to developing equitable performance assessments. Office of Educational Research and Improvement. https://files.eric.ed.gov/fulltext/ED397125.pdf Kohn, A. (2000). Standardized testing and its victims. Educational Week, 20(4), 60–64. Kołodziejski, M. (2019). Relationship between stabilised musical aptitude and harmonic and rhythm improvisation readiness in adults in transversal research. Uniwersytet Humanistyczno-przyrodniczy Im. Jana Długosza W Częstochowie (Poland), XIV, 177– 197. http://dx.doi.org/10.16926/em.2019.14.08 Kopiez, R., & In Lee, J. (2006). Towards a dynamic model of skills involved in sight reading music. Music education research, 8(1), 97–120. https://doi.org/10.1080/14613800600570785 Kopiez, R. & Lee, J. (2008). Towards a general model of skills involved in sight reading music. Music Education Research, 10(1), 41–62. https://doi.org/10.1080/14613800701871363 Kratus, J. (1994). Relationships among children’s music audiation and their compositional processes and products. Journal of Research in Music Education 42(2), 115–130. https://doi.org/10.2307/3345496 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 313 Kuhlman, K. (2005). 
Musical aptitude versus academic ability as a predictor of beginning instrumental music achievement and retention: Research and implications. Update: Applications of Research in Music Education, 24(1), 34–43). https://doi.org/10.1177/87551233050240010105 Landerman, L. R., Land, K. C., & Pieper, C. F. (1997). An empirical evaluation of the predictive mean matching method for imputing missing values. Sociological Methods & Research, 26(1), 3–33. https://doi.org/10.1177/0049124197026001001 Law, L. N. C., & Zentner, M. (2012). Assessing musical abilities objectively: Construction and validation of the profile of music perception skills. PLoS One, 7(12), 1–15. https://doi.org/10.1371/journal.pone.0052508 Lee, E. (2007). A study of the effect of computer assisted instruction, previous music experience, and time on the performance ability of beginning instrumental music students (Publication No. 3284028) [Doctoral dissertation, The University of Nebraska–Lincoln]. ProQuest Dissertations and Theses Global. Leech, N. L., Barrett, K. C., & Morgan, G. A. (2015). IBM SPSS for intermediate statistics: Use and interpretation (5th ed.). Routledge. Levinowitz, L. M., & Scheetz, J. (1998). The effects of group and individual echoing of rhythm patterns on third-grade students’ rhythmic skills. Update: Applications of Research in Music Education, 16(2), 8–11. https://doi.org/10.1177/875512339801600203 Li, P., Stuart, E. A., & Allison, D. B. (2015). Multiple imputation: A flexible tool for handling missing data. Journal of the American Medical Association, 314(18), 1966–1967. https://doi.org/10.1001/jama.2015.15281 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 314 Linklater, R. F. (1994). A comprehensive investigation of the effects of audio and video tape models on the musical development of beginning clarinet students (Publication No. 9500987) [Doctoral dissertation, University of Michigan]. ProQuest Dissertations and Theses Global. Liperote, K. A. (2004). 
A study of audiation-based instruction, music aptitude, and music achievement of elementary wind and percussion students (Publication No. 3123215) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global. Little, R. J. (1988a). Missing-data adjustments in large surveys. Journal of Business & Economic Statistics, 6(3), 287–296. 10.1080/07350015.1988.10509663 Little, R. J. (1988b). A test of missing completely at random for multivariate data with missing values. Journal of the American statistical Association, 83(404), 1198–1202. https://doi.org/10.1080/01621459.1988.10478722 Lundin, R. (1967). An objective psychology of music (2nd ed.). Ronald Press Co. Madley-Dowd, P., Hughes, R., Tilling, K., & Heron, J. (2019). The proportion of missing data should not be used to guide decisions on multiple imputation. Journal of Clinical Epidemiology, 110, 63–73. https://doi.org/10.1016/j.jclinepi.2019.02.016 Mang, E. (2013, December). Musicality profile of Hong Kong children. In 2013 International Conference on the Modern Development of Humanities and Social Science, 331–333. Atlantis Press. https://doi.org/10.2991/mdhss-13.2013.87 Mawbey, W. E. (1973). Wastage from instrumental classes in schools. Psychology of Music, 1, 33–43. https://doi-org.gate.lib.buffalo.edu/10.1177/030573567311007 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 315 McCarthy, J. (1974). The effect of individualized instruction on the performance achievement of beginning instrumentalists. Bulletin of the Council for Research in Music Education, 38, 1–16. https://www.jstor.org/stable/40317313 McDonald, K. J. (2010). The effect of vocal jazz aural skill instruction on student sight singing achievement (Publication No. 3438011) [Doctoral dissertation, University of Hartford]. ProQuest Dissertations and Theses Global. McDowell, R. (1974). 
The development and implementation of a rhythmic ability test designed for 4-year-old preschool children (Publication No. 7422023) [Doctoral dissertation, University of North Carolina at Greensboro]. ProQuest Dissertations and Theses Global. McKnight, P. E., McKnight, K. M., Sidani, S., & Figueredo, A. J. (2007). Missing data: A gentle introduction. The Guilford Press. McPherson, G. E. (1995). ‘Honing the craft’: Improving the way we teach the musically gifted and talented. In Honing the craft: Improving the quality of music education; Conference proceedings of the Australian society for music education, 10th national conference (p. 169). Artemis Publishing. Menard, E. (2009). An investigation of creative potential in high school musicians: Recognizing, promoting, and assessing creative ability through music composition (Publication No. 3451495) [Doctoral dissertation, Louisiana State University and Agricultural & Mechanical College]. ProQuest Dissertations and Theses Global. Miceli, J. S. (1998). An investigation of an audiation-based high school general music curriculum and its relationship to music aptitude, music achievement, and student perception of learning (Publication No. 9825698) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 316 Milford, G. F. (2002). Effect of three different pulse stimulus modes on the rhythm reading achievement of beginning instrumentalists (Publication No. 3057395) [Doctoral dissertation, Kent State University]. ProQuest Dissertations and Theses Global. Mitchum, J. P. (1969). The Wing ‘standardized tests of musical intelligence’: An investigation of predictability with selected seventh-grade beginning-band students (Publication No. 7008565) [Doctoral dissertation, Florida State University]. ProQuest Dissertations and Theses Global. Moll, L. C. (1992). 
Bilingual classroom studies and community analysis: Some recent trends. Educational Researcher, 21(2), 20–24. https://doi.org/10.3102/0013189X021002020 Moore, J. L. (1987). An experiment with rhythm and movement upon developmental music aptitude. Update: The Applications of Research in Music Education, 6(1), 7–10. Moore, J. L. (1990). Toward a theory of developmental music aptitude. Research Perspectives in Music Education: A Bulletin of the Florida Music Educators Association, 1(1), 19– 23. https://bit.ly/2LKKdHq Morgan, M. (1995). Effects of Gordon’s model for music education on the rhythmic aptitude of second-grade students (Publication No. 9616875) [Doctoral dissertation, The State University of New York at Albany]. ProQuest Dissertations and Theses Global. Mota, G. (1997). Detecting young children’s musical aptitude: A comparison between standardized measures of music aptitude and ecologically valid musical performances. Bulletin of the Council for Research in Music Education, 133, 89–94. https://www.jstor.org/stable/40318845 Moustakas, C. (1994). Phenomenological research methods. Sage Publications, Inc. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 317 Müllensiefen, D., Gingras, B., Musil, J., & Stewart, L. (2014). The musicality of non-musicians: An index for assessing musical sophistication in the general population. PLoS ONE 9(2): e89642, 1–23. https://doi.org/10.1371/journal.pone.0089642 Multiculturalism (2016, August 12). Stanford Encyclopedia of Philosophy. Retrieved May 18, 2020, from https://plato.stanford.edu/entries/multiculturalism/ National Association for Music Education (n.d.). 2014 music standards (PK–8 general music). Retrieved February 18, 2021, from https://nafme.org/wp-content/uploads/2014/11/2014Music-Standards-PK-8-Strand.pdf National Association for Music Education (2021). Early childhood music education. 
Retrieved February 18, 2021, from https://nafme.org/about/position-statements/early-childhoodmusic-education/ National Center for Education Statistics (n.d.). School directory information.https://nces.ed.gov/ccd/schoolsearch/school_detail.asp?Search=1&State=42 &SchoolPageNum=67&ID=421131005038 Norton, D. (1980). Interrelationships among music aptitude, IQ, and auditory conservation. Journal of Research in Music Education, 28(4), 207–217. https://doi.org/10.2307/3345029 O’Leary, J. E. (2010). The effects of motor movement on elementary band students’ music and movement achievement (Publication No. 3405999) [Doctoral dissertation, Boston University]. ProQuest Dissertations and Theses Global. Ortner, J. M. (1990). The effectiveness of a computer-assisted instruction program in rhythm for secondary school instrumental music students (Publication No. 9115133) [Doctoral STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 318 dissertation, The State University of New York at Buffalo]. ProQuest Dissertations and Theses Global. Palmer, M. H. (1974). The relative effectiveness of the Richards and the Gordon approaches to rhythm reading for fourth grade children (Publication No. 7511879) [Doctoral dissertation, University of Illinois at Urbana–Champaign]. ProQuest Dissertations and Theses Global. Pan, Q., & Wei, R. (2016). Fraction of missing information (γ) at different missing data fractions in the 2012 NAMCS physician workflow mail survey. Applied Mathematics, 7(10), 1057–1067. https://doi.org/10.4236/am.2016.710093 Parks, J. K. E. (2005). The effect of a program of portable electronic piano keyboard experience on the acquisition of sight-singing skill in the novice high school chorus (Publication No. 3201961) [Doctoral dissertation, University of Maryland, College Park]. ProQuest Dissertations and Theses Global. Pedhazur, E.J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach. Lawrence Erlbaum Associates. Pereira, A. 
I., Rodrigues, H., & Rutkowski, J. (2017). The relationship between children’s use of singing voice, singing accuracy, and self-perception on singing with text and neutral syllable. In Context Matters, The 6th International Symposium on Assessment in Music Education [Symposium], Birmingham, United Kingdom. http://reg.conferences.dce.ufl.edu/docs/ISAME/2017/isameprogramme.pdf Peterson, J. J. (1983). The Iowa testing programs: The first fifty years. University of Iowa Press. STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 319 Phillips, K. H., & Aitchison, R. E. (1997). The relationship of singing accuracy to pitch discrimination and tonal aptitude among third-grade students. Contributions to Music Education, 24(1), 7–22. https://www.jstor.org/stable/24126943 Phillips, K. H., Aitchison, R. E., & Nompula, Y. P. (2002). The relationship of music aptitude to singing achievement among fifth-grade students. Contributions to Music Education, 29(1), 47–58. https://www.jstor.org/stable/24126974 Pollock, M. (2004). Colormute: Race talk dilemmas in an American school. Princeton University Press. https://www-jstor-org.gate.lib.buffalo.edu/stable/j.ctt7rjh1 Pruitt, J. S. (1966). A study of withdrawals from the beginning instrumental music programs of selected schools in the school district of Greenville county, South Carolina (Publication No. 6609485) [Doctoral dissertation, New York University]. ProQuest Dissertations and Theses Global. Pursell, A. F. (2005). The effectiveness of iconic-based rhythmic instruction on middle school instrumentalists’ ability to read rhythms at sight (Publication No. 3194875) [Doctoral dissertation, Ball State University]. ProQuest Dissertations and Theses Global. Radocy, R., & Boyle, J. D. (1979). Psychological foundations of musical behavior. Charles C Thomas. Reese, J. A., & Shouldice, H. N. (2019). Assessment in the music learning theory-based classroom. In T. 
Brophy (Ed.), The Oxford Handbook of Assessment Policy and Practice in Music Education, Volume 2 (pp. 477–501). Oxford University Press. Reifinger, J. L. (2018). The relationship of pitch sight-singing skills with tonal discrimination, language reading skills, and academic ability in children. Journal of Research in Music Education, 66(1), 71–91. https://doi.org/10.1177/0022429418756029 STABILIZED MUSIC APTITUDE: ONSET AND TRANSITION 320 Reynolds, A. M., & Hyun, K. (1994). Understanding music aptitude: Teachers’ interpretations. Research Studies in Music Education, 23(1), 18–31. https://doi.org/10.1177/1321103X040230010201 Rowlyk, W. T. (2008). Effects of improvisation instruction on nonimprovisation music achievement of seventh and eighth grade instrumental music students (Publication No. 3300374) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global. Runfola, M. (2016). Development of MAP and ITML: Is music learning theory an unexpected outcome? In T. S. Brophy, J. Marlatt, & G. K. Ritcher (Eds.), Connecting Practice, Measurement, and Evaluation: Selected Papers from the Fifth International Symposium on Assessment in Music Education (pp. 357–374). GIA Publications. Runfola, M., & Etopio, E. (2010). The nature of performance-based criterion measures in early childhood music education research, and related issues. In T. S. Brophy (Ed.), The Practice of Assessment in Music Education: Frameworks, Models, and Designs. Proceedings of the 2009 Florida Symposium on Assessment in Music Education (pp. 395411). GIA Publications. Russell, J. A. (2018). Statistics in music education research. Oxford University Press. Ruthsatz, J. M. (2000). Predicting expert performance within the musical domain: A test of summation theory (Publication No. 9981862) [Doctoral dissertation, Case Western Reserve University]. ProQuest Dissertations and Theses Global. Rutkowski, J. (1986). 
The effect of restricted song range on kindergarten children’s use of singing voice and developmental music aptitude (Publication No. 8619357) [Doctoral dissertation, The State University of New York at Buffalo]. ProQuest Dissertations and Theses Global.
Rutkowski, J. (1996). The effectiveness of individual/small-group singing activities on kindergartners’ use of singing voice and developmental music aptitude. Journal of Research in Music Education, 44(4), 353–368. https://doi.org/10.2307/3345447
Rutkowski, J. (2015). The relationship between children’s use of singing voice and singing accuracy. Music Perception: An Interdisciplinary Journal, 32(3), 283–292. https://doi.org/10.1525/mp.2015.32.3.283
Rutkowski, J., & Miller, M. S. (2003a). The effectiveness of fequency [sic] of instruction and individual/small-group singing activities on first graders’ use of singing voice and developmental music aptitude. Contributions to Music Education, 30(1), 23–38. https://www.jstor.org/stable/24127025
Rutkowski, J., & Miller, M. S. (2003b). The effect of teacher feedback and modeling on first graders’ use of singing voice and developmental music aptitude. Bulletin of the Council for Research in Music Education, 156, 1–10. https://www.jstor.org/stable/40319169
Salvador, K. (2011). Individualizing elementary general music instruction: Case studies of assessment and differentiation (Publication No. 3482549) [Doctoral dissertation, Michigan State University]. ProQuest Dissertations and Theses Global.
Saunders, T. C., & Holahan, J. M. (1993). Computerized response procedure to assess young student reaction times of judgments of sameness and difference among paired tonal patterns. Bulletin of the Council for Research in Music Education, 115, 31–48. https://www.jstor.org/stable/40318746
Schafer, J. L. (1999). Multiple imputation: A primer.
Statistical Methods in Medical Research, 8, 3–15. https://doi.org/10.1177/096228029900800102
Schenker, N., & Taylor, J. M. G. (1996). Partially parametric techniques for multiple imputation. Computational Statistics & Data Analysis, 22, 425–446. https://doi.org/10.1016/0167-9473(95)00057-7
Schleuter, S. L. (1978). Effects of certain lateral dominance traits, music aptitude, and sex differences with instrumental music achievement. Journal of Research in Music Education, 26(1), 22–31. https://doi.org/10.2307/3344786
Schleuter, S. L. (1984). A sound approach to teaching instrumentalists: An application of content and learning sequences. Kent State University Press.
Schleuter, S. L., & DeYarman, R. (1977). Musical aptitude stability among primary school children. Bulletin of the Council for Research in Music Education, 51, 14–22. https://www.jstor.org/stable/40317458
Schoenoff, A. (1973). An investigation of the comparability of American and German norms for the musical aptitude profile (Publication No. 7313591) [Doctoral dissertation, University of Iowa]. ProQuest Dissertations and Theses Global.
Schoonover, R. J. (1974). A study of the construct validity of selected musical aptitude tests using the multitrait-multimethod matrix procedure (Publication No. 7428738) [Doctoral dissertation, Northwestern University]. ProQuest Dissertations and Theses Global.
Seashore, C. E. (1919). The psychology of musical talent. Silver Burdett.
Sell, V. H. (1976). The musical aptitude of Finnish students: An investigative study in comparative music education (Publication No. 628174) [Doctoral dissertation, University of Wisconsin–Madison]. ProQuest Dissertations and Theses Global.
Sergeant, D., & Thatcher, G. (1974). Intelligence, social status and musical abilities. Psychology of Music, 2(2), 32–57. https://doi.org/10.1177/030573567422005
Shuter, R. (1968). The psychology of musical ability. Methuen & Co.
Shuter-Dyson, R. (1999).
Musical ability. In The Psychology of Music (2nd ed., pp. 627–651). Academic Press. https://doi.org/10.1016/B978-012213564-4/50017-2
Simmons, J. C. H. (1981). An investigation of relationships among primary-level student performance on selected measures of music aptitude, scholastic aptitude, and academic achievement (Publication No. 8131595) [Doctoral dissertation, Peabody College for Teachers of Vanderbilt University]. ProQuest Dissertations and Theses Global.
Smith, J. P. (2004). Music compositions of upper elementary students created under various conditions of structure (Publication No. 3132610) [Doctoral dissertation, Northwestern University]. ProQuest Dissertations and Theses Global.
Smith, N. (2006). The effect of learning and playing songs by ear on the performance of middle school band students (Publication No. 3255935) [Doctoral dissertation, University of Hartford]. ProQuest Dissertations and Theses Global.
Soley-Bori, M. (2013). Dealing with missing data: Key assumptions and methods for applied analysis. Technical Report No. 4. http://www.bu.edu/sph/files/2014/05/Marina-techreport.pdf
Stamou, L., Schmidt, C. P., & Humphreys, J. T. (2010). Standardization of the Gordon primary measures of music audiation in Greece. Journal of Research in Music Education, 58(1), 75–89. https://doi.org/10.1177/0022429409360574
Stangroom, J. (2021). Effect size calculator for t-test. Social Science Statistics. https://www.socscistatistics.com/effectsize/default3.aspx
Stanton, H. M. (1935). Measurement of musical talent: The Eastman experiment. In C. E. Seashore (Ed.), University of Iowa Studies in the Psychology of Music. University of Iowa Press.
Stanton, H. M., & Koerth, W. (1933). Musical capacity measures of children repeated after musical training. University of Iowa Studies: Series of Aims & Progress of Research, 42, New Ser., 259, 48.
Stevens, D. O. (1987).
The construction and validation of a test of musical aptitude for young children (Publication No. 8715880) [Doctoral dissertation, University of South Dakota, Vermillion]. ProQuest Dissertations and Theses Global.
Stoltzfus, J. (2005). The effects of audiation-based composition on the music achievement of elementary wind and percussion students (Publication No. 3169610) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global.
Stringham, D. (2010). Improvisation and composition in a high school instrumental music curriculum (Publication No. 3445843) [Doctoral dissertation, University of Rochester, Eastman School of Music]. ProQuest Dissertations and Theses Global.
Swaminathan, S., Schellenberg, E. G., & Khalil, S. (2017). Revisiting the association between music lessons and intelligence: Training effects or music aptitude? Intelligence, 62, 119–124. https://doi.org/10.1016/j.intell.2017.03.005
Taggart, C. C. (1989). The measurement and evaluation of music aptitudes and achievement. In D. L. Walters & C. C. Taggart (Eds.), Readings in Music Learning Theory (pp. 45–54). GIA Publications.
Talley, K. E. (2005). An investigation of the frequency, methods, objectives and applications of assessment in Michigan elementary general music classrooms (Publication No. 1428983) [Master’s thesis, Michigan State University]. ProQuest Dissertations and Theses Global. https://doi.org/10.3102/0091732X022001195
Van Ginkel, Linting, Rippe, & van der Voort (2020). Rebutting existing misconceptions about multiple imputation as a method for handling missing data. Journal of Personality Assessment, 102(3), 297–308. https://doi.org/10.1080/00223891.2018.1530680
Von Hippel, P. T. (2016). New confidence intervals and bias comparisons show that maximum likelihood can beat multiple imputation in small samples.
Structural Equation Modeling: A Multidisciplinary Journal, 23(3), 422–437. https://arxiv.org/pdf/1307.5875.pdf
Wallentin, M., Nielsen, A. H., Friis-Olivarius, M., Vuust, C., & Vuust, P. (2010). The musical ear test, a new reliable test for measuring musical competence. Learning and Individual Differences, 20(3), 188–196. https://doi.org/10.1016/j.lindif.2010.02.004
Walters, D. L. (1991). Edwin Gordon’s music aptitude work. The Quarterly, 2(1–2), 64–72. http://www-usr.rider.edu/~vrme/v16n1/volume2/visions/spring8.pdf
Walters, D. L. (1992). Sequencing for efficient learning. In R. Colwell (Ed.), Handbook of research on music teaching and learning (pp. 535–545). Schirmer Books.
Webb, M. N. A. (1984). An investigation of the relationship of musical aptitude and intelligence of students at the third grade level (Publication No. 8509188) [Doctoral dissertation, The University of North Carolina at Greensboro]. ProQuest Dissertations and Theses Global.
Westervelt, T. G. (2001). An investigation of harmonic and improvisation readiness among upper elementary-age school children (Publication No. 3031569) [Doctoral dissertation, Temple University]. ProQuest Dissertations and Theses Global.
Wing, H. D. (1939/1961). Standardised Tests of Musical Intelligence. The Mere, England: National Foundation for Educational Research.
Wing, H. D. (1962). A revision of the “Wing musical aptitude test”. Journal of Research in Music Education, 10(1), 39–46. https://doi.org/10.2307/3343909
Wolf, A., & Kopiez, R. (2018). Development and validation of the musical ear training assessment (META). Journal of Research in Music Education, 66(1), 53–70. https://doi-org.gate.lib.buffalo.edu/10.1177/0022429418754845
Wöllner, C., Halfpenny, E., Ho, S., & Kurosawa, K. (2003). The effects of distracted inner hearing on sight-reading. Psychology of Music, 31(4), 377–389. https://doi.org/10.1177/03057356030314003
Yosso, T. J. (2005).
Whose culture has capital? A critical race theory discussion of community cultural wealth. Race, Ethnicity and Education, 8(1), 69–91. https://doi.org/10.1080/1361332052000341006
Young, R., & Johnson, D. R. (2015). Handling missing values in longitudinal panel data with multiple imputation. Journal of Marriage and Family, 77(1), 277–294. https://doi.org/10.1111/jomf.12144
Young, W. T. (1971). The role of musical aptitude, intelligence, and academic achievement in predicting the musical attainment of elementary instrumental music students. Journal of Research in Music Education, 19(4), 385–398. https://doi.org/10.2307/3344291
Young, W. T. (1973). The Bentley “measures of musical abilities”: A congruent validity report. Journal of Research in Music Education, 21(1), 74–79. https://doi.org/10.2307/3343982
Young, W. T. (1976). A longitudinal comparison of four music achievement and music aptitude tests. Journal of Research in Music Education, 24(3), 97–109.
Zentner, M., & Gingras, B. (2019). The assessment of musical ability and its determinants. In P. J. Rentfrow & D. J. Levitin (Eds.), Foundations in music psychology: Theory and research (pp. 641–684). Massachusetts Institute of Technology.
Zhang, Z. (2016). Missing data imputation: Focusing on single imputation. Annals of Translational Medicine, 4(1), 1–8. https://doi.org/10.3978/j.issn.2305-5839.2015.12.38
Zimmerman, M. P. (1986). Music development in middle childhood: A summary of selected research studies. Bulletin of the Council for Research in Music Education, 86, 18–35. https://www.jstor.org/stable/40317966
ProQuest Number: 28419252. Distributed by ProQuest LLC (2021).