Distinguishing phenotypes of childhood wheeze and cough using latent class analysis

AUTHORS
Ben D. Spycher, Michael Silverman, Adrian M. Brooke, Christoph E. Minder, Claudia E. Kuehni

SUPPLEMENTARY MATERIAL

Methods

Pulmonary function
Spirometry was carried out with an electronic spirometer (Pneumoscreen II/1, Erich Jaeger, Germany) with the child seated and the nose clipped. The best of three repeatable forced expiratory manoeuvres (within 100 ml) was recorded after avoidance of short-acting β-agonist bronchodilators for 12 hours in children on regular treatment. Sex- and height-standardized z-scores [1] of forced expiratory volume in 0.5 s (FEV0.5) were used for analysis in preference to FEV1 because, in the young age group studied, many children reached their forced vital capacity within one second.

Measurement of bronchial reactivity
Children who were wheezing on examination were excluded from the bronchial challenge test. The change in transcutaneous oxygen tension (tc-PO2), measured with a transcutaneous oxygen tension probe (Transoxode, Drager, Hemel Hempstead, UK), was employed as an effort-independent, indirect measure of the response to inhaled methacholine [2]. Children wore a finger probe (Ohmeda) to monitor arterial oxygen saturation (SaO2) and so ensure that significant desaturation did not occur during the challenge. After a 20-min stabilization period, baseline measurements of tc-PO2 were recorded every 30 s for 3 min and the mean value was calculated. Following inhalation of phosphate-buffered saline, doubling doses of methacholine were administered according to the tidal breathing method [3], via a face mask, with the nose clipped and the subject seated. The aerosol was generated using a Wright nebulizer driven by medical air at a rate of 7 to 8 L/min to produce an output of 0.14 ml/min (SD ± 0.01). Each dose was given for 2 min.
A 1-min rest period followed, and tc-PO2 was then observed for a further 2 min, during which its minimum value was recorded. The starting concentration of methacholine was 0.25 g/L for children who had wheezed in the last year and 2.0 g/L for those who had not [4]. Children who reported current cough but no wheeze started at a concentration of 1.0 g/L. Doses were then doubled until the maximum concentration of 32.0 g/L was reached or a response (> 20% fall in tc-PO2 from pre-saline baseline) was obtained. The measure of response, PC20-Ptc,O2 (the provoking concentration causing a 20% decrease in measured tc-PO2 from pre-saline baseline), was estimated by linear interpolation of log-transformed methacholine concentrations between the two dose steps which bracketed the endpoint [4]. All subjects except two responded by the final concentration. Children responding to methacholine were given 2.5 mg salbutamol via a nebulizer (Microneb, Market Harborough, England) driven by oxygen at a rate of 6 L/min. Tc-PO2 was measured 15 min after salbutamol to ensure that it had returned to baseline levels. Prior to analysis, the values of PC20-Ptc,O2 were logarithmically transformed to make them more compatible with a normal distribution [5].

The assessment of atopic status
Following bronchial challenge, children were tested against four aeroallergens: cat hair, dog dander, Dermatophagoides pteronyssinus, and mixed grass pollens (Bencard, Brentford, Middlesex, UK). The positive control was histamine 1 mg/ml; the negative control was the solution in which the allergens were dissolved. The reaction was assessed 5 to 15 min after skin prick testing according to the method of Pepys [6]. A response was deemed positive if a wheal of diameter 2 mm or more was observed and was larger than any response to the negative control. If one or more positive reactions were observed, the child was designated as atopic.

Selection of variables
Ideally, phenotype modelling should involve all phenotypic information available.
In our study a pre-selection of the variables to be included was necessary for two reasons. First, given the limited sample size, the number of parameters in the model had to be limited. Second, the analysis was based on the assumption of independence of the variables within phenotypes (see the section on the statistical model below). Including many closely related variables in the model would have made it increasingly difficult to maintain this assumption.

To select the variables for our analysis we proceeded as follows. We first classified all symptom data related to wheeze or cough from the first two surveys, and all objective measurements, into groups of variables each representing a common trait. We then used multiple correspondence analysis (MCA) [7] to select the single variable which best represented each group and to redefine the categories within the selected variable (for instance, grouping the four seasons into just two: summer and winter) (Table E1). All of the selected symptom variables were included as repeated measures (1990 and 1992-4). We additionally included age at first survey and sex.

MCA is a method for graphically displaying associations within a multivariate data set of categorical variables (analogous to principal components analysis for continuous variables). The categories of each variable are positioned along major axes representing the main dimensions of association between the variables. Categories positioned close to each other are closely associated and thus contain similar information [7]. Additional details on this analysis can be requested from the authors.

Statistical model
Phenotypes were modelled using a finite mixture model for mixed-mode data [8, 9] in which FEV0.5 and log-transformed PC20-Ptc,O2 [5] were treated as continuous variables with a normal distribution and all other variables as categorical. In such a model each phenotype is represented by a distinct probability distribution of the included variables.
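As a sketch of the general form of such a model, in notation that is ours for illustration and does not appear in the original (K phenotypes, mixing proportions \pi_k, and phenotype-specific distributions f_k with parameters \theta_k), the density of the observed vector x_i of child i can be written as:

```latex
f(x_i) = \sum_{k=1}^{K} \pi_k \, f_k(x_i \mid \theta_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```

Each f_k plays the role of the distinct probability distribution that represents one phenotype.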
Following a latent class analysis (LCA) approach, we assumed that all variables were independent within a given phenotype (assumption of local independence) [8], i.e. that all associations between variables were entirely explained by the existence of distinct phenotypes, with no associations remaining within phenotypes.

Fitting the model
We repeatedly fitted the model with the number of phenotypes varying from 1 to 7 (model 1 to model 7). Estimates were obtained by maximum likelihood using the expectation-maximization (EM) algorithm. We used the Fortran program Multimix [8], which we adapted to deal with missing values [10] and conditional questions. For example, the question on the frequency of wheezing attacks was asked only of those reporting wheeze ever in a particular survey.

The EM algorithm is an iterative procedure specifically designed to compute maximum likelihood estimates from incomplete data [11]. The estimates maximize the likelihood of the observed data assuming that all non-observed data are missing at random. In this application the non-observed data included both the unknown class membership (allocation of individuals to the phenotypes) and the missing values in the symptom data or physiological measurements (assumed to be missing at random). The EM algorithm treats these two types of non-observed data in the same manner. The number of missing values for each of the included variables is reported in the last column of Table E1.

The EM algorithm requires starting values for the model parameters. Depending on which starting values are chosen, the algorithm may converge to a local rather than to a global maximum of the likelihood function. In order to find the overall maximum, the EM algorithm is typically repeated for different sets of starting values.
[9] For each of models 1 to 7 we took as the maximum likelihood solution the best solution (the one with the highest value of the likelihood function) obtained from 5000 randomly sampled sets of starting values. Additional caution was needed to avoid spurious solutions: phenotypes with little variation in any of the continuous variables may contribute greatly to the likelihood function, even though they represent chance clusters of subjects rather than real entities. [9] Best solutions were therefore discarded if they contained phenotypes with markedly lower coefficients of variation in any of the continuous variables than the phenotypes from the next best solutions (which were used instead).

To choose the appropriate number of phenotypes we computed the bootstrapped p-values for the likelihood ratio test statistic and the Bayesian information criterion for each of the models 1-7 [9].

Results
The two statistical model selection criteria yielded conflicting results: the Bayesian information criterion (BIC) indicated a model with 2 phenotypes, while the bootstrapped p-values of the likelihood ratio favoured a model with 5 phenotypes (p = 0.11 for the null hypothesis of 5 phenotypes against 6). We had further indications that models 6 and 7 were over-fitting the data (best solutions were rarely attained and many of them seemed to be spurious). Thus we could not distinguish more than 5 phenotypes using our data. In the article we present the 5-phenotype solution. Model estimates for all models 2-5 are reported in Tables E2-E6.

References
1. Nystad W, Samuelsen SO, Nafstad P, Edvardsen E, Stensrud T, Jaakkola JJ. Feasibility of measuring lung function in preschool children. Thorax 2002; 57: 1021-1027.
2. Wilson NM, Phagoo SB, Silverman M. Use of transcutaneous oxygen tension, arterial oxygen saturation, and respiratory resistance to assess the response to inhaled methacholine in asthmatic children and normal adults. Thorax 1991; 46: 433-437.
3. Cockcroft DW, Killian DN, Mellon JJ, Hargreave FE. Bronchial reactivity to inhaled histamine: a method and clinical survey. Clin Allergy 1977; 7: 235-243.
4. Hargreave FE, Ryan G, Thomson NC, et al. Bronchial responsiveness to histamine or methacholine in asthma: measurement and clinical significance. J Allergy Clin Immunol 1981; 68: 347-355.
5. Chinn S. Methodology of bronchial responsiveness. Thorax 1998; 53: 984-988.
6. Pepys J. Skin tests in diagnosis. In: Gell PH, Coombs RRA, Lach PJ, eds. Clinical Aspects of Immunology. 3rd Edn. Oxford, Blackwell Scientific Publications, 1975; pp. 55-80.
7. Greenacre MJ. Theory and applications of correspondence analysis. London, Academic Press, 1984.
8. Hunt L, Jorgensen M. Mixture model clustering using the MULTIMIX program. Aust N Z J Stat 1999; 41: 153-171.
9. McLachlan G, Peel D. Finite Mixture Models. New York, John Wiley & Sons, 2000.
10. Hunt L, Jorgensen M. Mixture model clustering for mixed data with missing information. Comput Stat Data Anal 2003; 41: 429-440.
11. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc Ser B 1977; 39: 1-38.

TABLE E1. List of available variables by groups of common traits and variable selected for phenotype identification within each group by multiple correspondence analysis (see text).
Available variables | Common trait | Selected variable (categories) | Missing values 1990, n (% of n=319) | Missing values 1992-94, n (% of n=319)

Symptom variables (surveys 1990 and 1992-94)
Wheeze ever; Age at onset of wheeze; Time since last wheeze attack | Symptom history: wheeze | Wheeze ever (no in both surveys / yes in the first, no in the second / yes in both / no in the first, yes in the second) | 0 (0.0) | 0 (0.0)
Lifetime number of wheeze attacks; Number of wheeze attacks past year | Symptom frequency: wheeze | Number of attacks past year (none / 1 to 2 / more than 2) | 0 (0.0) | 6 (1.9)
Usual duration of wheeze attacks; Attacks accompanied by shortness of breath | Symptom severity: wheeze | Shortness of breath (yes / no) | 0 (0.0) | 51 (16.0)
Attacks occur a) with colds; b) apart from colds; c) with exercise; d) drinking or eating; e) animals, dust, grass* | Triggers: wheeze | New variable (Wheeze with colds only / other triggers also) | 8 (2.5) | 53 (16.6)
Attacks of wheeze more frequent during particular time of year (If yes, months of more frequent attacks) | Seasonal variation: wheeze | New variable (No seasonal variation / winter / summer) | 2 (0.6) | 60 (18.8)
Attacks of wheeze worse at particular time of day (If yes, day or night) | Diurnal variation: wheeze | New variable (No diurnal variation / night / day) | 0 (0.0) | 53 (16.6)
Wakened at night by cough without cold or chest infection | Night cough | Cough at night (yes / no) | 176 (55.2)† | 4 (1.3)
Cough occurs: a) usually with colds; b) also apart from colds* | Triggers: cough | Cough without colds (no cough / only with colds / also apart from colds) | 0 (0.0) | 6 (1.9)

Physiological measurements (survey 1992-4)
Skin-prick tests (cat dander, dog dander, Dermatophagoides pteronyssinus, mixed grass pollens) | Atopy | New variable: Test positive for at least one allergen / no test positive | - | 121 (37.9)
Mid-expiratory flow (MEF25-75); Forced expiratory volume at 0.5 and 1.0 seconds (FEV0.5, FEV1); Forced vital capacity (FVC); Peak expiratory flow (PEF) | Airway obstruction | FEV0.5 standardised for sex and height [1] | - | 114 (35.7)
Provoking concentration of methacholine for 20% decrease in transcutaneous oxygen tension (PC20-Ptc,O2) | Bronchial responsiveness | PC20-Ptc,O2 log transformed [5] | - | 140 (43.9)

* A separate response (yes/no) was requested for each item.
† The large number of missing values is explained by the fact that this question was asked conditional on reported wheeze ever in the first survey while our model treats it as applicable to all.

TABLE E2. Estimates for the 2-phenotype model

Phenotype*
Variable / parameter        2A      2B
Sample prevalence         0.65    0.35
Sex
  Female                  0.55    0.41
  Male                    0.45    0.59
Age in 1990
  0 to 2 yrs              0.40    0.51
  3 to 5 yrs              0.60    0.49
Wheeze ever
  Never                   0.62    0.00
  S1 yes, S2 no           0.03    0.22
  S1 yes, S2 yes          0.27    0.53
  S1 no, S2 yes           0.08    0.25
S1, frequency of attacks†
  0                       0.00    0.27
  1 to 2                  0.09    0.37
  >2                      0.21    0.11
S1, attacks with shortness of breath†
  No                      0.03    0.34
  Yes                     0.27    0.41
S1, triggers of wheeze†
  Only colds              0.02    0.54
  Colds and other         0.27    0.21
S1, season with most frequent attacks†
  Indifferent             0.18    0.57
  Winter                  0.09    0.17
  Summer                  0.03    0.01
S1, time of day with worse attacks†
  Indifferent             0.07    0.36
  Night                   0.21    0.35
  Day                     0.02    0.03
S1, wakened by cough at night
  No                      0.27    0.80
  Yes                     0.73    0.20
S1, cough
  No cough                0.06    0.18
  Only with colds         0.22    0.58
  Also without colds      0.72    0.24
S2, frequency of attacks†
  0                       0.09    0.54
  1 to 2                  0.11    0.16
  >2                      0.15    0.08
S2, attacks with shortness of breath†
  No                      0.22    0.41
  Yes                     0.13    0.38
S2, triggers of wheeze†
  Only colds              0.04    0.52
  Colds and other         0.30    0.27
S2, season with most frequent attacks†
  Indifferent             0.09    0.37
  Winter                  0.15    0.38
  Summer                  0.10    0.04
S2, time of day with worse attacks†
  Indifferent             0.06    0.18
  Night                   0.25    0.56
  Day                     0.04    0.04
S2, wakened by cough at night
  No                      0.51    0.63
  Yes                     0.49    0.37
S2, cough
  No cough                0.08    0.18
  Only with colds         0.41    0.60
  Also without colds      0.50    0.22
Skin prick tests
  All negative            0.68    0.82
  ≥ 1 positive            0.32    0.18
FEV0.5 (z-scores)
  Mean                   -1.56   -1.30
  SD                      1.26    1.05
Natural logarithm of bronchial responsiveness (PC20 log(g/L))
  Mean                    0.77    0.81
  SD                      0.82    0.91

Definition of abbreviations: S1, Survey 1 (1990); S2, Survey 2 (1992-4); SD, standard deviation.
Data are estimated probabilities where not otherwise noted.
* Phenotype labels are the same as those for corresponding phenotype clusters in Figure 2.
† Omitted category with which the probabilities sum to one is "no" to wheeze ever in the respective survey.

TABLE E3. Estimates for the 3-phenotype model

Phenotype*
Variable / parameter        3A      3B      3C
Sample prevalence         0.50    0.31    0.18
Sex
  Female                  0.55    0.45    0.46
  Male                    0.45    0.55    0.54
Age in 1990
  0 to 2 yrs              0.45    0.43    0.42
  3 to 5 yrs              0.55    0.57    0.58
Wheeze ever
  Never                   0.81    0.00    0.00
  S1 yes, S2 no           0.03    0.12    0.21
  S1 yes, S2 yes          0.05    0.69    0.65
  S1 no, S2 yes           0.11    0.19    0.14
S1, frequency of attacks†
  0                       0.04    0.00    0.40
  1 to 2                  0.04    0.29    0.40
  >2                      0.00    0.53    0.06
S1, attacks with shortness of breath†
  No                      0.05    0.14    0.39
  Yes                     0.03    0.68    0.47
S1, triggers of wheeze†
  Only colds              0.09    0.16    0.60
  Colds and other         0.00    0.65    0.26
S1, season with most frequent attacks†
  Indifferent             0.06    0.48    0.72
  Winter                  0.02    0.27    0.14
  Summer                  0.01    0.06    0.00
S1, time of day with worse attacks†
  Indifferent             0.03    0.21    0.50
  Night                   0.06    0.55    0.31
  Day                     0.00    0.05    0.04
S1, wakened by cough at night
  No                      0.57    0.34    0.94
  Yes                     0.43    0.66    0.06
S1, cough
  No cough                0.06    0.08    0.26
  Only with colds         0.11    0.48    0.74
  Also without colds      0.83    0.44    0.00
S2, frequency of attacks†
  0                       0.10    0.25    0.65
  1 to 2                  0.03    0.27    0.14
  >2                      0.03    0.35    0.00
S2, attacks with shortness of breath†
  No                      0.07    0.54    0.43
  Yes                     0.09    0.33    0.35
S2, triggers of wheeze†
  Only colds              0.12    0.14    0.61
  Colds and other         0.03    0.74    0.17
S2, season with most frequent attacks†
  Indifferent             0.10    0.23    0.38
  Winter                  0.06    0.40    0.41
  Summer                  0.00    0.25    0.00
S2, time of day with worse attacks†
  Indifferent             0.03    0.15    0.21
  Night                   0.12    0.64    0.49
  Day                     0.00    0.10    0.09
S2, wakened by cough at night
  No                      0.58    0.40    0.76
  Yes                     0.42    0.60    0.24
S2, cough
  No cough                0.08    0.09    0.27
  Only with colds         0.41    0.46    0.69
  Also without colds      0.51    0.44    0.05
Skin prick tests
  All negative            0.85    0.51    0.86
  ≥ 1 positive            0.15    0.49    0.14
FEV0.5 (z-scores)
  Mean                   -1.44   -1.67   -1.19
  SD                      1.30    1.14    0.93
Natural logarithm of bronchial responsiveness (PC20 log(g/L))
  Mean                    1.01    0.34    1.01
  SD                      0.62    1.04    0.58

Definition of abbreviations: S1, Survey 1 (1990); S2, Survey 2 (1992-4); SD, standard deviation.
Data are estimated probabilities where not otherwise noted.
* Phenotype labels are the same as those for corresponding phenotype clusters in Figure 2.
† Omitted category with which the probabilities sum to one is "no" to wheeze ever in the respective survey.

TABLE E5. Estimates for the 4-phenotype model

Phenotype*
Variable / parameter        4A      4B      4C      4D
Sample prevalence         0.27    0.27    0.27    0.20
Sex
  Female                  0.59    0.52    0.44    0.43
  Male                    0.41    0.48    0.56    0.57
Age in 1990
  0 to 2 yrs              0.36    0.53    0.40    0.47
  3 to 5 yrs              0.64    0.47    0.60    0.53
Wheeze ever
  Never                   0.74    0.78    0.00    0.00
  S1 yes, S2 no           0.06    0.02    0.12    0.21
  S1 yes, S2 yes          0.08    0.10    0.71    0.62
  S1 no, S2 yes           0.11    0.11    0.17    0.17
S1, frequency of attacks†
  0                       0.08    0.00    0.00    0.37
  1 to 2                  0.07    0.06    0.28    0.39
  >2                      0.00    0.06    0.55    0.07
S1, attacks with shortness of breath†
  No                      0.07    0.03    0.14    0.39
  Yes                     0.08    0.09    0.69    0.44
S1, triggers of wheeze†
  Only colds              0.13    0.02    0.17    0.58
  Colds and other         0.01    0.09    0.65    0.25
S1, season with most frequent attacks†
  Indifferent             0.09    0.06    0.51    0.68
  Winter                  0.04    0.05    0.26    0.15
  Summer                  0.01    0.01    0.06    0.00
S1, time of day with worse attacks†
  Indifferent             0.06    0.04    0.19    0.48
  Night                   0.08    0.06    0.60    0.30
  Day                     0.00    0.02    0.04    0.05
S1, wakened by cough at night
  No                      0.56    0.00    0.40    0.91
  Yes                     0.44    1.00    0.60    0.09
S1, cough
  No cough                0.10    0.00    0.08    0.26
  Only with colds         0.22    0.00    0.53    0.71
  Also without colds      0.68    1.00    0.39    0.03
S2, frequency of attacks†
  0                       0.11    0.16    0.21    0.62
  1 to 2                  0.03    0.05    0.27    0.17
  >2                      0.05    0.00    0.41    0.00
S2, attacks with shortness of breath†
  No                      0.12    0.09    0.54    0.43
  Yes                     0.07    0.11    0.35    0.36
S2, triggers of wheeze†
  Only colds              0.14    0.12    0.11    0.60
  Colds and other         0.06    0.08    0.78    0.18
S2, season with most frequent attacks†
  Indifferent             0.09    0.16    0.22    0.35
  Winter                  0.11    0.05    0.39    0.43
  Summer                  0.00    0.00    0.28    0.00
S2, time of day with worse attacks†
  Indifferent             0.06    0.02    0.14    0.19
  Night                   0.13    0.16    0.65    0.52
  Day                     0.00    0.02    0.09    0.09
S2, wakened by cough at night
  No                      0.34    0.79    0.37    0.78
  Yes                     0.66    0.21    0.63    0.22
S2, cough
  No cough                0.00    0.18    0.06    0.27
  Only with colds         0.00    0.82    0.45    0.73
  Also without colds      1.00    0.00    0.49    0.00
Skin prick tests
  All negative            0.84    0.88    0.44    0.85
  ≥ 1 positive            0.16    0.12    0.56    0.15
FEV0.5 (z-scores)
  Mean                   -1.58   -1.18   -1.74   -1.23
  SD                      1.45    1.01    1.16    0.88
Natural logarithm of bronchial responsiveness (PC20 log(g/L))
  Mean                    0.91    1.16    0.22    0.97
  SD                      0.58    0.49    1.13    0.60

Definition of abbreviations: S1, Survey 1 (1990); S2, Survey 2 (1992-4); SD, standard deviation.
Data are estimated probabilities where not otherwise noted.
* Phenotype labels are the same as those for corresponding phenotype clusters in Figure 2.
† Omitted category with which the probabilities sum to one is "no" to wheeze ever in the respective survey.

TABLE E6. Estimates for the 5-phenotype model

Phenotype*
Variable / parameter        5A      5B      5C      5D      5E
Sample prevalence         0.31    0.25    0.18    0.15    0.11
Sex
  Female                  0.56    0.52    0.51    0.51    0.27
  Male                    0.44    0.48    0.49    0.49    0.73
Age in 1990
  0 to 2 yrs              0.39    0.54    0.30    0.61    0.36
  3 to 5 yrs              0.61    0.46    0.70    0.39    0.64
Wheeze ever
  Never                   0.66    0.82    0.00    0.00    0.00
  S1 yes, S2 no           0.08    0.02    0.09    0.10    0.27
  S1 yes, S2 yes          0.16    0.07    0.66    0.69    0.65
  S1 no, S2 yes           0.10    0.08    0.25    0.21    0.09
S1, frequency of attacks†
  0                       0.07    0.00    0.00    0.00    0.63
  1 to 2                  0.11    0.06    0.16    0.51    0.28
  >2                      0.06    0.04    0.59    0.28    0.00
S1, attacks with shortness of breath†
  No                      0.11    0.06    0.02    0.23    0.48
  Yes                     0.13    0.04    0.73    0.56    0.43
S1, triggers of wheeze†
  Only colds              0.18    0.06    0.00    0.40    0.64
  Colds and other         0.06    0.04    0.75    0.39    0.28
S1, season with most frequent attacks†
  Indifferent             0.18    0.09    0.43    0.41    0.83
  Winter                  0.04    0.00    0.22    0.39    0.08
  Summer                  0.02    0.00    0.09    0.00    0.00
S1, time of day with worse attacks†
  Indifferent             0.08    0.00    0.23    0.23    0.64
  Night                   0.16    0.08    0.47    0.46    0.28
  Day                     0.00    0.01    0.04    0.10    0.00
S1, wakened by cough at night
  No                      0.68    0.21    0.18    0.69    0.94
  Yes                     0.32    0.79    0.82    0.31    0.06
S1, cough
  No cough                0.09    0.00    0.11    0.12    0.32
  Only with colds         0.29    0.00    0.40    0.72    0.65
  Also without colds      0.63    1.00    0.49    0.16    0.03
S2, frequency of attacks†
  0                       0.12    0.10    0.23    0.44    0.69
  1 to 2                  0.05    0.02    0.31    0.31    0.05
  >2                      0.09    0.02    0.38    0.15    0.00
S2, attacks with shortness of breath†
  No                      0.14    0.02    0.61    0.45    0.64
  Yes                     0.12    0.13    0.30    0.45    0.09
S2, triggers of wheeze†
  Only colds              0.14    0.15    0.00    0.43    0.60
  Colds and other         0.12    0.00    0.91    0.46    0.13
S2, season with most frequent attacks†
  Indifferent             0.08    0.15    0.29    0.08    0.73
  Winter                  0.13    0.00    0.32    0.75    0.00
  Summer                  0.04    0.00    0.31    0.06    0.00
S2, time of day with worse attacks†
  Indifferent             0.07    0.04    0.18    0.00    0.46
  Night                   0.19    0.09    0.61    0.80    0.27
  Day                     0.00    0.02    0.12    0.09    0.00
S2, wakened by cough at night
  No                      0.31    0.81    0.37    0.54    0.95
  Yes                     0.69    0.19    0.63    0.46    0.05
S2, cough
  No cough                0.00    0.15    0.09    0.17    0.33
  Only with colds         0.00    0.85    0.38    0.83    0.67
  Also without colds      1.00    0.00    0.53    0.00    0.00
Skin prick tests
  All negative            0.81    0.84    0.30    0.91    0.78
  ≥ 1 positive            0.19    0.16    0.70    0.09    0.22
FEV0.5 (z-scores)
  Mean                   -1.59   -1.18   -1.80   -1.47   -1.09
  SD                      1.41    1.05    1.41    0.57    0.96
Natural logarithm of bronchial responsiveness (PC20 log(g/L))
  Mean                    0.88    1.01    0.23    0.84    0.91
  SD                      0.60    0.90    1.18    0.58    0.65

Definition of abbreviations: S1, Survey 1 (1990); S2, Survey 2 (1992-4); SD, standard deviation.
Data are estimated probabilities where not otherwise noted.
* Phenotype labels are the same as those for corresponding phenotype clusters in Figure 2.
† Omitted category with which the probabilities sum to one is "no" to wheeze ever in the respective survey.
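
Illustrative sketch of the fitting strategy. The procedure described under "Fitting the model" (run EM from many random sets of starting values and keep the solution with the highest likelihood) can be illustrated with a deliberately simplified example. This is not the adapted Multimix program used in the study: it assumes binary items only, complete data, and no continuous variables, and it uses a small number of starts rather than 5000. All function and parameter names (class_joint, log_likelihood, em_one_start, fit_multistart, n_starts, seed) are hypothetical and chosen for this illustration.

```python
import math
import random


def class_joint(row, weights, probs):
    """Joint probability of one row with each class under local independence."""
    joint = []
    for w, p_k in zip(weights, probs):
        p = w
        for x, p_kj in zip(row, p_k):
            p *= p_kj if x == 1 else 1.0 - p_kj
        joint.append(p)
    return joint


def log_likelihood(data, weights, probs):
    """Observed-data log-likelihood of the mixture."""
    return sum(math.log(sum(class_joint(r, weights, probs))) for r in data)


def em_one_start(data, n_classes, rng, max_iter=500, tol=1e-8):
    """Run EM from one random start; return (loglik, weights, probs)."""
    n, n_items = len(data), len(data[0])
    # Assumed starting scheme: equal class weights, random item probabilities.
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
             for _ in range(n_classes)]
    ll = log_likelihood(data, weights, probs)
    for _ in range(max_iter):
        # E-step: posterior probability of each class for each row.
        resp = []
        for row in data:
            joint = class_joint(row, weights, probs)
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: weighted proportions, kept away from 0 and 1 so that
        # every likelihood term stays strictly positive.
        for k in range(n_classes):
            n_k = sum(r[k] for r in resp)
            weights[k] = n_k / n
            for j in range(n_items):
                hits = sum(r[k] for r, row in zip(resp, data) if row[j] == 1)
                probs[k][j] = min(max(hits / n_k, 1e-6), 1.0 - 1e-6)
        new_ll = log_likelihood(data, weights, probs)
        converged = new_ll - ll < tol
        ll = new_ll
        if converged:
            break
    return ll, weights, probs


def fit_multistart(data, n_classes, n_starts=20, seed=0):
    """Keep the highest-likelihood solution over several random starts."""
    rng = random.Random(seed)
    fits = [em_one_start(data, n_classes, rng) for _ in range(n_starts)]
    return max(fits, key=lambda fit: fit[0])
```

Calling fit_multistart(data, 2) on a list of 0/1 rows returns the best log-likelihood together with the estimated class weights and item probabilities. The sketch does not screen for spurious solutions, does not handle missing values or conditional questions, and does not compute the bootstrapped likelihood ratio test or BIC described in the text.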