Reviewer's report
Title: Measuring socioeconomic status in multi-country studies: Results from the eight-country MAL-ED study
Version: 1
Date: 20 August 2013
Reviewer: Sean Green

Reviewer's report:

Major compulsory issues

1. The analysis is thorough and the paper clearly articulates the need for a better measure of SES for cross-country comparisons, but the authors need to present a more compelling case that the wealth measure they have created meets this need. On lines 263-265 (pg 11), the authors mention that HAZ is used instead of WHZ because HAZ is a better composite measure of acute and chronic deprivation, but there is no explanation of why assessing the association between the composite SES measure and HAZ is a sufficient test of the suitability of the SES measure. If association with HAZ is an adequate test for an SES measure, and parsimony is a consideration, why not just use HAZ itself? It would be good if the analysis could show how the composite SES index fares when associated with other quantities of interest (e.g. biostatistical measures such as WHZ, WAZ, and BMIZ, or other survey measures that are routinely associated with SES in single-country analyses).

Thank you for this comment. We will respond in two parts: first, to the question of why we used the association between SES and HAZ to assess the suitability of our SES measure, and second, to whether our new measure is an improvement over what previously existed.

The question of why we assessed associations between SES and HAZ is an important one, which we discussed at length while working on this paper. First, the association with HAZ is not the only way we assessed the suitability of this SES measure, or even the primary way. The development of the four possible measures of wealth, and then the full measure of SES, was based on an understanding of underlying theory (e.g.
SES represents wealth/income, education, occupation), as well as grounded in the extensive literature exploring approaches to measuring wealth and SES, such as Filmer & Pritchett’s 2001 paper in Demography. In our view, this strong grounding is sufficient justification for the approaches to measuring SES that we compare, with the exception of mother’s education, which is included because it has been used (arguably incorrectly) in previous research as a simple proxy. We chose to take these analyses of suitability a step further for two reasons: 1) we were interested in comparing the four measures of wealth directly, and 2) analyzing associations of a construct of interest with other constructs that are believed to be related theoretically or empirically is an accepted approach to assessing construct validity (Cronbach & Meehl 1955). We have added this explanation more explicitly starting in line 263. Further, we could not use HAZ as a proxy for SES because it was one of our primary outcomes of interest, and because that approach could not be justified on theoretical grounds. Although we expected HAZ and SES to be statistically associated, we do not assume that they are equivalent. As described (in what is now lines 267-269), we chose HAZ because it is understood to be a child growth measure associated with chronic deprivation, rather than WHZ, which is more often associated with acute deprivation. This was more closely aligned with our theoretical understanding of SES. However, we did also assess associations with WHZ (not associated, as expected, results not shown). Further, in analyses that were published in PHM in December 2012 (see citation below), we found that, while food access insecurity and low SES are related in the study population, food access insecurity also appears to capture a separate area of vulnerability that is (at least in part) independent of SES. 
In the case of that study, we found similar results when using a PCA-based measure of SES and a simpler sum score of household assets. Our final measure of SES in the current analyses was also significantly associated with food access insecurity, as expected given the hypothesized nomological network (Cronbach & Meehl 1955).

Psaki S, Bhutta Z, Ahmed T, Ahmed AMS, Bessong P, Islam M, John S, Kosek M, Nesamvuni C, Shrestha P, Svensen E, Richard S, Seidman J, Caulfield L, Lima AA, Miller M, Checkley W, MAL-ED Network Investigators. Household food insecurity and child malnutrition: Results from the eight-country MAL-ED study. Population Health Metrics. 2012 Dec 13;10(1):24.

In response to whether our measure is an improvement over what previously existed, we believe that it is, for several reasons: 1) it is a robust measure of SES that reflects theory on what SES is intended to measure; 2) it reduces the data collection burden by highlighting a priority set of indicators for measurement, in contrast to commonly used PCA approaches, which require collecting data on a full set of indicators, even if some are irrelevant; and 3) it is computationally simple to apply, once the priority assets have been selected using the random forests technique. We have added text to outline these contributions more explicitly starting in line 453.

Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin. 1955;52:281-302.

2. The statement on lines 231 and 232 of page 10, "household eligibility was determined based on location...", is a bit ambiguous. Were households chosen from the intersection of the set of households within the MAL-ED catchment area and the set of households with children aged 24 to 60 months, or were certain locations in the catchment area favorably selected (e.g. peri-urban or rural locations)?
The selection process matters because it determines whether the 798 selected households differ significantly in the proportions drawn from peri-urban, urban, and rural locations, and whether this data set differs from the DHS data in Filmer and Pritchett's approach with regard to bias toward sampling in urban areas. A clearer explanation of the selection process, or a chart showing summary statistics for the proportion of households of each type, would address this issue.

Thank you for this question; we can see that it was unclear as written in the text on lines 231-232. All households that met both criteria (located within the MAL-ED catchment area and having a child aged 24 to 60 months in the household) were eligible. We have edited the wording to clarify this. Table 1 describes each study site. We have added an additional column to show more explicitly which sites are urban, rural, or peri-urban, and added some clarifying language on the study setting in line 220. Three of the sites are urban (in Brazil, Bangladesh, and India), two are peri-urban (in Peru and Nepal), and three are rural (in South Africa, Pakistan, and Tanzania). Given this mix, we do not feel that these analyses suffer from the bias toward urban areas that Filmer and Pritchett discuss.

3. What was the percentage of missingness in the data, especially in the attributes that were selected by the random forest and the PCA index? Some matrix factorization techniques are impeded by data with a high degree of missingness, so the software package will perform list-wise deletion to produce a matrix that is positive semidefinite. Depending on the random forest algorithm used, missingness can be handled in several ways, none of which involves list-wise deletion.
Depending on the degree of missingness in the data, random forest's variable importance measures may be aided by the fact that they include observations that other methods exclude because of the way those methods implicitly treat missingness. Without knowing the completeness of the data and the random forest algorithm used, it will be difficult to reproduce your results. If your data have no missingness, then you may want to consider using conditional variable importance (Carolin Strobl's method, available through R's cforest function), because it can provide more robust variable importance results in data sets without missing data.

As described in lines 325-327, we dropped 11 observations because of missing or extreme anthropometric values. All remaining observations had complete data on the variables used for these analyses. We have added a sentence at line 327 with that clarification. We did calculate and use conditional variable importance using R's cforest function, and have added that clarification in lines 287-288.

Minor essential revisions

4. There are three measures of variable importance associated with random forests: Gini importance, permutation importance, and conditional importance (Strobl's method). The paper should be clear about which one was used, because the chosen method has implications for how the importance measures perform.

We thank the reviewer for this question and the request for clarification of the method. Indeed, we used Strobl's approach. As the reviewer correctly points out, we calculated conditional variable importance, and have added this clarification in lines 287-288.

Discretionary revisions

5. Leave-one-out cross-validation (LOOCV) provides optimistic, deflated estimates of MSE if any of your data rows are duplicates since, with duplicate rows, the training set contains an exact match for the test row at least twice.
You may want to run a uniqueness test on your data, because just one duplicate entry would likely break the tie between random forest, PCA, and the composite SES index in Table 4.

We also conducted five- and ten-fold cross-validation, but chose to include only the results from LOOCV because they were similar to and consistent with the five-fold and ten-fold results. We have added a note under Table 4 with this information.

6. On lines 393-394 of page 17, the authors state that maternal education was more parsimonious than other methods but was rejected. It would be good if the way that parsimony was treated in the evaluation of models were stated explicitly, since the parsimony of models is included in Table 4. Is a model with eight terms eight times as bad as one with one term? A simple way to address this would be to use the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC), which explicitly include terms for model degrees of freedom.

Thank you very much for this comment; we have amended this section of the manuscript. Your comment made us think more deeply about this point, and upon revisiting the subject we realized that we were discussing parsimony in terms of the number of variables included in the development of each metric, rather than statistical parsimony (i.e., fewer variables in a regression model). Specifically, all four SES metrics (plus the WAMI index) are summarized into a single independent variable, so there may be no value in including AIC/BIC as a comparative metric of model fit (essentially all metrics have the same number of variables in the model). Moreover, given that the models are not nested within each other, it made less sense to use likelihood-based approaches for model comparison. This is the main reason why we chose to use cross-validation (LOOCV) and effect size to compare the models.
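As an illustration of the checks discussed in comments 5 and 6, the following is a minimal Python sketch, assuming NumPy and scikit-learn are available. It is not the authors' code and runs on synthetic data rather than MAL-ED data; the function names and the candidate indices `index_a` and `index_b` are hypothetical. It counts exact duplicate rows before relying on LOOCV, compares LOOCV with 10-fold cross-validated MSE for single-predictor regressions of HAZ, and computes a Gaussian AIC for each model. Because each candidate index enters as a single predictor, the AIC penalty is identical across models, so AIC orders them exactly as in-sample residual sum of squares does.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score


def n_duplicate_rows(data):
    """Count rows that exactly repeat an earlier row (0 means all rows are unique)."""
    _, counts = np.unique(np.asarray(data), axis=0, return_counts=True)
    return int((counts - 1).sum())


def cv_mse(x, y, cv):
    """Cross-validated MSE of a one-predictor linear regression of y on x."""
    scores = cross_val_score(LinearRegression(), x.reshape(-1, 1), y,
                             cv=cv, scoring="neg_mean_squared_error")
    return float(-scores.mean())


def gaussian_aic(x, y):
    """AIC of an OLS fit of y on x, dropping the constant n * (1 + log(2*pi))."""
    X = x.reshape(-1, 1)
    fit = LinearRegression().fit(X, y)
    rss = float(((y - fit.predict(X)) ** 2).sum())
    n, k = len(y), 3  # k counts intercept, slope and residual variance
    return float(n * np.log(rss / n) + 2 * k)


def compare_indices(indices, y):
    """Summarize LOOCV MSE, 10-fold MSE and AIC for each candidate index."""
    kfold = KFold(n_splits=10, shuffle=True, random_state=0)
    return {
        name: {
            "loocv_mse": cv_mse(x, y, LeaveOneOut()),
            "kfold10_mse": cv_mse(x, y, kfold),
            "aic": gaussian_aic(x, y),
        }
        for name, x in indices.items()
    }


def make_synthetic_data(n=200, seed=0):
    """Hypothetical data: two noisy indices of a latent wealth score, and HAZ."""
    rng = np.random.default_rng(seed)
    wealth = rng.normal(size=n)
    haz = -1.5 + 0.3 * wealth + rng.normal(size=n)
    indices = {
        "index_a": wealth + rng.normal(scale=0.5, size=n),
        "index_b": wealth + rng.normal(scale=1.0, size=n),
    }
    return indices, haz


if __name__ == "__main__":
    indices, haz = make_synthetic_data()
    table = np.column_stack([*indices.values(), haz])
    print("duplicate rows:", n_duplicate_rows(table))
    for name, stats in compare_indices(indices, haz).items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

The duplicate check matters most for flexible learners such as random forests, which can nearly reproduce a held-out row when its exact copy remains in the training set; for a one-predictor linear regression, a single duplicate shifts the fit far less.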
Again, thank you very much for bringing up this point and allowing us to provide further clarification.

Reviewer's report
Title: Measuring socioeconomic status in multi-country studies: Results from the eight-country MAL-ED study
Version: 1
Date: 22 August 2013
Reviewer: Abraham Flaxman

Reviewer's report:

This paper takes on an important topic: how to measure socioeconomic (SE) status cross-culturally. The authors have collected a new dataset consisting of information on 800 children aged 24-60 months in 8 distinct sites. Although not explicitly stated, it seems that the authors are also particularly interested in developing an SE index that is easily implemented in a household survey and easily calculated from the survey results, and their proposed solution, the WAMI index, achieves this goal. The paper uses a sophisticated machine learning technique to select household assets for use in a wealth index, and uses the components of the SE index independently and collectively to predict levels of height-for-age Z score (HAZ). However, the paper has some major shortcomings that prevent me from recommending it for publication at this time. It does not satisfactorily define SE status, and it does not describe in sufficient detail how the novel random forest approach works. Furthermore, validating the SE index by its ability to predict low HAZ seems circular.

Major Compulsory Revisions

1. Please include a detailed definition of socioeconomic status and contextualize it among existing, competing definitions and operationalizations. It seems that there is no settled conceptual definition of SE status [1], and even calling it “status” is not universal, with some authors considering “position” more precise [2,3]. This makes it essential for your work to clearly state the definition you are using.

Thank you for this comment. You are right that there has been much discussion in the literature about the definition of socio-economic status vs.
socio-economic position, as well as the appropriate measurement of both. We have defined socio-economic status in lines 166-168 in accordance with Adler and colleagues’ definition (1994), which is also the definition used by the U.S. Centers for Disease Control: “a composite measure that typically incorporates economic, social, and work status. Economic status is measured by income. Social status is measured by education, and work status is measured by occupation.” Krieger and colleagues’ main argument for using the term socio-economic position, rather than socio-economic status, is that the latter “blurs distinctions between two different aspects of socioeconomic position: (a) actual resources, and (b) status, meaning prestige- or rank-related characteristics.” They explain that indicators in the former category tend to be dichotomous (e.g. owning a chair), whereas prestige- or rank-related characteristics are typically continuous (e.g. level of access to services). Krieger and colleagues’ definition of socio-economic position explicitly includes indicators in the latter category, which are not included in our measure. We have added this clarification as a footnote in the text in line 169, and have added the lack of prestige- or rank-related characteristics more explicitly as a limitation in lines 465-466.

2. Complete description of the random forest method used. A satisfactory revision will at least address the following questions, the answers to which I was unsure of in the current version: Why is it necessary to use the subset of indicators selected for PCA (line 282), and what happens if you do not? Did you use “supervised learning” specifically to predict HAZ (lines 283-284) or “unsupervised learning” (which is more commonly used to cluster instances, but can also be used to identify important covariates)?
If you used supervised learning with RF to predict HAZ and compared this to methods that were not designed specifically to predict HAZ, it is not surprising that RF performed better (see point 4 below). Why did you choose 8 indicators (lines 284-285), instead of more or fewer?

Thank you for this comment. We have provided additional information on the random forests method used, as requested. We decided to use the same initial set of indicators that was used in PCA so that the results would be comparable. If we had started with a smaller subset of indicators, we would not know whether a difference in predictive power was due to the choice of indicators or to the method used. We would not have used a larger set of indicators, because the initial set used for PCA was selected based on assessments of variation and internal consistency. We describe our reason for choosing 8 indicators in the results section (currently lines 344-347). Since we found no accepted approach to this in the literature, we borrowed from the methods used in PCA and created a scree plot using variable importance. We selected eight variables based on the results of the scree plot (not shown), because the magnitude of change in variable importance between items dropped off notably after eight variables. We used unsupervised learning with conditional random forests to predict HAZ, and have added this to the text.

3. Explicitly state the goal of developing a simple index (implicit in the parsimony criteria on lines 301-303, perhaps also in the “easily collected across diverse settings” observation on lines 403-404, and in Table 5), and give necessary criteria for any solution to be considered simple (or easy, or parsimonious, or whatever you decide to call it).

We have added the goal of developing a simple index explicitly in several places in the text, including in lines 270-272 and 415-416. We also added a note under Table 4 explaining more clearly how parsimony was assessed and compared.

4.
Validation of a wealth or SE index by predicting HAZ seems invalid. For example, selecting RF over MPI as the preferred measure of wealth based on an MSE of 1.37 vs 1.39 for predicting HAZ says that RF is a better proxy for HAZ, not necessarily for wealth (that these MSE values are similarly large is also concerning, but that is another issue). Implicit in the validation approach is the assumption that a better predictor of wealth is a better predictor of HAZ, which I find implausible. At the very least, this should be clearly stated as a limitation of the study.

The question of why we assessed associations between SES and HAZ is an important one, which we discussed at length while working on this paper. First, the association with HAZ is not the only way we assessed the suitability of this SES measure, or even the primary way. The development of the four possible measures of wealth, and then the full measure of SES, was based on an understanding of underlying theory (e.g. SES represents wealth/income, education, occupation), and was grounded in the extensive literature exploring approaches to measuring wealth and SES, such as Filmer & Pritchett’s 2001 paper in Demography. In our view, this strong grounding is sufficient justification for the approaches to measuring SES that we compare, with the exception of mother’s education, which is included because it has been used (arguably incorrectly) in previous research as a simple proxy. We chose to take these analyses of suitability a step further for two reasons: 1) we were interested in comparing the four measures of wealth directly, and 2) analyzing associations of a construct of interest with other constructs that are believed to be related theoretically or empirically is an accepted approach to assessing construct validity (Cronbach & Meehl 1955). Although we expected HAZ and SES to be statistically associated, we do not assume that they are equivalent.
We have added this explanation more explicitly starting in line 263, and have included this as a limitation in lines 450-453. As for the MSE values for RF and PCA, we agree that they are very close, and we did not use the difference between these values as one of our criteria for selecting random forests. Rather, as described in the “choice of wealth measure” section starting in line 375, we considered those values to be equivalent, and chose RF because of its parsimony and simplicity of application, once the important variables are selected.

Minor Essential Revisions

5. State all household asset questions from which the 16 used in PCA were chosen, as well as the reason for excluding any questions outside these 16.

We have added additional detail in the “household wealth measurement” section (starting in line 346) on how the 16 assets were chosen, including a full list of the assets.

6. I am not familiar with using maternal education as a proxy for household wealth (lines 268-269); please include references.

Below are two references that discuss the use of mother’s education as a proxy for SES. We have added these references to the text. The Fotso et al. (2005) paper you mentioned previously also references this practice.

Monteiro CA, Conde WL, Popkin BM. Obesity and inequities in health in the developing world. International Journal of Obesity. 2004;28:1181-1186.

Desai S, Alva S. Maternal education and child health: Is there a strong causal relationship? Demography. 1998;35:71-81.

Discretionary Revisions

7. Food insecurity questions (line 228) seem highly relevant, and perhaps should not be excluded (lines 278-279).

We previously assessed associations between food access insecurity and anthropometry in the study population.
In analyses published in PHM in December 2012 (citation below), we found that, while food access insecurity and low SES are related in the study population, food access insecurity also appears to capture a separate area of vulnerability that is (at least in part) independent of SES. We found similar results when using a PCA-based measure of SES and a simpler sum score of household assets. Our final measure of SES in the current analyses was also significantly associated with food access insecurity, as expected given the hypothesized nomological network (Cronbach & Meehl 1955).

Psaki S, Bhutta Z, Ahmed T, Ahmed AMS, Bessong P, Islam M, John S, Kosek M, Nesamvuni C, Shrestha P, Svensen E, Richard S, Seidman J, Caulfield L, Lima AA, Miller M, Checkley W, MAL-ED Network Investigators. Household food insecurity and child malnutrition: Results from the eight-country MAL-ED study. Population Health Metrics. 2012 Dec 13;10(1):24.

Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin. 1955;52:281-302.

8. Were translated questionnaires back-translated for quality assurance? If so, say so (line 248).

Yes, questionnaires were back-translated for quality assurance. We have added that clarification in line 249.

9. Supplementary Table 1: sort by total from high to low, instead of alphabetically.

Thank you for this suggestion. We have made that change, and country sites are now sorted from highest SES score (Brazil) to lowest (Tanzania), left to right. We have included a note under the table with that clarification.

10. Figure 1: sort from high to low, instead of alphabetically.

We have made this change, and sorted country sites accordingly in Figure 1.
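For the variable-importance questions in Sean Green's comments 3 and 4, and the scree-plot choice of eight indicators described in the response to Abraham Flaxman's comment 2, here is a minimal Python sketch assuming scikit-learn. It deliberately swaps in a simpler technique: marginal permutation importance on a held-out split, not Strobl's conditional importance (which, in R, is requested from a party-package forest via `varimp(..., conditional = TRUE)`). The asset names, the synthetic data, and the largest-drop cut-off rule are hypothetical illustrations of a scree-style choice, not the authors' procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def rank_by_permutation_importance(X, y, names, seed=0):
    """Rank columns of X by (marginal) permutation importance for predicting y.

    Importance is the mean drop in R^2 on a held-out split when one column is
    shuffled; this is NOT Strobl's conditional importance.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed)
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    result = permutation_importance(forest, X_test, y_test,
                                    n_repeats=10, random_state=seed)
    order = np.argsort(result.importances_mean)[::-1]
    return [(names[i], float(result.importances_mean[i])) for i in order]


def largest_drop_cutoff(sorted_importances):
    """Hypothetical scree rule: keep the items that precede the largest drop."""
    values = np.asarray(sorted_importances, dtype=float)
    drops = values[:-1] - values[1:]
    return int(np.argmax(drops)) + 1


def make_synthetic_assets(n=600, n_assets=16, seed=0):
    """Hypothetical binary asset indicators; only the first three affect HAZ."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 2, size=(n, n_assets)).astype(float)
    haz = -1.5 + 0.6 * X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=n)
    names = [f"asset_{j:02d}" for j in range(n_assets)]
    return X, haz, names


if __name__ == "__main__":
    X, haz, names = make_synthetic_assets()
    ranking = rank_by_permutation_importance(X, haz, names)
    k = largest_drop_cutoff([score for _, score in ranking])
    print("keep:", [name for name, _ in ranking[:k]])
```

In practice the full ranking would be plotted and inspected as a scree plot; an automatic largest-drop rule is only one of several ways to formalize "dropped off notably".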