An Excel Sheet for Inferring Number-Knower-Levels from Give-N Data

James Negen, Barbara W. Sarnecka, and Michael Lee

Number-knower-levels indicate what children know about counting and the cardinal meanings of number words, and they have become an important developmental variable. The Give-N methodology, which is used to diagnose knower-level, has been highly refined; in contrast, the field’s analysis of Give-N data remains somewhat crude. Here we work with the model of Lee and Sarnecka (2010), a generative model of how children perform the Give-N task that allows more principled inference of knower-level. We present a close approximation of the model’s inference that can be computed in Microsoft Excel, along with a worked implementation and instructions for its use. This should give developmental researchers access to sharper inference about young children’s number-word knowledge.

Preschoolers learn to recite the first few natural numbers in order very early, often by age 2 (Fuson, 1988), but they do not immediately know what those words actually mean. From there, they begin to fill these words in with meaning, one at a time and in order (e.g., Carey, 2009; Sarnecka & Lee, 2009). A child’s progress through this process is referred to as her number-knower-level, or simply knower-level. A child who does not yet know any number words is called a pre-number-knower or 0-knower. A child who knows the word “one” alone is a 1-knower; knowing “one” and “two” makes her a 2-knower, and so on. After being a 3-knower or 4-knower, children become CP-knowers, meaning that they know how to use counting to find the cardinality of virtually any set.

A child’s progress along this developmental timeline has been a useful variable in several lines of research. For example, Ansari and colleagues (2003) used it to examine deficits caused by Williams Syndrome; Le Corre and Carey (2007) used it to investigate when and how the analogue magnitude system becomes linked to the system of number words; Sarnecka, Kamenskaya, Yamana, Ogura and Yudovina (2007) used it to examine how cross-linguistic variation can influence number-word learning; and Duncan and colleagues (2007) found it to be one of the most important predictors of success in kindergarten.

The most prevalent task used to infer knower-level is the Give-N task. In its usual form, children are given a large bowl of small items and told that they are going to play a game with a puppet. The experimenter asks the child to give a certain number of items to the puppet, e.g., “Can you give Mr. Bunny TWO bananas?” The requested numbers almost always include one through four, plus a few more in the range of five to ten, and it is normal to see three trials for each number word.

Wynn (1990, 1992) invented the Give-N task and used it to measure, at the group level, whether children of certain ages knew certain number words. From there, researchers began trying to infer the knower-levels of individual children rather than of age groups, either by ad hoc heuristics (e.g., Sarnecka & Gelman, 2004) or by seeking convergence in a titrating method (e.g., Barner, Chow & Yang, 2009). These earlier methods of inference, however, generally had no principled reason to set their cutoffs where they did. This led to the development of a formal model by Lee and Sarnecka (2010). The model itself can be used to guide inference; the problem is that it requires technical skills that are not prevalent among developmental researchers. Because of this, we decided to develop a close approximation that requires the user to interact only with Microsoft Excel, a much more comfortable platform for the target audience.
The Present Study

In the remainder of the paper, we first describe the model itself in detail (Model Description) and then explain how the Excel sheet approximates it (Approximation in the Excel Sheet). Next, we describe the Give-N datasets used to calibrate the model’s parameters and to test the quality of the approximation (Methods). We then provide posterior distributions over the model parameters from the calibration dataset (Results), compare the model’s inference to the Excel sheet’s inference and to a popular ad hoc method’s inference (Discussion), and finally provide instructions for the sheet itself (How to Use the Excel Sheet).

Model Description

The task of creating a more principled method of inferring knower-level from Give-N data was taken on by Lee and Sarnecka (2010), who created a full generative model of the Give-N task. In their model, children have a base-rate distribution over how many items they like to give. This corresponds roughly to what one would expect if the child were simply asked to give however many items she wanted from the available bowl. The base-rate is then modified when the child is asked for a specific number of items. Specifically, three things happen when the child is asked for X items:

1. If the child knows the word X, the probability that she correctly gives X items increases by a factor of v.
2. If the child knows some set of words Θ, all words in Θ that are not X become less likely by the same factor v (e.g., a 2-knower is unlikely to give 2 when asked for “three,” even though she is not especially likely to give 3).
3. The probability of every response is divided by the sum of the probabilities, so that the probabilities sum to unity again.

This provides a principled way to predict patterns of Give-N data, sorted by knower-level. The way it works out, a pre-number-knower simply gives the base-rate number of items. A 1-knower almost always gives 1 when asked for “one,” but rarely gives 1 when asked for anything else. She does, however, sometimes get “two” right, because 2 has a high base-rate probability. A 2-knower does much better at “two,” and also rarely gives 2 for anything other than “two,” and so on. A CP-knower has a very good chance of getting anything correct, as she is able to count the way adults do, finding the exact cardinality of any set by a point-and-count procedure through the count list.

Following Lee and Sarnecka (2010), we used a graphical model as our implementation (Figure 1). Discrete variables are indicated by square nodes, and continuous variables by circular nodes. Stochastic variables are indicated by single-bordered nodes, and deterministic variables (included for conceptual clarity) by double-bordered nodes. Finally, encompassing plates denote independent replications of the graph structure within the model.

In our implementation of the knower-level model in Figure 1, the data are the observed q_ij and g_ij variables, which give the number asked for (the “question”) and the answer (the number “given”), respectively, for the ith child on his or her jth question. The base-rate probabilities are represented by the vector π, which is updated to π′, from which the number given is sampled. (Thus, the functions defining π′ act as a likelihood function.) The update uses the number asked for, the knower-level z_i of the child, and an evidence value v that measures the strength of the updating. The base-rate and evidence parameters, which are assumed to be the same for all children, are given vague priors (i.e., priors that allow a very large range of possible inferences).

The updating rule that defines π′ decomposes into three basic cases. If a number k is greater than the knower-level z_i, then regardless of the number q being asked for, the updated probability remains proportional to the base-rate probability π_k for that number. If a number k is within the child’s knower-level range z_i, its probability either increases by a factor of v if it is the number q being asked for, or decreases by a factor of v if it is not. For a child who is a CP-knower, his or her range encompasses all of the numbers.
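Written out as an equation (our own reconstruction from the verbal description above, not notation taken directly from Lee & Sarnecka, 2010), the updating rule is

\pi'_k = \frac{u_k}{\sum_m u_m}, \qquad
u_k = \begin{cases}
\pi_k, & k > z_i \\
v \, \pi_k, & k \le z_i \text{ and } k = q_{ij} \\
\pi_k / v, & k \le z_i \text{ and } k \ne q_{ij}
\end{cases}

where the denominator restores a proper probability distribution over the possible responses.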
The final part of the graphical model relates to the behavior step, with the number of items given being a draw from the probability distribution π′ representing the updated beliefs.

This model, while it fits the known data well and provides a more principled way to measure knower-level from Give-N data, is not very accessible to many researchers who might find it useful: many researchers are not familiar with the WinBUGS language, with the proprietary program MATLAB that is often used to interface with it, with the general principles of Bayesian statistical inference, or with Bayesian cognitive modeling. As such, the model has not seen great use in the field, so we developed an approximation that can be computed with Excel.

Approximation in the Excel Sheet

What do you need in order to get a posterior distribution over knower-levels for a child? The first thing is a prior distribution, which can simply be defined by the researcher and entered into the Excel sheet. The second is an expression of the likelihood of each response, given each knower-level and given the question. A standard Markov chain Monte Carlo (MCMC) method treats this matrix of likelihoods as itself having a distribution. The exact posterior distribution over knower-levels is the probability of the data given the full set of parameters (including knower-level), times the relevant priors, integrated over the support of the parameter space, times a normalization constant. Unfortunately, Excel is not up to the task of implementing this kind of inference for this model; the integration step is best approached with MCMC methods.

The calculations in the Excel sheet approximate the posterior knower-level distribution by treating the likelihood as a single fixed matrix of probabilities rather than as a quantity with its own distribution. These probabilities can be calculated once and then stored for future use. They come from sampling out of the distribution of likelihoods, then sampling responses out of those likelihood samples; the likelihoods come, in turn, from the posterior likelihoods of a calibration dataset (see Methods and Results for a description of that process). The approximate posterior probability of a child being a given knower-level is then the probability of the data under the flattened likelihood matrix, times the relevant priors, times a normalization constant. If there is enough data in the calibration dataset to already make the posterior distribution very tight, little should be lost in the approximation. This approximation requires the program implementing it to do nothing more than multiplication, addition, and division, well within the powers of Excel.
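To make the flattened calculation concrete, here is a minimal sketch in ordinary code. The function name, argument layout, and the table like are our own choices for illustration, not objects defined by the sheet; like[z][q][g] stands for the stored probability that a child of knower-level z gives g items when asked for q.

def posterior_over_knower_levels(questions, responses, like, prior):
    """Approximate posterior over the six knower-levels (index 0 = pre-knower,
    1-4 = subset-knowers, 5 = CP-knower), as the Excel sheet computes it."""
    weights = []
    for z in range(6):
        w = prior[z]                      # prior weight; need not be normalized
        for q, g in zip(questions, responses):
            w *= like[z][q][g]            # multiply in each trial's likelihood
        weights.append(w)
    total = sum(weights)                  # normalization constant
    return [w / total for w in weights]   # posterior probabilities

Because only products and one final division are involved, the same arithmetic can be carried out directly in spreadsheet cells.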
Methods

The data used to tune the model were taken from Negen & Sarnecka (2010). For this dataset, children were asked to give one, two, three, four, six, and eight items. Each request was repeated three times in pseudorandom order (18 trials in total). We excluded sessions in which the child did not complete at least 15 of the 18 trials, leaving us with 423 sessions. The independent data used to compare the model’s inference with the Excel sheet’s inference came from Lee and Sarnecka (submitted). This dataset has 56 children who were asked for one, two, three, four, five, eight, and ten items, three times each (21 trials in total).

Results

The calibration data were run through the model using WinBUGS. Two chains were run, each with 2,000 burn-in samples and 25,000 data-collection samples, for a total of 50,000 points of MCMC data. Chain convergence was good, with the R̂ statistic very close to 1 for all of the variables sampled.

The model inferred 9 0-knowers, 48 1-knowers, 50 2-knowers, 53 3-knowers, 67 4-knowers, and 196 CP-knowers. This distribution has likely arisen because the children were drawn from an area of high socio-economic status. The slight paucity of 0-knowers may seem somewhat worrying, but under the model, 0-knowers simply respond from the base-rate, which can be inferred from the other children’s data anyway. The inferred posterior prediction is shown in Figure 2, broken down by knower-level; the full numeric breakdown is given on Sheet 2 of the Excel sheet itself. The model inferred an evidence value of 16.94 (SD = 0.69), which means the base-rate is modified by a factor of about 17 by the child’s knowledge of a given number word (the factor v in the Model Description). The inferred base-rate, also shown in Figure 3, is the same as the posterior prediction for 0-knowers.

Discussion

What advantages does this method of inference have? To see how the inferences made by the Excel sheet compare with a popular ad hoc heuristic and with normal Bayesian inference by MCMC, we looked at an independent dataset of 56 children from Lee and Sarnecka (submitted). The ad hoc heuristic requires a child’s correct answers for a number word to outnumber her errors by at least 2:1 in order for her to get credit for knowing that word; the child’s inferred knower-level is then the highest word she is credited with knowing (sketched in code below). The data were run through (a) this heuristic, (b) the Excel sheet, and (c) a normal MCMC method, after being appended to the calibration dataset. Figures 4 and 5 show the posterior distributions for every child under each inference method.

The most striking advantage of the Excel sheet over the ad hoc method is that the posterior distribution is sometimes very diffuse when little evidence has been accumulated (see children #32, #33, #43, and #53 for examples). Thus, the Excel sheet comes with an intuitive measure of how much is actually known about the child. There are 38 cases in which the Excel sheet agrees (in terms of maximum posterior probability) with the ad hoc heuristic. Of the remaining 18 cases, only one has the Excel sheet’s inference concentrated at a lower knower-level. This is primarily because the ad hoc heuristic requires the same performance for every knower-level, whereas the model is more ‘lenient’ on larger sets. This is intuitively appealing: larger sets are more difficult to generate, so 3-knowers and up should not be required to attain the same level of accuracy as 0-knowers, 1-knowers, or 2-knowers.
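For concreteness, here is a minimal sketch of the ad hoc heuristic described above. The function name and data layout are our own, chosen to parallel the earlier sketch, and the exact definition of an “error” (a wrong response when n is asked, or erroneously giving n for another word) follows our reading of the descriptions in this section.

def ad_hoc_knower_level(questions, responses, words=(1, 2, 3, 4)):
    """Credit a word when correct answers outnumber errors by at least 2:1;
    the inferred knower-level is the highest credited word."""
    level = 0
    for n in words:
        correct = sum(1 for q, g in zip(questions, responses)
                      if q == n and g == n)
        errors = sum(1 for q, g in zip(questions, responses)
                     if (q == n and g != n) or (q != n and g == n))
        if correct > 0 and correct >= 2 * errors:
            level = n  # credited with knowing this word
    # How CP-knower status is then assigned from performance on the larger
    # numbers is not specified here, so the sketch stops at the highest word.
    return level

Note that the same 2:1 ratio is demanded for every word, which is exactly the inflexibility discussed above.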
The Excel sheet also takes into account how many trials have passed without the child erroneously giving a certain number. So, for instance, every time a child makes an error when asked for four, and that error is not giving 2, the posterior odds of her being a 2-knower receive a small upward push relative to the odds of her being a 0-knower or a 1-knower. This is because the chance that a 0-knower or 1-knower will erroneously give 2 items is much higher than the chance that a 2-knower will commit the same error. (Both the ad hoc heuristic and the Excel sheet also do the complement, pushing down the chances of being a 2-knower whenever the child mistakenly gives 2 items in response to a different number word.) This allows indirect counter-evidence to accumulate against a child’s errors when asked for a number word.

The Excel sheet’s inference is very close to the inference generated by normal Bayesian inference using MCMC. In terms of maximum-posterior knower-levels, there are no discrepancies. The mean absolute difference between the model’s posterior over knower-levels and the Excel sheet’s approximate posterior is 0.2%, with a standard deviation of 0.68%. The largest difference is 4.8% (child #50), where both inferences have the same basic shape but the Excel sheet is slightly more peaked at the mode. The larger a new dataset gets, however, the further it will be able to pull the full MCMC inference away from the Excel sheet’s fixed approximation.

How to Use the Excel Sheet

Figure 6 is a screenshot of what the Excel sheet itself looks like. The user enters data in the rows near the top labeled “question” and “response.” Both of these must be in the range of 1 to 15. This has two implications: (1) if the child was asked for more than 15 items, some other method of analysis must be used, although, to our knowledge, it is rare to find researchers asking for more than 10 items; and (2) if the child had more or fewer than 15 items available to give, there may be some minor problems in how the estimation works. It will probably be all right, however, if the user translates any maximal response to 15: if the child has 10 items available, then any time she gives all 10, enter 15. (The model is based in part on the idea that giving all of the items has a high base-rate probability.)

The Excel sheet is designed to handle all of the data from a single child at a time. In the question row, the user enters the numbers requested from left to right, with the child’s responses beneath. (Trials do not have to stay in order, as long as each response is entered below the corresponding question.) The sheet fills in, on its own, the likelihood of each question/response pair conditional on each knower-level, in rows 6 to 11. It is important that there be no questions without responses. The “prior likelihood” row is where the user can enter prior weights for the different knower-levels. For most applications, these should stay at a uniform value. Note that they do not have to sum to 1 for things to work out correctly. These values might be adjusted, for example, if a covariate with a known relationship to knower-level has been collected.

The end result is a set of relative likelihoods for the six knower-levels, along with a graph to visualize them. This should allow the user to see which knower-level is preferred, and how strongly. Usually, simply taking the one with maximum likelihood will be sufficient; if scarce Give-N data are available, the user may consider using the likelihood distribution as a set of weights for further analysis. Also provided are the log-likelihood and scaled log-likelihood of the data, as these may be more familiar to some researchers.

In the example in Figure 6, the child was asked for one, two, three, four, and five. She gave, respectively, 1, 2, 3, 3, and 6. These are entered in rows 3 and 4. The prior likelihood for each knower-level is the same (cells L15 to L20). This leads to a confidence of about 60% that this child is a 2-knower (seen in the graph and under Normalized Likelihood in cell U14).
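The same worked example can be run through the sketch from the Approximation section. The inputs below are the example’s data; the table like stands for the calibration-based probabilities stored in the sheet (assumed available here, not recomputed), so this is a schematic illustration rather than independent output.

# Worked example from Figure 6, using the earlier sketch.
questions = [1, 2, 3, 4, 5]   # numbers asked for
responses = [1, 2, 3, 3, 6]   # numbers the child actually gave
prior = [1, 1, 1, 1, 1, 1]    # uniform prior weights; need not sum to 1

posterior = posterior_over_knower_levels(questions, responses, like, prior)
# With the sheet's stored likelihoods, posterior[2] (the 2-knower entry)
# should come out near 0.60, matching the Normalized Likelihood cell.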
The Excel sheet can be downloaded at …

Figure 1. A graphical representation of the model.

Figure 2. The likelihood of responses, organized by items requested and knower-level. Darker squares indicate higher probability. Blue dots are actual data.

[Figure 3 shows two panels: “Inferred Base Rate,” plotting probability against items given (1 to 15) with a 95% CI, and “Inferred Evidence Strength,” plotting probability against evidence strength (14 to 20).]

Figure 3. Inferred parameters of the model: the base-rate (what the child might give if no number word is used) and the evidence strength (the v in the model description; a parameter controlling how much the probability of different responses is modified by the child’s knowledge of number words).

[Figures 4 and 5 each show 28 small panels, one per child, plotting posterior probability over the knower-levels pre, 1, 2, 3, 4, and CP.]

Figure 4. Inferred knower-levels of the first 28 children from Lee and Sarnecka (submitted). The blue bars come from normal Bayesian inference using MCMC. The green bars come from the Excel sheet. The red bars come from the ad hoc heuristic.

Figure 5. Inferred knower-levels of the next 28 children from Lee and Sarnecka (submitted). The blue bars come from normal Bayesian inference using MCMC. The green bars come from the Excel sheet. The red bars come from the ad hoc heuristic.

Figure 6. A screenshot of the actual Excel sheet, with some example data filled in. There is more room for data entry off to the right.

References

Ansari, D., Donlan, C., Thomas, M. S. C., Ewing, S. A., Peen, T., & Karmiloff-Smith, A. (2003). What makes counting count? Verbal and visuo-spatial contributions to typical and atypical counting development. Journal of Experimental Child Psychology, 85, 50-62.

Barner, D., Chow, K., & Yang, S. (2009). Finding one’s meaning: A test of the relation between quantifiers and integers in language development.
Cognitive Psychology, 58(2).

Carey, S. (2009). The origin of concepts. New York: Oxford University Press.

Duncan, G. J., Dowsett, C. J., Claessens, A., Magnuson, K., Huston, A. C., Klebanov, P., Pagani, L. S., Feinstein, L., Engel, M., Brooks-Gunn, J., Sexton, H., Duckworth, K., & Japel, C. (2007). School readiness and later achievement. Developmental Psychology, 43(6), 1428-1446.

Fuson, K. C. (1988). Children's counting and concepts of number. New York: Springer-Verlag.

Le Corre, M., & Carey, S. (2007). One, two, three, four, nothing more: An investigation of the conceptual sources of the verbal counting principles. Cognition, 105, 395-438.

Lee, M. D., & Sarnecka, B. W. (2010). A model of knower-level behavior in number-concept development. Cognitive Science, 34, 51-67.

Lee, M. D., & Sarnecka, B. W. (submitted). Number knower-levels in young children: Insights from a Bayesian model.

Sarnecka, B. W., & Gelman, S. A. (2004). Six does not just mean a lot: Preschoolers see number words as specific. Cognition, 92, 329-352.

Sarnecka, B. W., Kamenskaya, V. G., Yamana, Y., Ogura, T., & Yudovina, J. B. (2007). From grammatical number to exact numbers: Early meanings of “one,” “two,” and “three” in English, Russian, and Japanese. Cognitive Psychology, 55, 136-168.

Sarnecka, B. W., & Lee, M. D. (2009). Levels of number knowledge in early childhood. Journal of Experimental Child Psychology, 103(3), 325-337.

Wynn, K. (1990). Children’s understanding of counting. Cognition, 36(2), 155-193.

Wynn, K. (1992). Children’s acquisition of number words and the counting system. Cognitive Psychology, 24(2), 220-251.