Exercise on Step 1, The Modeling Step

Recall Forum F2, where we "completely identified" the random variables of interest, as explained in Types of Variables. There we identified one qualitative and two quantitative random variables of interest. That task was assigned because it is the first part of the first step of statistical inference. Indeed (IMHO), the first step of scientific research in general is to identify the variables you wish to observe through sampling or experimentation.

Objectives

The objectives of this exercise are to introduce the following concepts. The first step of statistical inference (estimation and significance testing) is to formulate a model for the proposed (observational or experimental) study. All studies include observation of random variables. We use what we know, the sample (data), to make inferences about what is unknown, the parameters of the population (or experimental phenomena). Choice of the optimal inferential procedure depends on the parameters of interest and on the assumptions we believe are valid for the study at hand. These assumptions concern the underlying distributions of the random variables of interest.

Instructions

For F4, each of us will perform what I call Step 1, the Modeling Step, of my 6-Step Method of Statistical Inference, as summarized briefly in Statistical Inference 6-Step Overview and explained fully in 6-Step Method of Estimation. The key to, assumptions of, and formulas for the four models we will entertain are in Univariate Formulas Tabulated.

Document1 1 2/8/2016

We will perform Step 1 three times:

1. Step 1 for estimation of the (population) proportion of the qualitative random variable of interest we identified in F2
Example
a. RV of interest
Let X_i denote the sex (male, female) of the i-th randomly sampled Stage 4 lung cancer patient in southwest Virginia, i = 1, 2, ..., 100.
b. Parameter of interest
Let φ denote the proportion of females among all Stage 4 lung cancer patients in southwest Virginia.
c. Assumptions (about the underlying distribution)
None.

2. Step 1 for estimation of the (population) mean of the continuous quantitative random variables of interest we identified in F2, assuming that the underlying distribution is the Normal distribution
Example
a. RV of interest
Let Y_i denote the age (yr) of the i-th randomly sampled Stage 4 lung cancer patient in southwest Virginia, i = 1, 2, ..., 100.
b. Parameter of interest
Let 𝜇 denote the mean age (yr) of all Stage 4 lung cancer patients in southwest Virginia.
c. Assumptions (about the underlying distribution)
We assume that Y_i is Normally distributed.

3. Step 1 for estimation of the (population) standard deviation of the continuous quantitative random variables of interest we identified in F2, assuming that the underlying distribution is the Normal distribution
Example
a. RV of interest
Let Y_i denote the age (yr) of the i-th randomly sampled Stage 4 lung cancer patient in southwest Virginia, i = 1, 2, ..., 100.
b. Parameter of interest
Let 𝜎 denote the standard deviation of age (yr) of all Stage 4 lung cancer patients in southwest Virginia.
c. Assumptions (about the underlying distribution)
We assume that Y_i is Normally distributed.

Note well

The modeling step, being the interface between the biology (psychology, engineering, whatever the subject matter) and statistics (a.k.a. analytics), is the most challenging step for both statisticians and non-statisticians. The modeling step involves imagination and creativity. Once the model is formulated, the statistical paradigm is framed, and the subsequent steps are nearly automatic and, for the most part, computerized. In first-semester introductory courses, the models are the simplest models, involving only one or two variables.
In subsequent courses the models become much more elaborate, employing multiple variables and equations for the relationships among the variables.
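To illustrate the point that the steps after modeling are largely computerized, here is a minimal Python sketch, offered under stated assumptions. It records each of the three Step 1 models (RV, parameter, assumptions) and then computes the usual point estimate of the named parameter from a sample. The names StepOneModel, MODELS, and point_estimate are hypothetical and do not come from Univariate Formulas Tabulated. The estimators (sample proportion for φ, sample mean for 𝜇, sample standard deviation with the n - 1 divisor for 𝜎) are standard choices I am assuming, not a reproduction of the later steps of the 6-Step Method. The sample is simulated, made-up data. The ages come from a hypothetical Normal(68, 9) population, not real patient records.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class StepOneModel:
    """Hypothetical record of Step 1 (the Modeling Step): RV, parameter, assumptions."""
    rv: str
    parameter: str
    assumptions: str


# The three models from the exercise, transcribed as plain records.
MODELS = {
    "phi": StepOneModel(
        rv="X_i = sex (male, female) of the i-th sampled patient, i = 1..100",
        parameter="phi = proportion of females among all patients",
        assumptions="None",
    ),
    "mu": StepOneModel(
        rv="Y_i = age (yr) of the i-th sampled patient, i = 1..100",
        parameter="mu = mean age (yr) of all patients",
        assumptions="Y_i is Normally distributed",
    ),
    "sigma": StepOneModel(
        rv="Y_i = age (yr) of the i-th sampled patient, i = 1..100",
        parameter="sigma = standard deviation of age (yr) of all patients",
        assumptions="Y_i is Normally distributed",
    ),
}


def point_estimate(parameter, sexes, ages):
    """Return a common point estimate of the named parameter (assumed estimators)."""
    if parameter == "phi":
        # Sample proportion of females estimates the population proportion phi.
        return sum(1 for s in sexes if s == "female") / len(sexes)
    if parameter == "mu":
        # Sample mean estimates the population mean mu.
        return statistics.mean(ages)
    if parameter == "sigma":
        # Sample standard deviation (n - 1 divisor) estimates sigma.
        return statistics.stdev(ages)
    raise ValueError(f"unknown parameter: {parameter}")


# Hypothetical simulated sample of n = 100 patients; NOT real data.
rng = random.Random(2016)
sexes = [rng.choice(["male", "female"]) for _ in range(100)]
ages = [rng.gauss(68.0, 9.0) for _ in range(100)]

for name, model in MODELS.items():
    print(f"{model.parameter}: estimate = {point_estimate(name, sexes, ages):.3f}")
```

Note that the Normality assumption plays no role in computing these point estimates. Under the usual treatment it matters in the later steps, for example when choosing an interval procedure for 𝜇 or 𝜎. This is one reason Step 1 records the assumptions alongside the RV and the parameter.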