SSE5210 | ST | MID-TERM TEST | TOPIC COVER
QUESTION - 2

Q1:
- Based on the situation, identify the variables (Variables Selection): independent and dependent (response) variables
  An independent variable is the variable that is changed or controlled in a scientific experiment to test its effect on the dependent variable.
  A dependent variable is the variable being tested and measured in a scientific experiment.
- Based on the variables, know the scale of each variable:
  Nominal - Only maps the attribute of an entity to a name or symbol. Example: classification, labeling, and defect typing (name / symbol).
  Ordinal - Consists of classes that are ordered with respect to the attribute. Example ordering criteria: “greater than”, “better than”, “more complex”. Example: grades, software complexity.
  Interval - Preserves order, as the ordinal scale does, and adds a notion of “relative distance” between two entities. Example: temperature measured in Celsius or Fahrenheit.
  Ratio - Preserves ordering, the size of intervals between entities, and ratios between entities. Starts at zero and increases at equal intervals, known as units. Example: length of software code, duration of the development phase.
- Suggest the type / design of experiment:
  Randomization - Statistical independence (an analysis requirement). Random allocation of subjects and objects, and of the order in which tests are performed. Select objects that are representative of the population of interest.
  Blocking - Divide into separate blocks by partitioning on factors of non-interest, to systematically eliminate the undesired effect in the comparison among the treatments. Within one block, the undesired effect is the same, so we can study the effect of the treatments on that block. Example: the persons (subjects) used for this experiment have different experience. Some of them have used object-oriented design before and some have not.
  To minimize the effect of experience, the persons are grouped into two groups (blocks).
  Balancing - Each treatment has the same number of subjects. Balancing is desirable because it simplifies and strengthens the statistical analysis of the data, but it is not necessary. Example: the experiment uses a balanced design, which means that there is the same number of persons in each group (block).
- Suggest an example statistical analysis method / test statistic that can be used:
  1 Factor / 2 Treatments - To investigate whether a new design method produces software with higher quality than the previously used design method.
  1 Factor / >2 Treatments - The experiment investigates the quality of the software when using three different programming languages.
  2 Factors / 2 Treatments each
  >2 Factors / 2 Treatments each
- Formulate the hypothesis:
  o H0: μ1 = μ2
  o H1: μ1 ≠ μ2, μ1 < μ2, or μ1 > μ2
- Validity threats: to generalize the result

Q2:
- Statistical analysis: descriptive, numerical and graphical
  Given small data, calculate the Mean, Median & Mode, and describe Stem and Leaf Plots and Box Plots.
- What is your opinion about the current status of empirical software engineering, especially replication?
  One of the main concerns in any experimental or empirical research work is to provide evidence that the conclusions we infer from the quantitative data obtained in experimentation are valid and of practical use. The goal of any statistical procedure in empirical research is to exclude chance from the conclusions that will be attributed to the substantive hypotheses on which we base our scientific claims.
  The approach usually followed in empirical software engineering experiments takes the form of testing, by some statistical procedure, the null difference between the means of two populations: μT, the mean of the new treatment, and μR, the mean of the reference.
  This is called the “Null Hypothesis Significance Test”.

Research strategies
• survey
• case study
• experiment
• action research

α “alpha” = significance level in a hypothesis test, or acceptable probability of a Type I error (probability you can live with). 1−α = confidence level.
β “beta” = in a hypothesis test, the acceptable probability of a Type II error; 1−β is called the power of the test.
μ mu, pronounced “mew” = mean of a population.
ν nu: see df, above.
ρ rho, pronounced “roe” = linear correlation coefficient of a population.
σ “sigma” = standard deviation of a population.
σx̅ “sigma-sub-x-bar”; see SEM above.
σp̂ “sigma-sub-p-hat”; see SEP above.
∑ “sigma” = summation. (This is upper-case sigma. Lower-case sigma, σ, means standard deviation of a population.)
χ² “chi-squared” = distribution for multinomial experiments and contingency tables.
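
Practice sketch for the Q2 descriptive-statistics item (mean, median, mode, stem-and-leaf plot, box plot). This is only one possible way to work a small data set, written with Python's standard `statistics` module. The helper names `describe` and `stem_and_leaf` and the sample values are made up for illustration and do not come from the course material. Quartiles can be defined in several ways; the "inclusive" convention is assumed here, so a textbook using another convention may give slightly different Q1 and Q3.

```python
import statistics
from collections import defaultdict


def describe(data):
    """Return mean, median, mode(s), and the five-number summary behind a box plot."""
    ordered = sorted(data)
    # Assumption: "inclusive" quartiles; other conventions can shift Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "modes": statistics.multimode(ordered),  # list, since ties are possible
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
    }


def stem_and_leaf(data):
    """Build stem-and-leaf rows for non-negative integers (stem = tens, leaf = units).

    Stems with no values are simply omitted.
    """
    rows = defaultdict(list)
    for value in sorted(data):
        rows[value // 10].append(value % 10)
    return [
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in sorted(rows.items())
    ]


# Hypothetical sample: defects found per module (made-up numbers).
defects = [12, 15, 15, 21, 23, 27, 34, 38, 40]

summary = describe(defects)
print(summary)
print("\n".join(stem_and_leaf(defects)))
```

For this made-up sample the mean is 25, the median is 23, and the mode is 15. The five-number summary (minimum, Q1, median, Q3, maximum) gives the values a box plot draws: the box spans Q1 to Q3, with a line at the median and whiskers toward the minimum and maximum.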