Steps in quantitative data analysis

This is a broad outline of the key steps in quantitative data analysis likely to be performed in M&E activities. It aims to describe the line of reasoning to follow rather than specific analysis tools. For each step in quantitative data analysis, the following is presented:
Step
Description
What the step is and why you are doing it.
Actions and tools
What you will do and what you will use to do it.
Reporting
How you will say what you found. Clarity is very important: results should be written clearly, in plain language, not in statistical jargon! In general, reporting should record and illustrate:
• How did we implement the step?
• What considerations guided us (e.g. assumptions)?
• What results did we obtain?
Formulate a hypothesis and select variables
A hypothesis is a statement about an expected relationship between two or more variables that permits empirical testing. Formulating the hypothesis and then choosing the variables is the key conceptual stage of the research, since it defines the direction of the study. If you manipulate a dataset long enough, you will find some sort of relationship; the relationships that matter to you should be those defined in the hypothesis.
You must clarify:
• Why are we doing the study?
• What are the key variables?
• What relationships can we expect among them?
The first part of a study report should state and clarify the hypothesis.
It is important to show how the hypothesis relates to data gathering and analysis. (What data were collected? What analysis tools will be employed?)
Determine sample (randomly selected!)
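A minimal Python sketch of drawing a simple random sample, assuming a hypothetical sampling frame of 1,000 household IDs and a sample size of 50 (both invented for illustration). The standard library's `random.sample` draws without replacement, so every unit has the same chance of selection; fixing the seed lets the draw be documented and reproduced.

```python
import random

# Hypothetical sampling frame: IDs of all eligible households.
sampling_frame = [f"HH-{i:04d}" for i in range(1, 1001)]

# Fixing the seed makes the draw reproducible and auditable.
rng = random.Random(2024)

# Simple random sample without replacement: each household has an equal chance.
sample = rng.sample(sampling_frame, k=50)

print(len(sample), sample[:5])
```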
Collect the data
Steps in quantitative data analysis - Page 1/4
Prepare the data
Data must be cleaned and organised for analysis. (Note that the coding and nature of the data should be thought through, and pre-tested, before the data gathering process starts.)
Actions:
• Code/input the data into the analysis software
• Check the data for errors and accuracy (Are all the responses reasonable? Are all relevant questions answered? Are the responses complete?)
• Transform the data (e.g. collapse data into categories, handle missing values)
Tools:
Levels of measurement (i.e. nominal, ordinal, interval, ratio) determine what analysis tools can be
used for a variable.
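The checks and transformations above could look roughly like this minimal sketch. The records, the valid codes and the 0-17 age range are hypothetical assumptions; invalid entries are flagged rather than silently corrected, missing answers are kept and counted, and age (a ratio variable) is collapsed into ordinal age groups.

```python
# Hypothetical raw survey records; None marks an unanswered question.
raw_records = [
    {"id": 1, "age": 7, "sex": "F"},
    {"id": 2, "age": 12, "sex": "M"},
    {"id": 3, "age": None, "sex": "F"},  # missing age
    {"id": 4, "age": 140, "sex": "M"},   # implausible age (likely a typing error)
    {"id": 5, "age": 3, "sex": "X"},     # invalid code for sex
]

VALID_SEX = {"F", "M"}    # assumed coding scheme
AGE_MIN, AGE_MAX = 0, 17  # assumed eligible age range for a child survey


def check_record(rec):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    age = rec["age"]
    if age is not None and not AGE_MIN <= age <= AGE_MAX:
        problems.append(f"age out of range: {age}")
    if rec["sex"] not in VALID_SEX:
        problems.append(f"invalid sex code: {rec['sex']}")
    return problems


def age_group(age):
    """Collapse age (ratio level) into ordinal categories; missing stays missing."""
    if age is None:
        return None
    if age < 5:
        return "0-4"
    if age < 10:
        return "5-9"
    return "10-17"


error_log, missing_age, clean = {}, [], []
for rec in raw_records:
    problems = check_record(rec)
    if problems:
        # Flag for follow-up (e.g. check the paper form) instead of guessing a fix.
        error_log[rec["id"]] = problems
        continue
    if rec["age"] is None:
        # Keep the record but note the gap so missing values can be handled explicitly.
        missing_age.append(rec["id"])
    clean.append({**rec, "age_group": age_group(rec["age"])})

print("errors:", error_log)
print("missing age:", missing_age)
print("clean:", clean)
```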
Briefly describe the dataset on which you are operating, focusing only on unique aspects of the
study (e.g. what categories did you employ?).
Describing the sample
To put the data in context, describe it in terms of averages (e.g. average height) and variation (e.g.
the range of heights).
Employ descriptive statistics:
• Measures of central tendency, which indicate a typical or central value for a group (e.g. mean, median, mode)
• Measures of dispersion, which indicate how spread out or scattered the data are (e.g. standard deviation for continuous data, range for ordinal data)
This stage is likely to produce extensive information. A report at this stage could read:
“There were n children in the study. The average [variable 1] was x, ranging from x1 to x2. The average [variable 2] was z, with a standard deviation of s…”
Use tables and graphs to summarise and clarify the most important information.
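A minimal sketch of the descriptive statistics above using Python's standard `statistics` module; the heights are hypothetical values, not study data.

```python
import statistics

# Hypothetical heights (cm) of children in a sample, for illustration only.
heights = [112, 118, 121, 121, 125, 130, 134]

mean = statistics.mean(heights)      # central tendency: arithmetic mean
median = statistics.median(heights)  # middle value once sorted
mode = statistics.mode(heights)      # most frequent value
sd = statistics.stdev(heights)       # sample standard deviation (continuous data)
low, high = min(heights), max(heights)

print(f"There were {len(heights)} children in the study. "
      f"The average height was {mean:.1f} cm (SD {sd:.1f}), "
      f"ranging from {low} to {high} cm; median {median}, mode {mode}.")
```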
Comparing groups within the sample:
When proceeding from simply describing the data to making inferences from them, we enter the realm of probability. We have to accept that we are working on a sample and can never be certain that it perfectly represents reality. Could the results have arisen by chance? To what extent are they really typical? Can we really generalise our conclusions?
• Our ability to pick up a real difference between groups (if such a difference exists) depends on the number of observations in each group (the sample size). The larger the sample size, the more likely we are to pick up differences between groups if they actually exist.
• The amount of variation within each group (e.g. the range of heights among boys and among girls) also matters. The less variation within each group, the more likely we are to pick up differences between groups if they actually exist.
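Why sample size and within-group variation matter can be seen in the standard error of a difference between two group means, sqrt(s1²/n1 + s2²/n2): it shrinks as the groups get larger or less variable, so the same observed difference becomes easier to distinguish from chance. The scenarios below are hypothetical.

```python
import math


def se_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent group means."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


observed_diff = 3.0  # hypothetical difference in mean height (cm) between two groups

for sd, n in [(10, 25), (10, 100), (5, 25)]:
    se = se_difference(sd, n, sd, n)
    # A ratio of roughly 2 or more suggests the difference is unlikely to be chance alone.
    print(f"SD={sd:>2}, n per group={n:>3}: SE={se:.2f}, diff/SE={observed_diff / se:.2f}")
```

Quadrupling the sample size, or halving the spread, halves the standard error, which here moves the ratio from about 1.06 to about 2.12.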
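As an example, testing at the 5% significance level (i.e. with 95% confidence) means accepting a 1 in 20 chance of concluding that a relationship exists when it has in fact arisen by chance. Another way of looking at it: if you test 20 relationships that do not really exist, you should expect about one of them to appear significant purely by chance.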
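To make the "1 in 20" reasoning concrete, a short calculation, assuming 20 independent tests of relationships that do not actually exist, each at the 5% significance level:

```python
alpha = 0.05   # 5% significance level (95% confidence)
n_tests = 20   # independent tests where no real relationship exists (assumption)

# Each test has probability alpha of a false "finding".
expected_false_positives = alpha * n_tests
# Probability that at least one of the tests comes out "significant" by chance.
prob_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected chance findings: {expected_false_positives:.1f}")
print(f"Probability of at least one: {prob_at_least_one:.2f}")
```

So about one spurious finding is expected, and there is roughly a 64% chance of at least one.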
Explore the differences between groups
This means assessing whether differences in the same variable between two groups are statistically significant. For example, you find that the average of a given variable differs between two groups (e.g. study group and control group). Can you say that there is really a difference between the two groups, or could this difference have arisen by chance?
• Tests of statistical significance, e.g. the t-test, are used to find out whether a difference is significant.
N/A
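A minimal sketch of a two-sample t-test, using Welch's version (which does not assume equal variances in the two groups) and only the standard library. The study and control scores are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two independent groups."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se = math.sqrt(v1 / n1 + v2 / n2)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 / n1 + v2 / n2) ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical scores in a study group and a control group.
study = [14, 16, 15, 18, 17, 19, 16, 18]
control = [12, 14, 13, 15, 13, 14, 12, 15]

t, df = welch_t(study, control)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The sketch stops at the t statistic and degrees of freedom; the p-value comes from the t distribution with df degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(study, control, equal_var=False)` runs the same Welch test and returns the p-value directly.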
Explore the relationships within the data (between pairs of variables)
You must understand what relationships exist among the variables in your dataset and establish whether they are statistically significant, particularly those between measures of programme operations and measures of expected effects.
• Formulate hypotheses about which relationships are likely to exist between your MAIN variable and the others, using a “null hypothesis”: assume that there is NO relationship between variables x and z, then run tests to see whether the data give enough evidence to reject this assumption.
• Measure the strength of the relationship.
• Understand the likelihood that this relationship appeared by chance.
[Figure: example of a weak relationship; example of a strong relationship. But what is the likelihood that it appeared by chance?]
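The three bullets above (null hypothesis, strength, likelihood of chance) can be illustrated with a permutation test, a technique swapped in here because it makes the null hypothesis explicit: if there were NO relationship, shuffling one variable would not matter, so we count how often shuffled data produce a correlation at least as strong as the observed one. The data, the 5,000 shuffles and the function names are hypothetical choices for this sketch.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation: strength and direction of a linear relationship (-1 to 1)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=5000, seed=1):
    """Share of shuffles with |r| at least as large as observed, assuming no relationship."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # breaks any real link between x and y
        if abs(pearson_r(x, y_shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 avoids ever reporting p = 0


# Hypothetical data: programme sessions attended vs. test score.
sessions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
score = [52, 55, 53, 60, 62, 61, 66, 70, 68, 74]

r = pearson_r(sessions, score)
p = permutation_p_value(sessions, score)
print(f"r = {r:.2f}, permutation p = {p:.4f}")
```

Here r describes how strong the relationship is, while the permutation p-value estimates how likely a relationship this strong would be if the null hypothesis were true.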
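Whether variables are nominal, ordinal or interval determines which analytical techniques are appropriate for studying relationships:
• For nominal variables, cross-tabulations of the data are used (e.g. with a chi-square test).
• For ordinal variables, rank-based measures are used (e.g. Spearman's rank correlation).
• For interval variables, correlation tests are used (e.g. Pearson's correlation).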
You will have to document the relationships relevant to your study, i.e.:
• Those that involve the main variable being analysed
• Those that are unlikely to have appeared by chance and are strong enough to be meaningful.
When writing reports on correlation, don’t limit yourself to statistical jargon, e.g. “we did a chi-square test and it revealed a p of x.” You are not simply reporting the result of a test! Instead, clarify the hypothesis, the direction of the relationship in the data, its strength, and how unlikely it is to have appeared by chance. For example, when exploring the relationship between BMI and number of siblings you could say: “Children with more than one sibling had a lower BMI than children with no siblings, whose average BMI was x. The difference was statistically significant at the 5% level (95% confidence).”
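For nominal variables, the chi-square test works on a cross-tabulation by comparing observed counts with the counts expected if there were no relationship. A minimal sketch for a 2x2 table with hypothetical counts; for one degree of freedom the p-value can be computed exactly with `math.erfc`.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 cross-tabulation (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no relationship
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical cross-tab: rows = attended programme (yes/no),
# columns = child fully vaccinated (yes/no).
table = [[40, 10],
         [25, 25]]
chi2, p = chi_square_2x2(table)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

This sketch omits Yates' continuity correction, which `scipy.stats.chi2_contingency` applies by default to 2x2 tables, so SciPy's p-value for the same table will be somewhat larger.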
Finding a statistical relationship between variables does not necessarily mean that one caused the other. The relationship could be real yet not causal (for example, both variables may be driven by a third factor), or it could simply have appeared by chance. Causality cannot be revealed by statistical tests alone; it is established through a logical argument that builds on the statistical findings (i.e. the existence of a relationship).
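A small simulation can show how a strong correlation may arise without any causal link. In this sketch a hypothetical third variable z (think of household income) drives both x and y, while x has no effect on y; the variables and noise levels are invented for illustration.

```python
import random
import statistics


def corr(x, y):
    """Pearson correlation, written out so the sketch does not need Python 3.10+."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
n = 1000
# Hypothetical confounder z drives BOTH x and y; x has no influence on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

print(f"corr(x, y) = {corr(x, y):.2f}")  # strong, yet not causal

# Remove z's contribution (known exactly here because we simulated it).
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
print(f"corr after removing z = {corr(x_resid, y_resid):.2f}")
```

Once the influence of z is removed, the correlation largely disappears. In real data a confounder's effect is rarely known this precisely.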
Explore models built on relevant variables
Devise and test explanatory models.
Actions:
Considering all the relationships you discovered, choose the set of key variables that you think are most likely to influence the main variable. Test them to understand whether, and to what degree, they can explain the variance of the main variable of your study. The models could be applied to different subgroups of the study population (e.g. males and females).
In building up your case you could, for example:
• Choose variables because you discovered that they are strongly correlated with the main variable
• Choose variables that are less strongly correlated but that you still consider important for your model
• Discard variables that are strongly correlated with your main variable but lie on the same causal pathway, and therefore would not add relevant information
• Discard variables that are apparently related but could lead you to wrong conclusions.
These models are based on your judgement and should be justified.
Note that multivariate techniques can be very powerful analytical tools, but they must be used with great care. They all rest on numerous assumptions, some of which will not be met, and as a result apparent findings are often not valid. A plan for data analysis should not include any multivariate techniques unless the evaluation team (manager and consultants) is already well acquainted with them or can call on the assistance of someone who knows how to use them.
Tools:
Regression analysis/Multivariate analysis
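A bare-bones multiple linear regression, fitted by solving the normal equations, shows how a model's R-squared expresses the share of the main variable's variance explained by the chosen variables. This is a teaching sketch under simplifying assumptions (no standard errors, p-values or diagnostics); the data and the helper names `solve` and `ols` are hypothetical, and in practice an established package such as statsmodels would be used.

```python
def solve(a, b):
    """Solve the linear system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix (copied)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][k] * beta[k] for k in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def ols(predictors, y):
    """Ordinary least squares with an intercept; returns (coefficients, R-squared)."""
    X = [[1.0] + list(row) for row in predictors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot  # R-squared: share of variance explained


# Hypothetical data: [sessions attended, household size] -> test score.
X = [[2, 5], [4, 3], [6, 6], [8, 4], [10, 2], [3, 7], [7, 5], [9, 3]]
score = [45, 53, 55, 63, 71, 46, 59, 67]
coefs, r2 = ols(X, score)
print("intercept, b_sessions, b_household =", [round(c, 2) for c in coefs])
print(f"R-squared = {r2:.2f}")
```

To apply the model to subgroups (e.g. males and females), the same `ols` call can be run separately on each subgroup's rows.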
Explain:
• why you did or did not choose variables
• the rationale for the model
• the findings from statistical tests
Organise and present the data (see Overview – managing data analysis)
Validate/discuss with key stakeholders (see Overview – managing data analysis)