DATA ANALYSIS AND REPORTING ON INVESTIGATIONS

To read up on data analysis and reporting on investigations, refer to pages 720–745 of Eysenck’s A2 Level Psychology.

Ask yourself
What are descriptive statistics?
Why do we test the probability of any given set of data being due to chance?
Which two criteria determine which statistical test should be used?

What you need to know

DESCRIPTIVE STATISTICS AND STATISTICAL TESTS
Levels of measurement and measures of central tendency and dispersion
Appropriate use of graphs, charts, and tables
Justification of statistical tests
Probability and significance including Type 1 and Type 2 errors
Inferential tests: chi-squared, Mann–Whitney, Wilcoxon, Spearman’s rho

QUALITATIVE ANALYSIS
The process of qualitative analysis
Interpretation and evaluation of qualitative analysis

THE CONVENTIONS FOR THE REPORTING OF PSYCHOLOGICAL RESEARCH
The structure of journal reports
Title, abstract, introduction, aims, and hypothesis, method, results, discussion, references, appendices

DESCRIPTIVE STATISTICS AND STATISTICAL TESTS

For the exam, you need to know how to select and interpret statistical tests. There are just four tests you need to cover: Mann–Whitney, Wilcoxon, chi-squared, and Spearman’s rho tests. You will not be asked to carry out a statistical analysis during the exam but you must understand which test would be used and when, and the nature of the data each test generates.

Descriptive statistics give us convenient and easily understood summaries of what we have found. They give an indication of what statistical analysis is likely to reveal. Descriptive statistics include graphs, tables, measures of central tendency, and measures of dispersion. The appropriate descriptive statistics depend on the level of measurement, and so we will consider this first.
Levels of measurement

The levels of measurement progress in terms of the sophistication of the data from nominal (the least sophisticated data) to ratio (the most sophisticated data).

Nominal: the data consist of the numbers of participants falling into various categories (e.g. fat, thin; men, women).
Ordinal: the data can be placed in rank order, i.e. they can be ordered from lowest to highest (e.g. the finishing positions of athletes in a race or rating scales).
Interval: the data have fixed intervals and so differ from ordinal data because the units of measurement are fixed throughout the range. For example, there is the same “distance” between a height of 1.82 metres and 1.70 metres as between a height of 1.70 metres and one of 1.58 metres.
Ratio: the data have the same characteristics as interval data, except that they have a meaningful zero point, i.e. an absolute zero. For example, time measurements provide ratio data because the notion of zero time is meaningful, and 10 seconds is twice as long as 5 seconds.

The similarities between interval and ratio data are so great that they are sometimes combined and referred to as interval/ratio data.

Measures of central tendency and dispersion

The measures of central tendency (the mode, the median, and the mean) are averages, and so involve the calculation of a single number that is representative of the other numbers with which it is associated. The appropriate measure to use depends on the level of the data.

The mode is the number that occurs most frequently. It is quick and easy to calculate and can be used whatever the level of data. Thus, it can be used for the least sophisticated nominal data. However, it provides very limited information because it does not tell us about the other values in the score distribution. A further problem is that it is possible to have more than one modal value: a distribution with two modal values is bimodal, and one with more than two is multimodal.
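As a rough illustration of the three measures of central tendency named above, the following minimal sketch uses Python's standard statistics module. The scores are invented for the example and do not come from the textbook; the data set is deliberately bimodal and contains one extreme value.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration; 40 is an extreme value
scores = [2, 3, 3, 4, 5, 5, 40]

modes = multimode(scores)   # every most-frequent value (two here, so bimodal)
middle = median(scores)     # middle value once the scores are ordered
average = mean(scores)      # sum of the scores divided by the number of scores

print("mode(s):", modes)
print("median:", middle)
print("mean:", round(average, 2))
```

In this made-up data set the single extreme score pulls the mean well above the median, while the mode and the median are unaffected by it.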
The median is the middle value when the scores are arranged from lowest to highest. An advantage of the median is that it is not distorted by extreme or anomalous values, so it can be used when you are unsure about the reliability of such values and when the distribution is skewed. The fact that it is based on ordering the data means that it can be used only for data that are ordinal and above.

The mean is the arithmetic average; it is calculated by adding all the values together and dividing the total by the number of values. The mean is the best measure of central tendency to use because it makes use of all the data in the score distribution. However, it should be used only with data that form a normal distribution (remember: the bell-shaped curve) and for data of interval or ratio measurement. The mean should not be used when there are extreme outlying values (anomalies) because, as it uses all of the data, it is easily distorted; when there are outliers, the median should be used, not the mean.

The measures of dispersion measure the variability within the data distribution, i.e. whether the scores are similar to or different from each other. Thus, they are a measure of the spread of the scores in the data distribution.

The variation ratio complements the mode because it is the proportion of non-modal scores and so is suitable for nominal data. Advantages include the fact that it is easy to calculate and can be used on unsophisticated data, i.e. nominal data. The limitation is the same key disadvantage as with the mode: it is not representative of all scores in the distribution and so tends not to be used.

The range is the difference between the highest and lowest scores in a dataset; the key advantage is that it is easy to calculate. Limitations include the fact that the two most extreme values are used to calculate the range, and so if these are outlying the calculated range will not be representative of the distribution.
Also, the range does not make use of all the data in the score distribution and can only be used for data that are ordinal and above.

The interquartile range solves the problem of outlying values by using only the middle 50% of scores in the calculation. This gives a better idea of the distribution of values around the centre.

The standard deviation is a measure of variability. It measures scores in terms of difference from the mean. Advantages are that, as with the mean, the standard deviation uses all the scores in a set of data and so is the best measure of dispersion to use. We can make inferences based on the relationship between the standard deviation and a normal distribution curve. However, limitations include the fact that data need to be of interval or ratio levels of measurement, and to be approximately normally distributed, which is not always possible because this is the most sophisticated type of data.

Graphs and tables

Graphs and charts present the data visually. They are a useful way of summarising information as the data are easily accessible in visual format.

Bar charts
Bar charts illustrate data measured at nominal or ordinal level. They are used for non-continuous variables because the bars are separate from each other. This contrasts with the histogram, where the bars adjoin. Bar charts are often used to illustrate the means from different conditions.

Histograms
Histograms are used to present frequencies of continuous data. Thus, the data must be measured at interval or ratio level for the histogram to be appropriate. The histogram represents the same information as the frequency polygon, except that one is presented as bars and the other as a line graph.

Frequency polygons
Frequency polygons show the frequencies of continuous data, i.e. data measured at interval or ratio level. They are useful when representing results from two conditions at the same time. The x-axis shows the scores and the y-axis shows the frequency.
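To make the idea of plotting frequency against score more concrete, here is a minimal sketch that tabulates the frequencies a histogram or frequency polygon would display for two conditions. The scores and the condition names (quiet, noisy) are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical recall scores (out of 10) for two conditions
quiet = [6, 7, 7, 8, 8, 8, 9]
noisy = [4, 5, 5, 6, 6, 7, 7]

quiet_freq = Counter(quiet)   # score -> number of participants with that score
noisy_freq = Counter(noisy)

# x-axis: every score that occurs in either condition, lowest to highest
scores = sorted(set(quiet) | set(noisy))

# y-axis: the frequency of each score in each condition
print("score  quiet  noisy")
for score in scores:
    print(f"{score:>5}  {quiet_freq[score]:>5}  {noisy_freq[score]:>5}")
```

Each row of the printed table corresponds to one point on each of the two lines of a frequency polygon; a Counter returns 0 for scores that never occur in a condition.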
Scattergraphs
Scattergraphs (or scattergrams) are used to present correlated data. It does not matter which variable goes on which axis. Correlations range from perfect positive (+1) through no correlation (0) to perfect negative (-1). The sign indicates the direction, and the size of the correlation coefficient (the number) indicates the strength of the correlation. The closer the correlation coefficient is to +1 or -1, the stronger the correlation. If scores are positively correlated, they increase and decrease together; if they are negatively correlated, then as the scores on one variable increase, the scores on the other variable decrease. Perfect positive and negative correlations are rare in psychological research; imperfect correlations are more common. You must be able to interpret the direction and strength of a correlation, so remember, as a rule of thumb:

0.1 to 0.3 are weak correlations
0.4 to 0.6 are moderate
0.7 to 1 are strong

(although it is a little more complex than this because strength depends on the size of the sample).

Tables
A table can be an effective way of summarising a large amount of data; for example, measures of central tendency and dispersion can be provided in the one table. Tables can provide information very simply and clearly but can be harder to interpret than a graph because it is more difficult to visualise the data.

Statistical tests

Test of difference, association, or correlation?
To select the appropriate statistical test you need to decide if a test of difference or a test of association or correlation is appropriate.

One-tailed or two-tailed test?
When using a statistical test, you need to take account of the alternative hypothesis. If you predicted the direction of any effects (e.g. loud noise will disrupt learning and memory), then you have a directional hypothesis, and so need a one-tailed test. If you did not predict the direction of any effects (e.g.
loud noise will affect learning and memory), then you have a non-directional hypothesis, and so need a two-tailed test.

Level of precision
Another factor to consider when deciding which statistical test to use is the type of data you have obtained. There are four types of data, of increasing levels of precision (nominal, ordinal, interval, ratio; see above).

Statistical significance
A statistical test is a way of testing the probability that the results are due to chance. Psychologists use this probability to decide whether the likelihood of the results being due to chance is low enough for the alternative/experimental/correlational hypothesis to be accepted. If so, the results are significant, and consequently the null hypothesis can be rejected and the alternative/experimental/correlational hypothesis can be accepted.

The probability of the findings being due to chance is estimated from the level of statistical significance achieved by the data. The conventional minimum level of significance to be accepted is p ≤ 0.05 (which is also known as the 5% level); if this level of significance is achieved, the probability of the results being due to chance (i.e. a fluke) is 5% or less. Thus, the null hypothesis is rejected (and the alternative hypothesis is accepted) if the probability that the results were due to chance alone is 5% or less. Here p is the probability of the result if the null hypothesis is true. If the statistical test indicates that the findings do not reach the 5% (i.e. the p ≤ 0.05) level of statistical significance, then we retain the null hypothesis and reject the alternative hypothesis.

The data sometimes indicate that the null hypothesis can be rejected with greater confidence, say, at the 1% (i.e. one out of one hundred, shown as p ≤ 0.01) level. If the null hypothesis can be rejected at the 1% level, it is customary to state that the findings are “highly significant”.
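The decision rule described above can be sketched in a few lines. This is only an illustration: the function name decide and the example probabilities are hypothetical, and in practice the probability comes from comparing the result of a statistical test with a table of critical values.

```python
def decide(p, level=0.05):
    """Decide about the null hypothesis at the chosen significance level.

    p:     probability of the result if the null hypothesis is true
    level: significance level, 0.05 (5%) by convention or 0.01 (1%)
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    if p <= level:
        return "reject the null hypothesis"
    return "retain the null hypothesis"


# Hypothetical probabilities, invented for illustration
print(decide(0.03))               # significant at the 5% level
print(decide(0.03, level=0.01))   # not significant at the 1% level
print(decide(0.004, level=0.01))  # significant at the 1% level
```

The same probability (0.03 here) leads to different decisions at the 5% and 1% levels, which is why the level of significance should always be stated alongside the outcome.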
In general terms, you should state the precise level of statistical significance of your findings, whether it is the 5% level, the 1% level, or whatever.

There are two kinds of error that can occur when reaching a conclusion on the basis of the results of a statistical test:

Type 1 error: the null hypothesis is rejected incorrectly as the results are actually due to chance.
Type 2 error: the alternative hypothesis is rejected incorrectly as the results do in fact show a real difference or relationship.

It would be possible to reduce the likelihood of a Type 1 error by using a more stringent level of significance. For example, if we used the 1% (p ≤ 0.01) level of significance, this would greatly reduce the probability of a Type 1 error. However, use of a more stringent level of significance increases the probability of a Type 2 error. Consequently, most psychologists favour the 5% (p ≤ 0.05) level of significance: it allows the probabilities of both Type 1 and Type 2 errors to remain reasonably low.

Choosing the appropriate statistical test

To summarise, two key factors determine choice of statistical test:

1) Research design: whether the design is correlational or experimental determines whether a test of association or a test of difference is needed. If the design is experimental, then choice of test depends on whether an independent measures or a repeated measures design has been used, as independent measures requires a test of unrelated data whereas repeated measures requires a test of related data.
2) Level of data: see above.

Note that other considerations include whether the hypothesis is directional or non-directional and the level of significance needed to avoid both Type 1 and Type 2 errors. These are used when looking up the critical value but do not affect the choice of test.
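The two factors summarised above can be expressed as a small lookup. The sketch below follows the table of tests given in this section; the function name choose_test and its argument values are hypothetical, and treating ratio data in the same way as interval data is an assumption based on the text's remark that the two are often combined.

```python
def choose_test(purpose, level, design=None):
    """Pick a statistical test from the research design and the level of data.

    purpose: "difference" or "correlation"
    level:   "nominal", "ordinal", "interval" or "ratio"
    design:  "independent" or "related" (needed only for tests of difference)
    """
    if level not in ("nominal", "ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level!r}")

    if purpose == "correlation":
        # Nominal frequencies are tested for association with chi-squared
        return "chi-squared" if level == "nominal" else "Spearman's rho"

    if purpose == "difference":
        if design == "independent":
            return "chi-squared" if level == "nominal" else "Mann-Whitney U"
        if design == "related":  # repeated measures or matched pairs
            if level == "nominal":
                return "sign test"
            return "Wilcoxon matched pairs signed ranks"
        raise ValueError("design must be 'independent' or 'related'")

    raise ValueError(f"unknown purpose: {purpose!r}")


# Gender difference in reported stress: independent measures, ordinal data
print(choose_test("difference", "ordinal", "independent"))
```

Whether the hypothesis is directional and which significance level is chosen are deliberately left out of the function, because they affect the critical value rather than the choice of test.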
Choice of statistical test by level of measurement and type of test

Nominal data
Difference test, independent data: chi-squared test
Difference test, related data: sign test
Correlational test: chi-squared test

Ordinal and interval data
Difference test, independent data: Mann–Whitney U test
Difference test, related data: Wilcoxon matched pairs signed ranks test
Correlational test: Spearman’s rho

Related data are obtained from repeated measures and matched pairs designs. Note that this table deals only with the statistical tests described in the text, although other tests do exist.

Thus, you need to justify the statistical test based on the design and level of data. For example, a test of gender difference in reported stress uses an independent measures design and the data are at the ordinal level of measurement. (Because the test is of a difference, the data are unrelated as the design is independent measures, and the level of measurement is ordinal, the Mann–Whitney U test is appropriate.)

Statistical tests

Please refer to A2 Level Psychology pages 729–737 for worked calculations. Remember you will not be asked to calculate any of these in the exam, but working through the calculations will give you a better understanding of the statistical tests.

The chi-squared test
The chi-squared test is a test of association and also a test of difference. It is used when we have nominal data in the form of frequencies, and when each and every observation is independent of all the other observations, and so when an independent measures design has been used.

Mann–Whitney U test
The Mann–Whitney U test can be used when an independent design has been used and the data are either ordinal or interval.

Wilcoxon matched pairs signed ranks test
The Wilcoxon matched pairs signed ranks test can be used when a repeated measures or matched participants design has been used and the data are at least ordinal. This test can also be used if the data are interval/ratio.

Spearman’s rho
Suppose that we have scores on two variables from each of our participants, and we want to see whether there is an association, or correlation, between the two sets of scores.
Provided the data are at least ordinal, this can be done using the test known as Spearman’s rho. Spearman’s rho, or rs, indicates the strength of the association:

rs = +1.0: a perfect positive correlation between the two variables
rs = –1.0: a perfect negative correlation between the two variables
rs = 0.0: no relationship between the two variables

Calculating the observed value
The next step is to perform the calculations. The outcome of a statistical test is a number, called the observed value. The worked examples in the boxes in A2 Level Psychology pages 729–737 show how to calculate each statistical test.

Using a table of significance to compare the observed and critical values
To establish significance, your calculated (or observed) value must be compared with a critical value. These can be found in critical value tables in Appendix A of Eysenck’s A2 Level Psychology. The table will tell you if the calculated value has to be less than or more than the critical value for significance to be achieved. For the purposes of the exam, you need to know if the observed or calculated value needs to be lower or higher than the critical value. This differs, depending on which statistical test is used.

Comparing the calculated and critical values

Chi-squared test: calculated value must be greater than or equal to the critical value
Mann–Whitney U: calculated value must be less than or equal to the critical value
Wilcoxon signed ranks: calculated value must be less than or equal to the critical value
Spearman’s rho: calculated value must be greater than or equal to the critical value

Reporting the result
The final step is to record the outcome of this whole process.
You should include the following information in a statement of significance:

details of the level of significance
the critical and observed values
degrees of freedom, number of participants, or number of paired scores
whether the hypothesis was directional or non-directional (one-tailed or two-tailed)
whether it was accepted or rejected.

See Eysenck’s A2 Level Psychology page 739 for example statements of significance.

QUALITATIVE ANALYSIS

Qualitative data can take many forms and consist of words:

written records, e.g. notes or transcripts
audio or video recordings
direct quotations from participants.

The process of qualitative analysis

1. Data are gathered using non-experimental methods, which include naturalistic observation, interview, questionnaire, and case study.
2. The data need to be categorised and these categories should be suggested by participants to avoid researcher bias.
3. The researcher will look for recurrent themes and patterns in the data, which might or might not fit with the previously constructed categories. For example, in discourse analysis the data are interpreted by analysing the meanings behind the words used.
4. Consider the research hypothesis and how this might have changed as a result of the investigation.
5. It might be useful to make the qualitative data quantitative, e.g. content analysis. The researcher might quantify the data by counting the number of items that fall into each category. This can be done to summarise the qualitative data and usually accompanies, rather than replaces, the more in-depth qualitative analysis.

Evaluation of qualitative analysis

Advantages
In-depth data increases validity (truth)
Explanatory not just descriptive

Weaknesses
Lacks generalisability
Subjective and open to bias
Difficult to replicate as lacks reliability

THE CONVENTIONS FOR THE REPORTING OF PSYCHOLOGICAL RESEARCH

The research journals in which psychological research is reported follow a conventional structure, as detailed below.
Title
This should be very specific, including the research design and the variables.

Abstract
This is a single paragraph that summarises the main points of the research study: aims, a brief description of the background research, methods, findings, and conclusions.

Introduction, aims, and hypothesis
The introduction should summarise relevant background research; it should begin at a general level and quickly narrow down to examine two or three particularly relevant pieces of research. The aims and hypothesis must be a logical progression from the background research. The hypothesis must be testable and so needs to be operationalised, which means how the variables were measured must be clear within the hypothesis.

Method
This section of the report should provide the reader with sufficient detail to replicate the study. It is typically subdivided into four sections:

1. Design: includes design decisions, such as choice of method (e.g. experiment or observation), experimental design, and the key variables. Any controls of confounding variables, sources of bias, and ethical decisions that were taken as part of the design should also be included.
2. Participants: where, when, who, how (sample method) need to be detailed.
3. Apparatus/materials: full details of all materials should be placed in the appendix section of a report, so just a description of the materials is included in the write-up. This might include a description of questionnaire construction, observation criteria, standardisation of a test, etc.
4. Standardised procedures: this should be a clear but detailed summary of exactly how the study was implemented.

Results
There are three ways to illustrate the results of psychological research:

1. Raw data: the numbers prior to any analysis. These should be placed in the appendices but a summary might be included in the results section.
2.
Descriptive statistics, such as the use of measures of central tendency (mean, median, and mode) and/or spread (range or standard deviation), plus graphical representation.
3. Statistical tests: determine whether the findings are significant. This section must state clearly which test was used, justify the choice of statistical test, record details of the test calculations in the appendix, and state the outcome of the statistical test and so which hypotheses are supported or rejected.

Discussion
Four areas need to be covered:

1. Explanation of findings: the findings need to be related to the original aims and hypotheses. Any unanticipated findings can be discussed.
2. Relationship to background research: the findings must be related to the research in the introduction in terms of whether they support or contradict it.
3. Limitations and modifications: key limitations need to be considered, along with improvements to resolve the issues.
4. Implications and suggestions: implications include the practical use of the research, i.e. how the findings can be used to explain real-life behaviour. The implications should lead logically into suggestions for future research.

References
The reason for full references is so that readers have the details of the original article or book if they wish to research the study/theory further themselves. This is the style used in the reference section of Eysenck’s A2 Level Psychology. An alternative acceptable style is to state the details of a textbook, and list all the studies with page numbers that have been cited from this book. This means that anyone who would like to follow up one of your references can locate the exact reference and the article.

Appendices
Examples of materials and/or questionnaires, standardised instructions, raw data, and statistical tests are included in the appendices.

So what does this mean?
In this section we have considered how data should be analysed, at both a descriptive and an inferential level.
Statistical tests need to be understood but fortunately not calculated in the exam. Make sure you can justify the different statistical tests and know for each test whether the calculated value has to be greater than or less than the critical value. Finally, make sure you are familiar with how psychological research needs to be written up.

Over to you
Please see the Psychological research and scientific methods: specimen question in Chapter 1 of Eysenck’s A2 Level Psychology (see A2 Level Psychology page 6) for a typical question on this topic.
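As a revision aid for the comparison of calculated and critical values, the rule for each of the four tests can be written as a short lookup. This is a minimal sketch: the names COMPARISON and is_significant are hypothetical, the critical values in the examples are placeholders rather than figures taken from a table, and comparing the size of Spearman's rho regardless of its sign is a simplifying assumption.

```python
# Direction of the comparison for each test covered in this section
COMPARISON = {
    "chi-squared": "greater",
    "spearman": "greater",
    "mann-whitney": "less",
    "wilcoxon": "less",
}


def is_significant(test, calculated, critical):
    """Return True if the calculated value reaches significance for this test."""
    rule = COMPARISON[test]
    if test == "spearman":
        # Assumption: judge rho by its size, ignoring its sign
        calculated = abs(calculated)
    if rule == "greater":
        return calculated >= critical
    return calculated <= critical


# Placeholder critical values, for illustration only
print(is_significant("mann-whitney", calculated=12, critical=23))
print(is_significant("chi-squared", calculated=2.1, critical=3.5))
print(is_significant("spearman", calculated=-0.72, critical=0.56))
```

Real critical values depend on the significance level, on whether the test is one-tailed or two-tailed, and on the number of participants or degrees of freedom, so they must always be looked up in the appropriate table.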