THE PRESENTATION AND ANALYSIS OF EMPIRICAL AND QUALITATIVE DATA

C. Njerekai (Department of Tourism and Hospitality Management)
Midlands State University, Gweru, Zimbabwe

Paper presented at Chinhoyi University of Technology at a one-day research workshop held on the 17th of November, 2010

1. DATA ANALYSIS

1.1 INTRODUCTION

Data are analysed to draw meaning from them and to reach conclusions that address the research questions and objectives. Data analysis allows the researcher to accept or reject hypotheses and to make recommendations with respect to the research problem.

1.2 STEPS FOR DATA ANALYSIS

Step 1: Edit the data

This process is undertaken to ensure that the data gathered are relevant, accurate, consistent, complete and usable. Wherever possible, editing should be done in the field, during and/or soon after the completion of the survey. The following points need to be noted when editing data:

- Where responses are inconsistent, discard the inconsistent answers. As an example, a respondent who indicates that their highest level of education is ordinary level cannot be a college professor by profession. In this case, the profession is inconsistent with the level of education and hence it should be discarded.
- In the case of missing and unclear responses, the respondents can be re-contacted to supply them. However, missing data of no significance to the study should be discarded.
- You must also look out for responses that are too consistent throughout the questionnaire and validate these.
- Editing can also be done manually and electronically after you have returned from the field. Electronically, the computer can check for inconsistent responses or coded values that lie outside the permissible range.

If you decide to employ research assistants, you must ensure that the necessary field controls are in place to minimize error during the actual data collection. More precisely, you need to monitor and validate the data collection process.
Monitoring involves ensuring that the data gathering proceedings are conducted as required. As an example, you can listen to some of the interviews between the respondent and the research assistant. Validation involves checking the accuracy of the data collected by the research assistants. In this case, you can contact a section of the sampled study subjects to confirm the validity of the data and to confirm that data gathering actually took place.

Step 2: Code the data

Coding refers to the process of assigning numerical values to all response categories in questions. It involves classifying each response in a questionnaire with a designated numerical symbol. Coding is necessary for the transfer of data into computers for analysis with statistical software packages such as SPSS. The coding needs to be done for both qualitative and quantitative data, as there are both qualitative and quantitative computer assisted data analysis techniques. Coding can be done before the questionnaire is administered (pre-coding) or after administration (post-coding). Pre-coding is suitable for structured or closed questionnaires while post-coding is mostly suitable for unstructured or open-ended questions. The process of coding requires the establishment of categories. These categories should be:

- Mutually exclusive
- Convenient in number
- Exhaustive, viz. they should include possible responses like don't know, not applicable, other etc.

Step 3: Analyse Qualitative Data

Step 3.1: Transcribe or capture the data

This is the process of inputting data into the computer. It involves transferring the data from the data collection instruments into a manual or computer readable format. Although this process can also be done manually, as will be discussed later, in today's computerized world data analysis is almost always computer based.
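The electronic editing check in Step 1 and the coding in Step 2 can be illustrated with a short sketch. It is only an illustration under stated assumptions: the codebook, the numerical codes and the dummy responses are hypothetical and are not taken from any actual study or from a package such as SPSS.

```python
# Hypothetical codebook for one closed question (pre-coding).
# The categories are mutually exclusive and exhaustive: "Other" and
# "Don't know" catch every response that is not listed explicitly.
EDUCATION_CODES = {
    "O-Level": 1,
    "A-Level": 2,
    "Certificate": 3,
    "Diploma": 4,
    "Degree": 5,
    "Other": 8,
    "Don't know": 9,
}


def code_response(answer):
    """Return the numerical code for a response; unlisted answers become 'Other'."""
    return EDUCATION_CODES.get(answer.strip(), EDUCATION_CODES["Other"])


def out_of_range(codes, permitted):
    """Return the positions of coded values that lie outside the permitted set."""
    return [i for i, code in enumerate(codes) if code not in permitted]


# Dummy responses as they might be read off returned questionnaires.
responses = ["O-Level", "Degree", " Diploma ", "Apprenticeship", "Don't know"]
coded = [code_response(r) for r in responses]
print(coded)

# A captured data file containing one keying error (7 is not a valid code).
captured = [1, 5, 4, 7, 9]
print(out_of_range(captured, set(EDUCATION_CODES.values())))
```

The range check only flags the position of a suspect value; deciding whether to correct it, re-contact the respondent or discard it remains an editing decision for the researcher.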
Given the range and variety of computerised data analysis techniques, the only way to familiarise yourself with these packages is to practise using them with dummy data or actual data collected from the field.

Step 3.2: Analyse the qualitative data

There is great variety and diversity in approaches to the analysis of qualitative data. Despite this diversity, common threads have been identified, to the extent that a few qualitative methodological techniques have been crafted to analyse responses from qualitative research questions or studies. These techniques are discussed next.

Qualitative data analysis techniques

(a) Computer assisted qualitative data analysis software (CAQDAS). Several software packages have been developed specifically to analyse qualitative data. While such packages may save time, they can also distance the researcher from the data, and there really can be no substitute for the complex reading and interpretation powers of the human mind. Examples of CAQDAS include QSR NVivo 7.0, NUD*IST etc. These software packages offer well set out workspaces from which you can:
- Note themes, patterns and connections emanating from the responses to each question
- Have an overview of the responses
- Construct flow diagrams, flow charts, networks etc., which can further bring out typologies and establish relationships between the responses
- Draw, evaluate and verify conclusions.

(b) Use of quasi-statistics. This involves quantifying those qualitative responses that can be quantified, for example by counting how often a theme occurs.

(c) Use of a topical question list or hermeneutics. Hermeneutics is the study of interpretation theory. The topical questions which can be used to assist in this process include 'why', 'what', 'how', 'when' and 'where'.

(d) Semiotics. This refers to the study of signs and their meaning, e.g. logos, brands etc.

(e) Taxonomy. This refers to the practice and science of classification. In this case one is supposed to code and classify the data.
(f) Micro-analysis. This involves the analysis of events as captured in videos and films. This is done in order to identify critical incidents.

(g) Comparative analysis. This relates to the comparison of current qualitative data with trends in literature and experience.

(h) Reconnaissance. This involves familiarizing yourself with the data.

(i) Reduce, reduce and reduce the data to remove clutter.

(j) Use of mind maps, flow charts and diagrams to visualize the data.

(k) Hunt for connections and develop theory from the data.

Step 4: Analyse Quantitative Data

4.1 Transcribe or capture the data

This item has already been discussed under item 3.1 of this paper.

4.2 Undertake preliminary data analysis

It is advisable to start with a simple analysis of the data before subjecting it to more complex and sophisticated data analysis. The preliminary data analysis techniques that can be employed are as follows:

(a) Simple one-way tabulation

Simple one-way tabulation involves counting the responses in each category and hence creating a basic data array. In other words, simple one-way tabulation involves assigning a value to each variable in the array or criteria relating to the sampling unit(s). After this, frequency distributions are created from the data array. Such frequency distributions help you to draw meaningful conclusions from the data. Table 1 is an example of a simple one-way tabulation and the associated frequency distribution.

Table 1: A simple one-way tabulation

QUALIFICATIONS                      NUMBER OF EMPLOYEES
O-Level                             5
A-Level                             3
Certificate                         1
Diploma/Higher National Diploma     4
Degree/Postgraduate                 1

(b) Cross-tabulation of data

This involves the construction of data arrays in such a way that a respondent with a particular response to one variable can be traced, to check how they responded to other variables. Such tabulations emphasize and bring out the relationship between variables.
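The two tabulations described above can be sketched in a few lines of code. This is a minimal illustration, not a prescribed procedure: the qualification counts reproduce Table 1, while the gender column is a hypothetical second variable added only so that there is something to cross-tabulate.

```python
from collections import Counter

# Dummy employee records: (qualification, gender). The qualification counts
# match Table 1; the gender values are hypothetical.
records = (
    [("O-Level", "F")] * 3 + [("O-Level", "M")] * 2
    + [("A-Level", "F")] * 1 + [("A-Level", "M")] * 2
    + [("Certificate", "F")] * 1
    + [("Diploma", "F")] * 2 + [("Diploma", "M")] * 2
    + [("Degree", "M")] * 1
)

# One-way tabulation: count the responses in each category of one variable.
one_way = Counter(qualification for qualification, _ in records)

# Cross-tabulation: count each (qualification, gender) combination so that
# a response on one variable can be traced against the other variable.
cross_tab = Counter(records)

print("QUALIFICATION   TOTAL   F   M")
for qualification, total in one_way.items():
    females = cross_tab[(qualification, "F")]
    males = cross_tab[(qualification, "M")]
    print(f"{qualification:<16}{total:<8}{females:<4}{males}")
```

A combination that never occurs, such as a male respondent with a Certificate in this dummy data, is simply counted as zero.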
As an example, one can cross-tabulate the ages of respondents and the recreational activities they undertake. Cross-tabulation can also produce new variables.

(c) Statistical summarization of data

Measures of central tendency and dispersion are the main statistical techniques you can use to summarize and describe your data, hence the term descriptive statistics. The measures of central tendency aim at reducing your large volumes of research data to a single measure which characterizes the entire set of data, while the measures of dispersion show how widely the values are spread. These measures are summarized in table 2.

Table 2: Summary of measures of central tendency and dispersion

METHOD: PURPOSE/USEFULNESS

Measures of central tendency

(i) Mean: Used as a descriptive statistic to indicate the typical or central figure in a group of items, figures or numbers. The mean can be misleading where the data contain extreme values, because it is pulled towards them.

(ii) Mode: This refers to the most frequently occurring value in a distribution. In layman's language, this can be referred to as the most fashionable or usual figure. This figure is not necessarily the mean or the median. However, in a perfectly symmetrical distribution such as the normal distribution, the mean, the mode and the median have the same value.

(iii) Median: This refers to the middle item in a distribution when the values are arranged in ascending or descending order. It is more useful than the mean when the data have extreme values or outliers.

Measures of dispersion

(i) Range: This reveals the difference between the highest and the lowest values in a distribution.

(ii) Inter-quartile range and semi-inter-quartile range: The inter-quartile range is the difference between the upper and lower quartiles of the data. The semi-inter-quartile range is half of this difference.

(iii) Mean deviation: This refers to the average of the absolute differences of each value in the distribution from the mean.

(iv) Standard deviation: This measures the dispersion of the items in the distribution around the mean. Where this figure is high there is more variability in the data; where it is low there is more homogeneity or uniformity.

(v) Variance: This refers to the square of the standard deviation. This measure is rarely used in social science research.

For thorough procedures on the derivation of the measures of central tendency and dispersion in table 2, please refer to the numerous textbooks on statistics.

4.3 Undertake more complex analysis of the data

This is mainly achieved by subjecting the data to statistical data analysis techniques. Such techniques help you to make intelligent decisions and conclusions pertaining to the research problem. Measures of association and inferential statistics are the main techniques used in this endeavour. These are discussed in more detail below.

(i) Measures of association

These measures double up as descriptive and inferential statistics. They relate to the relationship between two or more sets of data. This relationship is most vividly shown through the use of scatter graphs, which clearly bring out the nature and pattern of associations. The values that the correlation coefficient between two variables can take are summarized as follows:

(a) When variables are in perfect association, i.e. an increase in one is matched by a proportional increase in the other, there is a perfect positive correlation. In this case the correlation coefficient is +1.
(b) When there is no correlation between variables, the correlation coefficient is 0.
(c) Correlation coefficients between 0 and +1 can be described using descriptive words such as 'strong' correlation or 'weak' correlation.
(d) When variables are in a perfect negative correlation, an increase in one is matched by a proportional decrease in the other. In this case the correlation coefficient is -1.

NB* Please bear in mind that correlation does not by itself establish causation, even though a causal relationship may be present.

(ii) Inferential statistics

Inferential statistics are used to draw inferences, informed guesses or conclusions on whether certain results have arisen by chance or reflect the true situation.
However, the decision on the statistical tests you will use should be taken at the start of the data collection process. This is because certain statistical tests only work with certain data collection techniques and the resultant types of data collected, e.g. nominal, ordinal or interval data. The sample size and the population from which you collect the sample can also influence the type of statistical tests to be used.

At this point, we would like to reiterate that a detailed outline of how the various statistical values are computed will not be given in this paper. Rather, situations and examples where these statistical tests can be carried out will be given. You are therefore advised to consult the numerous textbooks on statistics and research methods for a step-by-step derivation of these statistical values.

There are basically two types of tests that you can apply to tourism research, viz. parametric and non-parametric tests. Parametric tests are applied to continuous variables and require some assumptions about the population before application. As an example, one has to assume that the data are normally distributed. These tests only use interval and ratio data. Non-parametric tests make no such assumptions about the distribution of the population. These tests work with ordinal and nominal data. The various types of parametric and non-parametric tests are summarized in table 3.

Table 3: Summary of parametric and non-parametric tests

PARAMETRIC TESTS

The Student's t-test: This test is used to compare means from small samples, each usually of fewer than 30 observations. The null hypothesis states that there are no significant differences between the means. There are various types of t-tests depending on whether one is using paired or unpaired sample means.

Analysis of variance (ANOVA): This test is used when one wants to compare two or more sample means to establish whether they differ significantly.

Related t-test: This test compares the actual size of the differences between matched scores in each condition or time. To employ the test, one has to assume that the data were taken from a random sample, that the pairs were sampled independently of one another and that the population was normally distributed.

Pearson's product moment correlation coefficient: This test determines the degree to which two variables vary together. It is applicable to ratio data drawn from a normal distribution.

Regression analysis: This is an advanced form of correlation analysis, which can be used to predict values of one variable, knowing the other. With regression analysis one can estimate the functional relationship between two variables and so estimate the value of one when only the other is known.

NON-PARAMETRIC TESTS

The Chi-square test: This test helps to compare the differences between the observed and the expected frequencies. It therefore deals with grouped or categorical data. Its values reveal how close the observed frequencies of the categories are to the corresponding expected frequencies of the same categories. Chi-square tests use nominal data.

The sign test: This test compares the differences of pairs of nominal scores from repeated measures. In such cases, the null hypothesis states that there is no difference between the scores. The alternative hypothesis states that there is a difference.

Wilcoxon signed rank test: This test uses ordinal data from pairs of scores from repeated measures. The same null and alternative hypotheses as in the sign test are used. The main difference from the sign test is that the differences in the scores measured are placed in rank order.

Mann Whitney U-test: This test is used to establish whether ordinal data collected from two different samples, taken independently of each other and at random, differ significantly. In this case, the scores in each sample are compared by ranking the individual scores and determining whether the ranks are evenly distributed or not. If the distribution of the ranks is lop-sided, real differences exist between the samples.

Spearman's rank order correlation coefficient: This is used to determine the degree of association between two sets of ranked data. The test is more applicable to ordinal data from randomly selected samples. Like the other correlation coefficients discussed earlier in this paper, its value ranges between +1 and -1 and these values are interpreted in the same way.

(iii) Computer assisted data analysis and presentation

In this case, the computer helps you to make intelligent decisions and conclusions pertaining to the research problem using measures of association and inferential statistics. The measures of association bring out the relationship between two or more sets of data. Using inferential statistics, the computer can help you draw inferences and conclusions on whether certain results have arisen by chance or reflect the true situation.

However, please be reminded that the statistical tests that the computer can perform will depend on the data collection process. This is because certain statistical tests only work with certain data collection techniques and the resultant types of data collected, e.g. nominal, ordinal or interval data. The sample size and the population from which you collect the sample can also influence the type of statistical tests to be used.

Basically, the computer can perform two types of tests on your data, viz. parametric and non-parametric tests. Parametric tests are applied to continuous variables and require some assumptions about the population before application. As an example, one has to assume that the data are normally distributed. These tests only use interval and ratio data. Non-parametric tests make no such assumptions about the distribution of the population. These tests work with ordinal and nominal data.
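To make the contrast between a parametric and a non-parametric measure concrete, the sketch below computes Pearson's product moment correlation coefficient and Spearman's rank order correlation coefficient for the same dummy data. It is a hand-rolled illustration that assumes nothing beyond the standard formulas; the variable names and figures (nights stayed and amount spent) are hypothetical, and in practice a statistical package would normally be used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product moment correlation coefficient for paired values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


def ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Dummy data: spending rises with every extra night, but not in a straight line.
nights = [1, 2, 3, 4, 5, 6]
spend = [10, 20, 40, 80, 160, 320]

print(f"Pearson:  {pearson(nights, spend):.2f}")
print(f"Spearman: {spearman(nights, spend):.2f}")
```

Because the ranks of the two variables agree perfectly, Spearman's coefficient reaches +1 here, while Pearson's coefficient falls short of +1 because the relationship is not linear. Neither figure on its own says anything about causation.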
The various types of parametric and non-parametric tests have already been summarized in table 3.

NB* Most students who are not well versed in these computer-based packages have tended to outsource this service and have thereby preserved their ignorance of the use of these packages. This behaviour should be condemned. What is required is for you to become conversant with these programs through practice. Pilot data can be used for this purpose, or you can adopt a sitting by "Nelly" approach, which is equivalent to on-the-job training. The use of computer assisted data presentation and analysis should be wholly encouraged among tourism students.

2. THE PRESENTATION OF DATA

Step 1: Write a suitable introduction

As usual, this introduction should capture the intention of the section.

Step 2: Conceptually come up with the sub-headings for this chapter

The conceptual sub-headings you come up with should be based on the research questions and objectives of the study. It is not advisable to turn all the questions in the questionnaire into sub-headings, because some of the questions in the questionnaire only assist in logically and systematically unlocking the answers to the actual research questions. In addition, suppose the questionnaire had 50 questions: would it be logical to have 50 sub-headings in chapter 4? Certainly not!

Step 3: Present the data for each of the sub-headings you formulated in Step 2

In the presentation of your results, the following points are worth noting:

- The general layout of your results must be clear and pleasing to the eye. Even ordinary laymen should understand the results.
- Diagrams, graphs and charts should be appropriately used.
- Identify trends, extremes etc. and comment on them.
- Blend the trends, extremes etc. with existing related literature and with new or established theories.
- Draw general conclusions for each of your sub-headings in this chapter. These are the conclusions you will want to refer to in the last chapter of your dissertation.
- Relate your results to their effects on professional practice.
- Avoid including general points which, although important, do not arise from your research results.
- Where various methods of collecting data were employed, as is usually the case, the results from these methods should be blended and discussed under the appropriate sub-headings for the chapter.
- The use of subjective quantitative terms such as 'most', 'some', 'several', 'a large number' etc. is strongly discouraged. As much as possible, these terms should be replaced by exact figures, percentages etc. as evidence that the variables in question were measured.

The general methods of representing the data

- Use of tables: a table is a systematic arrangement of data or information in a format that allows the reader to observe variations or trends and to make comparisons (Blaxter and Hughes 1996).
- Use of figures: bar charts, scatter graphs, pie charts, line graphs and maps. In this case, you should examine the data carefully and decide what should be illustrated and which type of illustration best suits the data or the information you wish to present.
- Descriptions/narratives.

The various specific methods of representing data and when to use them

- Tables are good for presenting precise numerical data. You should use a table where the exact value is important or where there is need for strict testing of results using statistical methods.
- Figures and photos give vivid evidence of research findings. They present information in a way that is easy to read and understand quickly, and in a form which would otherwise need many words: 'a picture is worth a thousand words'. They must be simple and clear so that readers can immediately get the message.
- Do not use a graph to duplicate information already in a table.
- Use a line graph if you wish to demonstrate the relationship between two data sets or a dynamic comparison over time.
- Use scatter graphs to portray the relationships or the strength of association between two or more variables. In most cases, the variables will consist of a dependent and an independent variable. The relationship between these variables can be positive, where an increase in the value of one is accompanied by an increase in the value of the other. The opposite is true where the relationship is negative. Scatter graphs only give a rough estimate of the association. If you want to measure the strength of an association more precisely, you have to calculate the correlation coefficients, with which we assume you are already familiar.
- Use a flow chart to explain a process or system.
- Use a pie chart to show the components that make up the overall total.
- Use histograms to show frequency distributions of observations for each class of a variable.
- Use simple bar graphs to show changes in the totals of the dependent variables associated with changes in the independent variables. The changes in the dependent variables will be shown by changes in the height of each bar.
- Component bar graphs are mostly used where changes in the total and an indication of the size of each component figure are required. There are two types of component bar graphs, viz.:
  (i) Component bar graphs (actuals), in which the length of each component of the whole bar indicates actual figures.
  (ii) Percentage component bar graphs, in which the length of each component of a bar indicates its percentage of the overall bar total.
- Multiple bar graphs are mostly used where only changes in the actual values of the component figures are required and the overall total is of no particular importance (Kumar 1996).
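As a small illustration of the figures that underlie such charts, the sketch below takes the frequencies from Table 1 and works out each component's share of the overall total, as would be shown in a pie chart or a percentage component bar graph. It also prints a rough text bar graph of the actual figures. The layout and the variable names are assumptions made for this example only.

```python
# Frequencies taken from Table 1 of this paper.
qualifications = {
    "O-Level": 5,
    "A-Level": 3,
    "Certificate": 1,
    "Diploma/Higher National Diploma": 4,
    "Degree/Postgraduate": 1,
}

total = sum(qualifications.values())

# Share of the overall total for each component, as used in a pie chart
# or a percentage component bar graph.
percentages = {
    name: 100 * count / total for name, count in qualifications.items()
}

# A simple text bar graph: the length of each bar shows the actual figure.
for name, count in qualifications.items():
    bar = "#" * count
    print(f"{name:<32}{bar:<6}{count} ({percentages[name]:.1f}%)")
```

Reporting both the actual figure and the percentage also helps to avoid subjective terms such as 'most' or 'several'.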
Fortunately, most of the methods that have been described in this section can be produced in a split second by the computer, as long as the data for the variables in question have been captured.

Hints on the use of tables and figures in text

- Tables and figures should never precede the first reference to them in the text. This means that you should not just plant figures or tables without first making reference to them.
- Tables and figures should always immediately follow the reference to them unless they are part of the appendix.
- Figures and tables should be used to aid understanding and bring out the main findings of the research. Do not present the same data in different ways for no apparent reason.
- Tables and figures should have titles derived from or related to the research problem and be numbered in Arabic numerals.
- Figures should be fully labelled.
- Tables or figures sourced from secondary data sources must be acknowledged.
- The figure or table title and the relevant figure or table should appear on the same page.
- Table or figure titles should preferably be typed in lower case, done in bold, aligned left and placed consistently either above or below the table or figure throughout the text.
- Tables should not have unnecessary column and row spaces and should not protrude beyond the left and right margins of the text.
- A table or figure which cannot fit in the remaining space of a page should be placed on the next page. Use text to fill the remaining page space.
- Only those tables and figures that present information relevant to the research problem should be included in the text.
- All tables and figures should be neat, simple, accurate and well labelled.

3. CONCLUSION

This paper has outlined the various ways of analysing and presenting both qualitative and quantitative data. It has shown that there are numerous ways of analysing and presenting data, depending on whether the data are qualitative or quantitative.

References

Altinay, L. and Paraskevas, A. (2008) Planning Research in Hospitality and Tourism, Elsevier Limited, U.K.

Brotherton, B. (2008) Researching Tourism and Hospitality: A Student Guide, Sage Publications, London.

Clark, J. et al. (1998) Research and Writing Dissertations in Hospitality and Tourism, Paul Chapman Ltd, Oxford.

Finn, M. (2002) Tourism and Leisure Research Methods: Data Collection, Analysis and Interpretation, Pearson Longman, London.

Veal, A.J. (1997) Research Methods for Leisure and Tourism: A Practical Guide, Second edition, Prentice Hall, London.