Unit 4: Data Analysis-I

TOPICS
4.1 Data Analysis
4.2 Hypothesis
4.3 Conjoint Analysis
4.4 Factor Analysis

• Once the data have been collected, the researcher has to process, analyse and interpret them. The researcher should exercise good care to ensure that reliable data are collected.
• All this effort will, however, go in vain if the collected data are not properly processed and analysed. Sufficient attention is often not given to these aspects, with the result that the quality of the report suffers.
• It is desirable to have a well-thought-out framework for the processing and analysis of data prior to their collection.
• Dummy tables should be prepared in order to illustrate the nature and extent of tabulation, as well as the comparisons of data that will be undertaken.
• At the same time, it may be noted that certain changes in such a framework may become necessary at a later stage.
• The researcher should not hesitate to introduce such changes as may be necessary to improve the quality of tabulation and analysis of data.

A) Meaning: Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.

B) Definition:
a) Johan Galtung: "Data analysis refers to seeing the data in the light of hypotheses or research questions and the prevailing theories, and drawing conclusions that are as amenable to theory formation as possible."

C) Steps in Data Analysis: There are two broad categories of data analysis: examination of the data as regards quality and reliability, and preparation of analysis sheets.

1) Examination of Data: Here, the relevance, validity and practical utility of the data to be used are studied.
This is necessary because conclusions about the marketing problem under consideration are drawn from the data used. Such data need to be examined properly so that only data which are reliable, accurate and relevant are used for interpretation. In the process of such examination, attention should be given to the following aspects:

a) Relevance of Data: The data selected for interpretation should be relevant to the research problem. Data that have been collected and even tabulated but are not useful, i.e. irrelevant to the research problem under consideration, should be kept aside, i.e. not selected for interpretation.

b) Reliability of Data: Data to be used for interpretation should be reliable. This aspect needs to be given due consideration while examining the data available.

c) Practicability (Practical Use) of Data: Data to be used for interpretation should have practical value/utility. They should be directly related to the problem under investigation and should also be useful for drawing conclusions.

2) Preparation of Analysis Sheets/Tables: Analysis sheets are prepared after selecting data for interpretation. Analysis sheets are tables with information in summary form. Such sheets follow the questions asked (in the questionnaire) and the responses given by the respondents.

A) Concept: A hypothesis is a proposition which the researcher wants to verify. It may be mentioned that while a hypothesis is useful, it is not always necessary. Many a time, the researcher is interested in collecting and analysing data indicating the main characteristics without any hypothesis, except one which he may suggest incidentally during the course of his study.

B) Null Hypothesis: A null hypothesis is a statement about a population parameter (such as µ), and the test is used to decide whether or not to accept the hypothesis. A null hypothesis, identified by the symbol H0, is always one of status quo or no difference.
If the null hypothesis is false, something else must be true. Suppose that a person is facing a legal trial for committing a crime. The judge looks into all the evidence for and against it, listens very carefully to the prosecution's and the defendant's arguments, and then decides the case and gives his verdict. The verdict could be:
1) That the person has not committed the crime.
2) That the person has committed the crime.

C) Procedure of Hypothesis Testing: The procedure involves four steps: formulate a hypothesis, set up a suitable significance level, select a test criterion, and make decisions.

1) Formulate a Hypothesis: The conventional approach to hypothesis testing is to set up two hypotheses instead of one, in such a way that if one hypothesis is true the other is false. These two hypotheses are: i) the null hypothesis, and ii) the alternative hypothesis.

2) Set up a Suitable Significance Level: Having formulated the hypothesis, the next step is to test its validity at a certain level of significance. The confidence with which a null hypothesis is rejected or accepted depends upon the significance level used for the purpose.

3) Select a Test Criterion: The next step in hypothesis testing is the selection of an appropriate statistical technique as a test criterion. There are many techniques from which one is to be chosen.

4) Make Decisions: The last step in hypothesis testing is to draw a statistical decision, involving the acceptance or rejection of the null hypothesis.

D) Types of Errors in Hypothesis Testing: At this stage, it is worthwhile to know that when a hypothesis is tested, there are four possibilities:
1) The hypothesis is true but our test leads to its rejection.
2) The hypothesis is false but our test leads to its acceptance.
3) The hypothesis is true and our test leads to its acceptance.
4) The hypothesis is false and our test leads to its rejection.
Of these four possibilities, the first two lead to erroneous decisions.
The first possibility leads to a Type I error and the second possibility leads to a Type II error. This can be shown as follows:

Decision      | H0 is true         | H0 is false
Accept H0     | Correct decision   | Type II error (β)
Reject H0     | Type I error (α)   | Correct decision

The table above indicates that one of two conditions (states of nature) exists in the population: either the null hypothesis is true or it is false. Similarly, there are two decision alternatives: accept the null hypothesis or reject it. Thus, two decisions and two states of nature result in four possibilities. In any hypothesis testing, the researcher runs the risk of committing Type I and Type II errors. If the researcher is interested in reducing the risk of committing a Type I error, then the size of the rejection region, or level of significance α, should be reduced. When α = 0.10, a true hypothesis will be accepted on 90 out of every 100 occasions; thus, there is a risk of rejecting a true hypothesis on 10 out of every 100 occasions. To reduce this risk, α = 0.01 can be chosen, which implies that only a 1 per cent risk is taken: the probability of rejecting a true hypothesis is merely 1 per cent instead of 10 per cent as in the previous case.

E) Parametric Tests: Parametric tests assume that parameters such as the mean, standard deviation, etc., exist and are used in testing a hypothesis. The commonly used parametric tests are the Z-test, F-test and t-test. These tests are more powerful than the non-parametric tests.

1) Test of a Sample Mean, One Mean (n ≥ 30): While discussing statistical estimation, the normal distribution was used. In hypothesis testing too, the standard normal distribution is used; this is the normal distribution which has been adjusted in a certain manner. It may be recalled that in the case of a normal distribution of sample means with mean µ and standard error σ/√n, if the population mean µ is subtracted from each sample mean, the mean of the resultant distribution will be zero.
Further, if (x̄ − µ) is divided by σ/√n, the resultant distribution will have mean zero and standard deviation 1. This transformed normal distribution,

Z = (x̄ − µ) / (σ/√n),

is known as the standard normal distribution, and it is this distribution which is used for testing the hypothesis.

An example of a one-tail test can be considered. A characteristic of this test is that the alternative hypothesis is one-sided. For example, if H0: µ = 50, then H1: µ > 50; it can as well be H1: µ < 50. When H1: µ > 50 it is a right-tail test, and when H1: µ < 50 it is a left-tail test. Whether a test is to be right- or left-tailed will depend upon the problem at hand.

2) Test of Differences between Two Proportions and Two Means: Often, marketing researchers are interested in knowing whether or not significant differences exist between the proportions of two groups of, say, consumers in respect of a certain activity. For example, they may like to know if male and female consumers show distinctive differences in their consumption of a particular product. The same approach can be applied to other groups such as rural and urban consumers, educated and uneducated consumers, and so forth. While the procedure for testing the hypothesis in such cases is the same as that used for differences between means, there is one difference: the standard error of the difference between two proportions is used in place of the standard error of the difference between two means. The following formula is used for this purpose:

s(p1 − p2) = √(p1q1/n1 + p2q2/n2)

where s(p1 − p2) is the estimated standard error of the difference between the two proportions,
p1 = proportion in sample 1 (q1 = 1 − p1),
p2 = proportion in sample 2 (q2 = 1 − p2),
n1 = size of sample 1, and
n2 = size of sample 2.

F) Non-Parametric Tests: There are certain situations, particularly in marketing research studies, where the assumptions underlying the parametric tests are not valid.
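The two-proportion test described above can be sketched with hypothetical survey counts; the standard normal quantile comes from Python's standard library. All numbers below are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical survey: 120 of 200 male and 90 of 200 female
# consumers report using the product.
n1, n2 = 200, 200
p1, p2 = 120 / n1, 90 / n2          # sample proportions (0.60 and 0.45)
q1, q2 = 1 - p1, 1 - p2

# Estimated standard error of the difference between the two proportions
se = sqrt(p1 * q1 / n1 + p2 * q2 / n2)

# Z statistic and a two-tail decision at the 0.05 significance level
z = (p1 - p2) / se
z_critical = NormalDist().inv_cdf(0.975)   # approximately 1.96
reject_h0 = abs(z) > z_critical
print(round(z, 3), reject_h0)
```

Here the computed Z exceeds the critical value, so the null hypothesis of equal proportions would be rejected at the 5 per cent level.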
In other words, there is no assumption that a particular distribution is applicable, or that a certain value is attached to a parameter of the population. In such cases, instead of parametric tests, non-parametric tests are used. These tests are also known as distribution-free tests.

1) Chi-Square One-Sample Test: At times the researcher is interested in determining whether the number of observations or responses that fall into various categories differs from chance.

A) Conditions for Use of the Chi-Square Goodness-of-Fit Test: The chi-square goodness-of-fit test is appropriate when the following conditions are met:
i) The sampling method is simple random sampling.
ii) The population is at least 10 times as large as the sample.
iii) The variable under study is categorical.
iv) The expected number of sample observations in each level of the variable is at least 5.

B) Steps to Calculate the Chi-Square Test:
1) State the null hypothesis and calculate the number in each category if the null hypothesis were correct.
2) Determine the level of significance, that is, how much risk of a Type I error the researcher is prepared to take.
3) Calculate χ² as follows:

χ² = Σ (Oi − Ei)² / Ei, i = 1, 2, …, k

where Oi = observed frequency in the ith category, Ei = expected number in the ith category, and k = number of categories.
4) Determine the number of degrees of freedom. For the specified level of significance and the degrees of freedom, find the critical or theoretical value of χ².
5) Compare the calculated value of χ² with the theoretical value and determine the region of rejection.
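The goodness-of-fit steps above can be sketched with hypothetical brand-choice counts; the critical value is taken from a chi-square table (a known tabulated value, not computed here).

```python
# Hypothetical counts: 200 shoppers choosing among four brands;
# H0: all brands are equally preferred, so each Ei = 200/4 = 50.
observed = [60, 55, 45, 40]
expected = [sum(observed) / len(observed)] * len(observed)

# Step 3: chi-square statistic, the sum of (Oi - Ei)^2 / Ei
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Steps 4-5: d.f. = k - 1 = 3; tabulated critical value at the 0.05 level
critical = 7.815
reject_h0 = chi_sq > critical
print(round(chi_sq, 2), reject_h0)
```

With these invented counts χ² = 5.0, which is below 7.815, so the hypothesis of equal preference would be accepted.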
2) Chi-Square Test of Independence:
• In examining the relationship between two or more variables, the first step is to set up a frequency table which, in such cases, is called a contingency table.
• An example of such a table can be shown with two variables: the income level and the preference for shopping centers of 500 households. It has two rows and two columns. Each cell of a contingency table shows a certain relationship or interaction between the two variables.
• In general, a contingency table is of r × c size, where r indicates the number of rows and c the number of columns. Earlier, the chi-square test was used as a test of goodness of fit, where the population and sample were classified on the basis of a single attribute.
• It may be noted that the chi-square test need not be confined to a multinomial population but can be applied to other continuous distributions such as the normal distribution. Here, only the use of chi-square as a test of independence is of concern. With the help of this technique, we can test whether or not two or more attributes are associated.

G) ANOVA (Analysis of Variance): Analysis of variance is abbreviated as ANOVA. It is an important technique for researchers conducting research in the fields of economics, psychology, sociology, biology, education, business, industry and so on. When multiple samples are involved in the research, this technique is extremely useful.

The Basic Principle of ANOVA: The basic principle of ANOVA is to test the significance of differences among the means of populations by examining the amount of variation within each of the samples. It is assumed that each of the samples is drawn from a normal population and that each of these populations has the same variance. Likewise, it is assumed that all factors other than the one or more being tested are effectively controlled.
In other words, we assume the absence of many factors that might affect our conclusions concerning the factors to be studied. Thus, in ANOVA we have to make two estimates of the population variance:
1) the amount of variation within each of the samples; and
2) the amount of variation between the samples.
These two estimates of the population variance are then compared with the F-test, wherein we work out

F = (estimate of population variance based on between-samples variance) / (estimate of population variance based on within-samples variance)

This value of F is compared with the F-limit for the given degrees of freedom. If the F value equals or exceeds the F-limit value, we may say that there are significant differences between the sample means.

a) One-Way Technique of ANOVA: The following steps are involved in this technique:
1) Obtain the mean of each sample, i.e. obtain x̄1, x̄2, x̄3, …, x̄k when there are k samples.
2) Work out the mean of the sample means as follows:

x̿ = (x̄1 + x̄2 + x̄3 + … + x̄k) / k, where k = number of samples.

3) Take the deviations of the sample means from the mean of the sample means, square these deviations, multiply each squared deviation by the number of items in the corresponding sample, and then obtain their total. This is known as the sum of squares for variance between the samples (SS between). Symbolically,

SS between = n1(x̄1 − x̿)² + n2(x̄2 − x̿)² + … + nk(x̄k − x̿)²

4) Divide the result of the 3rd step by the degrees of freedom between the samples to obtain the variance or mean square (MS) between samples:

MS between = SS between / (k − 1)

where (k − 1) represents the degrees of freedom (d.f.) between samples.
5) Take the deviations of the individual items in each sample from the corresponding sample mean, calculate the squares of such deviations, and then obtain their total. This total is known as the sum of squares for variance within samples (SS within).
This can be written symbolically as follows:

SS within = Σi(x1i − x̄1)² + Σi(x2i − x̄2)² + … + Σi(xki − x̄k)²

6) Divide the result of the 5th step by the degrees of freedom within samples to obtain the variance or mean square (MS) within samples:

MS within = SS within / (n − k)

where (n − k) represents the degrees of freedom within samples, n = total number of items in all the samples, i.e. n1 + n2 + … + nk, and k = number of samples.
7) As a check, the sum of squares of deviations for total variance can also be worked out by adding the squares of the deviations of the individual items in all samples from the mean of the sample means:

SS for total variance = Σij (xij − x̿)², i = 1, 2, 3, …; j = 1, 2, 3, …

This total should equal the sum of the results of the 3rd and 5th steps explained above, i.e.

SS for total variance = SS between + SS within.

The degrees of freedom for total variance equal the number of items in all samples minus one, i.e. (n − 1). The degrees of freedom for between and within must add up to the degrees of freedom for total variance, i.e. (n − 1) = (k − 1) + (n − k). This fact explains the additive property of the ANOVA technique.
8) Finally, calculate the F-ratio as follows:

F-ratio = MS between / MS within

b) Two-Way Technique of ANOVA: In research, data are sometimes classified on the basis of two factors. In such a case, the two-way ANOVA technique is very useful. In a firm, for instance, data can be classified on the basis of salesmen and also on the basis of sales in different regions. The various steps in the two-way ANOVA technique are as follows:
1) Use the coding device if it simplifies the task.
2) Take the total of the values of the individual items in all the samples and call it T.
3) Work out the correction factor as under:

Correction factor = T² / n, where n is the total number of items.

4) Square all the items one by one and add them. Subtract the correction factor from this total to obtain the sum of squares of deviations for total variance:

Total SS = Σij xij² − T²/n

5) Obtain the square of each column total, divide each such value by the number of items in the concerned column, and then take the total of the results thus obtained. Finally, subtract the correction factor from this total to obtain the sum of squares of deviations for variance between columns (SS between columns):

SS between columns = Σj (Tj² / nj) − T²/n

where Tj is the total of the jth column and nj is the number of items in it.
6) Next, obtain the sum of squares of deviations for variance between rows (SS between rows) in the same way:

SS between rows = Σi (Ti² / ni) − T²/n

where Ti is the total of the ith row and ni is the number of items in it.
7) The sum of squares of deviations for residual or error variance can be worked out by subtracting the sum of the results of the 5th and 6th steps from the result of the 4th step stated above. Thus,

Total SS − (SS between columns + SS between rows) = SS for residual or error variance.

8) Degrees of freedom (d.f.) can be calculated as follows:
d.f. for total variance = (c·r − 1)
d.f. for variance between columns = (c − 1)
d.f. for variance between rows = (r − 1)
d.f. for residual variance = (c − 1)(r − 1)
where c = number of columns and r = number of rows.

A) Meaning: Conjoint analysis is a set of statistical tools used by market researchers to assess the value consumers place on products and their specific features or attributes.
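The one-way ANOVA steps described in a) above can be sketched with hypothetical sales figures for three salesmen, including the additive check SS total = SS between + SS within.

```python
# Hypothetical sales figures for three salesmen (k = 3 samples)
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]

k = len(samples)
n = sum(len(s) for s in samples)
means = [sum(s) / len(s) for s in samples]                 # sample means
grand_mean = sum(sum(s) for s in samples) / n              # mean of all items

# SS between = sum of n_i * (sample mean - grand mean)^2
ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
# SS within = sum over all items of (item - its sample mean)^2
ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
# Check: total SS should equal SS between + SS within
ss_total = sum((x - grand_mean) ** 2 for s in samples for x in s)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_ratio = ms_between / ms_within
print(ss_between, ss_within, ss_total, round(f_ratio, 3))
```

With these invented figures, SS between = 8, SS within = 24, SS total = 32, and F = 1.5, which would then be compared with the tabulated F-limit for (2, 9) degrees of freedom.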
The ultimate goal of conjoint analysis is to quantify each product attribute (and the various attribute options) to assist in the development of better products and a sounder pricing strategy.

B) Conceptual Basis: While specific research objectives will dictate the direction of conjoint research, several components are common to all conjoint engagements. These steps are:
1) selection of attributes;
2) specification of attribute levels;
3) specific combinations of attributes;
4) selection of the form of stimuli;
5) aggregation of judgments; and
6) selection of the analysis technique.

Step 1: Selection of Attributes: The first step in conjoint analysis involves the identification of the relevant product or service attributes. Several approaches are available to the researcher for identifying product attributes. He may interview a number of consumers directly. Alternatively, he may conduct focus group interviews with consumers. Yet another option available to the researcher is to contact the product managers and retailers who are well informed in that particular field.

Step 2: Specification of Attribute Levels: Having identified the attributes, the next step is to specify the actual levels of each attribute. Here the researcher should be aware of the relationship between the number of levels used to measure an attribute and the respondent's preference for that attribute. If a large number of attribute levels is chosen, it will put a great burden on the respondents.

Step 3: Specific Combinations of Attributes: The next step in the process of conjoint analysis involves the specific combinations of attributes that will be used. The number of possible combinations is given by the product of the numbers of levels of the attributes.
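The combination count in Step 3, and the regression-based estimation of part-worths that Step 6 later mentions, can be sketched together. All attributes, levels and ratings below are invented for illustration.

```python
from itertools import product
import numpy as np

# Hypothetical attributes: 2 brands x 2 price levels -> 4 full profiles
attributes = {"brand": ["A", "B"], "price": ["low", "high"]}
profiles = list(product(*attributes.values()))
n_profiles = len(profiles)           # product of the numbers of levels: 2 * 2

# Hypothetical preference ratings of the four profiles by one respondent,
# in the same order as `profiles`
ratings = np.array([8.0, 4.0, 5.0, 2.0])

# Dummy-code each profile (1 = first level of the attribute) and estimate
# part-worths by least-squares regression
X = np.array([[1.0, b == "A", p == "low"] for b, p in profiles])
coefs, *_ = np.linalg.lstsq(X, ratings, rcond=None)
print(n_profiles, np.round(coefs, 3))   # intercept, brand part-worth, price part-worth
```

With this balanced design, the estimated part-worths are simply the differences between the mean ratings at each level (2.5 for brand A over B, 3.5 for low price over high).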
Step 4: Selection of the Form of Stimuli: For data collection, either the trade-off approach or the full-profile approach may be used. The first approach involves the consideration of only two attributes at a time by the respondents. They are asked to rank each combination of levels of attributes from the most preferred to the least preferred. Respondents are directly given cards, with an example of how to complete them.

Step 5: Aggregation of Judgments: This step in the conjoint analysis process involves deciding how the responses from individual consumers should be aggregated. Conjoint studies produce part-worth utilities for each respondent for each level of each attribute. However, these should not simply be averaged across individuals to determine the average utility for each level of each attribute.

Step 6: Selection of the Analysis Technique: This final step is concerned with the analysis of the input data. The question here is: which technique should be used for the analysis? Although a variety of approaches are available for analyzing conjoint data, regression analysis is very frequently used.

A) Meaning: Factor analysis identifies unobserved variables that explain patterns of correlations within a set of observed variables. It is often used to identify a small number of factors that explain most of the variance embedded in a larger number of variables. Thus, factor analysis is about data reduction.

B) Types of Factor Analysis: There are three types of factor analysis:
1) Exploratory factor analysis (EFA)
2) Confirmatory factor analysis (CFA)
3) Structural equation modeling (SEM)

1) Exploratory Factor Analysis (EFA): Exploratory factor analysis is used to uncover the underlying structure of a relatively large set of variables. The researcher's a priori assumption is that any indicator may be associated with any factor.
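A minimal exploratory sketch of factor extraction, using the principal-component method on a hypothetical correlation matrix: each loading is the eigenvector entry scaled by the square root of its eigenvalue. The matrix below is invented for illustration and assumes NumPy is available.

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables:
# variables 1-2 correlate strongly, as do variables 3-4
R = np.array([
    [1.00, 0.80, 0.20, 0.15],
    [0.80, 1.00, 0.18, 0.10],
    [0.20, 0.18, 1.00, 0.75],
    [0.15, 0.10, 0.75, 1.00],
])

# Principal-component extraction: eigendecomposition of R
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings of each variable on the first two factors
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])
print(np.round(loadings, 2))
```

Because the loadings here are correlations between variables and components, each lies between −1 and 1, and the sum of a variable's squared loadings across all components equals 1.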
2) Confirmatory Factor Analysis (CFA): Confirmatory factor analysis seeks to determine whether the number of factors and the loadings of measured (indicator) variables on them conform to what is expected on the basis of pre-established theory.

3) Structural Equation Modeling (SEM): Structural equation modeling hypothesizes relationships between a set of variables and factors and tests these causal relationships with a linear equation model. Structural equation modeling can be used for exploratory and confirmatory modeling alike, and hence it can be used for confirming results as well as for testing hypotheses.

C) Interpreting a Factor Matrix: The task of interpreting a factor loading matrix to identify the structure among the variables can at first seem overwhelming. The researcher must sort through all the factor loadings (remember, each variable has a loading on each factor) to identify those most indicative of the underlying structure. By following the five-step procedure outlined next, the process can be simplified considerably:
1) Examine the factor matrix of loadings.
2) Identify the significant loading(s) for each variable.
3) Assess the communalities of the variables.
4) Respecify the factor model if needed.
5) Label the factors.

Step 1: Examine the Factor Matrix of Loadings: The factor loading matrix contains the factor loading of each variable on each factor. The loadings may be either rotated or unrotated, but rotated loadings are usually used in factor interpretation unless data reduction is the sole objective.

Step 2: Identify the Significant Loading(s) for Each Variable: The interpretation should start with the first variable on the first factor and move horizontally from left to right, looking for the highest loading for that variable on any factor. When the highest loading (the largest absolute factor loading) is identified, it should be underlined if significant, as determined by the criteria discussed earlier.
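Step 2, together with the communality check that follows, can be sketched with a small hypothetical rotated loading matrix; the |loading| ≥ 0.50 significance cut-off is an assumption for illustration.

```python
# Hypothetical rotated loading matrix: 4 variables x 2 factors
loadings = [[0.82, 0.10],
            [0.75, 0.05],
            [0.12, 0.88],
            [0.30, 0.35]]

# Step 2: for each variable, find the factor carrying its highest absolute
# loading, and treat |loading| >= 0.50 as significant (an assumed cut-off)
highest = [max(range(2), key=lambda f: abs(row[f])) for row in loadings]
significant = [abs(row[f]) >= 0.50 for row, f in zip(loadings, highest)]

# Communality = sum of a variable's squared loadings across the factors
communalities = [sum(l ** 2 for l in row) for row in loadings]
print(highest, significant, [round(h, 3) for h in communalities])
```

Variables 1-2 load on the first factor and variable 3 on the second, while variable 4 has no significant loading and a low communality (about 0.21), flagging it for respecification.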
Step 3: Assess the Communalities of the Variables: Once all the significant loadings have been identified, the researcher should look for any variables that are not adequately accounted for by the factor solution. One simple approach is to identify any variable(s) lacking at least one significant loading. Another approach is to examine each variable's communality, which represents the amount of that variable's variance accounted for by the factor solution.

Step 4: Respecify the Factor Model if Needed: Once all the significant loadings have been identified and the communalities examined, the researcher may find any one of several problems:
a) a variable has no significant loadings;
b) even with a significant loading, a variable's communality is deemed too low; or
c) a variable has a cross-loading.

Step 5: Label the Factors: When an acceptable factor solution has been obtained, in which all variables have a significant loading on a factor, the researcher attempts to assign some meaning to the pattern of factor loadings. Variables with higher loadings are considered more important and have greater influence on the name or label selected to represent a factor.

D) Criteria for the Number of Factors to Extract: Factor analysis methods seek the best linear combination of variables, best in the sense that the particular combination of original variables accounts for more of the variance in the data as a whole than any other linear combination of variables. Therefore, the first factor may be viewed as the single best summary of the linear relationships exhibited in the data. The second factor is defined as the second best linear combination of the variables, subject to the constraint that it is orthogonal to the first factor. The following stopping criteria for the number of factors to extract are currently utilized:

1) Latent Root Criterion: The most commonly used technique is the latent root criterion.
This technique is simple to apply to either components analysis or common factor analysis. The rationale for the latent root criterion is that any individual factor should account for the variance of at least a single variable if it is to be retained for interpretation.

2) Scree Test Criterion: Recall that with the components analysis factor model, the later factors extracted contain both common and unique variance. Although all factors contain at least some unique variance, the proportion of unique variance is substantially higher in the later factors.

Fig.: Eigenvalue plot for the scree test criterion
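The latent root criterion above can be sketched as follows: the eigenvalues (latent roots) of a correlation matrix are computed and only factors with an eigenvalue greater than 1 are retained, since each observed variable contributes exactly 1 unit of variance. The correlation matrix below is invented for illustration.

```python
import numpy as np

# Hypothetical correlation matrix for five observed variables:
# variables 1-3 form one correlated cluster, variables 4-5 another
R = np.array([
    [1.00, 0.72, 0.63, 0.09, 0.12],
    [0.72, 1.00, 0.57, 0.15, 0.16],
    [0.63, 0.57, 1.00, 0.14, 0.08],
    [0.09, 0.15, 0.14, 1.00, 0.55],
    [0.12, 0.16, 0.08, 0.55, 1.00],
])

# Eigenvalues (latent roots) of the correlation matrix, largest first
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]

# Latent root criterion: keep only factors with eigenvalue > 1
n_factors = int(np.sum(eigenvalues > 1.0))
print(np.round(eigenvalues, 3), n_factors)
```

For the scree test, the same eigenvalues would be plotted against their rank and the cut-off placed where the curve levels off; here the criterion retains two factors, matching the two clusters built into the matrix.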