Uploaded by Rahul Prahladrao More FDP 2018

305 MKT UNIT 4 PPT

Unit 4
Data Analysis-I
TOPICS
4.1. Data Analysis
4.2. Hypothesis
4.3. Conjoint Analysis
4.4. Factor Analysis
• Once the data has been collected, the researcher has to process, analyze
and interpret the same. It is emphasized that the researcher should
exercise good care to ensure that reliable data are collected.
• All this effort however will go in vain if the collected data are not properly
processed and analyzed. Sufficient attention is often not given to these
aspects, with the result that the quality of the report suffers.
• It is desirable to have a well thought out framework for the processing and
analysis of data prior to their collection.
• Dummy tables should be prepared in order to illustrate the nature and
extent of tabulation, as well as the comparisons of data that will be
undertaken.
• At the same time, it may be noted that certain changes in such a framework
may become necessary at a later stage.
• The researcher should not hesitate to introduce such changes as may be
necessary to improve the quality of tabulation and analysis of data.
A) Meaning:
Analysis of data is a process of inspecting, cleaning, transforming, and
modeling data with the goal of discovering useful information, suggesting
conclusions, and supporting decision making. Data analysis has multiple facets
and approaches, encompassing diverse techniques under a variety of names,
in different business, science, and social science domains.
B) Definition:
a) Johan Galtung:
“Data analysis refers to seeing the data in the light of hypothesis or
research questions and the prevailing theories and drawing conclusions
that are as amenable to theory formation as possible.”
C) Steps in Data Analysis:
There are two broad categories of data analysis: examination of the data
as regards quality and reliability, and preparation of analysis sheets.
1) Examination of Data:
Here, the relevance, validity and practical utility of the data to be used will
be studied. This is necessary as conclusions about the marketing problem
under consideration are drawn from the data used. Such data need to be
examined properly so that data which are reliable, accurate and relevant
are used for interpretation purpose.
In the process of such examination, attention should be given to the
following aspects:
a) Relevance of Data:
The data selected for interpretation should be relevant to the research
problem. Data collected and even tabulated but not useful i.e. irrelevant
to the research problem under consideration should be kept away i.e.
not selected for interpretation purpose.
b) Reliability of Data:
Data to be used for interpretation should be reliable. This aspect needs
to be given due consideration while examining the data available.
c) Practicability (Practical use) of Data:
Data to be used for interpretation should have practical value/utility. It
should be directly related to the problem under investigation and should
be also useful for drawing conclusions.
2) Preparation of Analysis Sheets/Tables:
Analysis sheets are prepared after selecting data for interpretation purpose.
Analysis sheets are tables with information in summary form. Such sheets
are as per the questions asked (in the questionnaire) and the responses
given by the respondents.
A) Concept:
A hypothesis is a proposition which the researcher wants to verify. It may be
mentioned that while a hypothesis is useful, it is not always necessary. Often, the
researcher is interested in collecting and analysing data to indicate their main
characteristics without any hypothesis, except one which he may suggest
incidentally during the course of the study.
B) Null Hypothesis:
A null hypothesis is a statement about a population parameter (such as the mean µ),
and the test is used to decide whether or not to accept the hypothesis. A null
hypothesis, identified by the symbol H0, is always one of status quo or no difference.
If the null hypothesis is false, something else must be true.
Suppose that a person is facing a legal trial for committing a crime. The judge looks
into all the evidence for and against it, listens very carefully to the prosecution's and
defendant's arguments, and then decides the case and gives his verdict. Now, the
verdict could be
1) That the person has not committed the crime.
2) That the person has committed the crime.
C) Procedure of Hypothesis Testing:
1) Formulate a hypothesis
2) Set up a suitable significance level
3) Select the test criterion
4) Make decisions
1) Formulate a hypothesis:
The conventional approach to hypothesis testing is to set up two hypotheses instead
of one in such a way that if one hypothesis is true the other is false. These two
hypotheses are:
i) Null hypothesis, and
ii) Alternative hypothesis
2) Set up a Suitable Significance Level:
Having formulated the hypothesis, the next step is to test its validity at a certain level
of significance. The confidence with which a null hypothesis is rejected or accepted
depends upon the significance level used for the purpose.
3) Select Test Criterion:
The next step in hypothesis testing is the selection of an appropriate statistical
technique as a test criterion. There are many techniques from which one is to be
chosen.
4) Make Decisions:
The last step in hypothesis testing is to draw a statistical decision, involving the
acceptance or rejection of the null hypothesis.
D) Types of Errors in Hypothesis Testing:
At this stage, it is worthwhile to know that when a hypothesis is tested, there are four
possibilities:
1) The hypothesis is true but our test leads to its rejection.
2) The hypothesis is false but our test leads to its acceptance.
3) The hypothesis is true and our test leads to its acceptance.
4) The hypothesis is false and our test leads to its rejection.
Of these four possibilities, the first two lead to erroneous decisions. The first
possibility leads to a Type I error and the second possibility leads to a Type II error.
This can be shown as follows:

State of nature      Accept H0             Reject H0
H0 is true           Correct decision      Type I error (α)
H0 is false          Type II error (β)     Correct decision
The table above indicates that one of the two conditions (states of nature) exists in
the population, i.e., either the null hypothesis is true or it is false. Similarly, there are
two decision alternatives:
Accept the null hypothesis or reject the null hypothesis.
Thus, two decisions and two states of nature result into four possibilities
In any hypothesis testing the researcher runs the risk of committing Type I and Type II
errors. In case the researchers are interested in reducing the risk of committing a Type
I error, then the size of the rejection region or level of significance should be reduced
as indicated in Table above by α.
When α= 0.10, it means that a true hypothesis will be accepted in 90 out of every 100
occasions. Thus, there is a risk of rejecting a true hypothesis in 10 out of every 100
occasions. To reduce this risk, α= 0.01 can be chosen, which implies that a 1 per cent
risk can be taken. That is, the probability of rejecting a true hypothesis is merely 1 per
cent instead of 10 per cent as in the previous case.
E) Parametric Tests:
The parametric tests assume that parameters such as mean, standard deviation, etc., exist
and are used in testing a hypothesis. The parametric tests that are commonly used are Z-test,
F-test, and t-Test. These tests are more powerful than the non-parametric tests.
1) Test of a Sample Mean (One Mean): n ≥ 30.
While discussing statistical estimation the normal distribution was used. In hypothesis
testing too, the standard normal distribution is used. This is the normal distribution which has
been adjusted in a certain manner.
It may be recalled that the sampling distribution of sample means is normal with mean µ
and standard deviation σ/√n. If the population mean µ is subtracted from the sample mean,
the resultant distribution has mean zero. Further, if this difference is divided by σ/√n, the
resultant distribution has mean zero and standard deviation 1. This transformed normal
distribution, z = (x̄ − µ) / (σ/√n), is known as the standard normal distribution. It is this
distribution which is used for testing the hypothesis.
A few examples using the standard normal distribution are as follows:
An example of one tail test can be considered. A characteristic of this test is that the
alternative hypothesis is one-sided. For example, if H0:µ=50, then H1: µ>50. It can as well be
H1:µ<50. When H1: µ>50 it is right tail test and when H1:µ<50 it is left tail test. Whether a test
is to be right or left tail will depend upon the problem on hand.
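The right-tail test described above can be sketched in Python. All the numbers below are hypothetical, assumed only for illustration:

```python
import math

# Hypothetical example: H0: mu = 50 versus H1: mu > 50 (right-tail test).
# n >= 30, so the standard normal distribution applies.
mu0 = 50.0      # hypothesized population mean
xbar = 52.5     # observed sample mean (assumed)
sigma = 8.0     # population standard deviation (assumed known)
n = 64          # sample size (assumed)

# z = (xbar - mu0) / (sigma / sqrt(n)), the standard normal statistic
se = sigma / math.sqrt(n)
z = (xbar - mu0) / se

z_crit = 1.645  # right-tail critical value at the 5% significance level
print(round(z, 2))                                 # 2.5
print("reject H0" if z > z_crit else "accept H0")  # reject H0
```

Since z = 2.5 exceeds the critical value 1.645, the null hypothesis would be rejected at the 5 per cent level in this illustration.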
2) Test of Differences between Two Proportions and Two Means:
Often, marketing researchers are interested in knowing whether or not significant
differences exist between the proportions of two groups, say, consumers, in respect of a
certain activity. For example, they may like to know if male and female consumers show
distinctive differences in their consumption of a particular product. The same analysis
can be applied to other groups such as rural and urban consumers, educated and
uneducated consumers, and so forth. While the procedure for testing hypotheses in such cases is the same
as is used in the case of differences between means, there is one difference that the standard
error of the difference between two proportions is used in place of the standard error of the
difference between two means.
The following formula is used for this purpose:

s(p̄1 − p̄2) = √[ p̄1(1 − p̄1)/n1 + p̄2(1 − p̄2)/n2 ]

where
s(p̄1 − p̄2) = estimated standard error of the difference between the two proportions
p̄1 = proportion in sample 1
p̄2 = proportion in sample 2
n1 = size of sample 1
n2 = size of sample 2
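A minimal sketch of this test in Python, with hypothetical sample proportions; a pooled proportion is used here to estimate the standard error under H0: p1 = p2:

```python
import math

# Hypothetical data (assumed for illustration): proportions of male and
# female consumers who purchased a product.
p1, n1 = 0.60, 200   # proportion and size of sample 1
p2, n2 = 0.52, 250   # proportion and size of sample 2

# Pooled proportion under H0: p1 = p2, then the estimated standard
# error of the difference between the two proportions.
p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se
print(round(z, 2))
# |z| > 1.96 would indicate a significant difference at the 5% level
```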
F) Non-Parametric Test:
There are certain situations, particularly in marketing research studies where the assumption
underlying the parametric tests is not valid. In other words, there is no assumption that a
particular distribution is applicable, or that a certain value is attached to a parameter of the
population. In such cases, instead of parametric tests, non-parametric tests are used.
These tests are also known as distribution-free tests.
1) Chi-Square One sample test:
At times the researcher is interested in determining whether the number of observations or
responses that fall into various categories differs from chance.
A) Conditions for Use of the Chi-Square Goodness of Fit Test:
The chi-square goodness of fit test is appropriate when the following conditions are met:
i) The sampling method is simple random sampling.
ii) The population is at least 10 times as large as the sample.
iii) The variable under study is categorical.
iv) The expected value of the number of sample observations in each level of the
variable is at least 5.
B) Steps to calculate the chi-square test:
1) State the null hypothesis and calculate the number in each category if the null hypothesis
were correct.
2) Determine the level of significance, that is, how much risk of the type I error the researcher
is prepared to take.
3) Calculate χ² as follows:

χ² = Σ (Oi − Ei)² / Ei,  i = 1, 2, ..., k

Where, Oi = observed frequency in the ith category
Ei = expected frequency in the ith category
k = number of categories.
4) Determine the number of degrees of freedom. For the specified level of significance and
the degrees of freedom, find the critical or theoretical value of χ².
5) Compare the calculated value of χ² with the theoretical value and determine whether it
falls in the region of rejection.
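The steps above can be sketched in Python without any special library. The observed frequencies are hypothetical, assumed only for illustration:

```python
# Hypothetical example: do 120 respondents prefer four package designs
# equally? H0: all categories are equally likely.
observed = [38, 25, 32, 25]            # Oi: observed frequency per category
k = len(observed)
n = sum(observed)
expected = [n / k] * k                 # Ei under H0: 30 in each category

# Chi-square statistic: sum of (Oi - Ei)^2 / Ei
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = k - 1                             # degrees of freedom
chi2_crit = 7.815                      # critical value, alpha = 0.05, df = 3

print(round(chi2, 2))                                     # 3.93
print("reject H0" if chi2 > chi2_crit else "accept H0")   # accept H0
```

Here the calculated χ² falls short of the critical value, so the hypothesis of equal preference would be accepted in this illustration.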
2) Chi-Square Test of Independence:
• In examining the relationship between two or more variables, the first step is to set up a
frequency table which, in such cases, is called a contingency table.
• An example of such a table can be shown with two variables: the income level and
preference for shopping centers of 500 households. It has two rows and two columns. Each
cell of a contingency table shows a certain relationship or interaction between the two
variables.
• In general, a contingency table is of r x c size, where r indicates the number of rows and c
indicates the number of columns. As seen earlier, the chi-square test can be used as a test
of goodness of fit, where the population and sample were classified on the basis of a
single attribute.
• It may be noted that the chi-square test need not be confined to a multinomial population
but can be applied to other continuous distributions such as the normal distribution. Here,
only the use of chi-square as a test of independence is of concern. With the help of
this technique, we can test whether or not two or more attributes are associated.
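A minimal sketch of the test of independence for a hypothetical 2 x 2 contingency table; all counts are assumed for illustration:

```python
# Hypothetical 2x2 contingency table: income level vs. preference for
# shopping centers among 500 households (counts assumed).
table = [[160, 90],    # high income: prefers / does not prefer
         [140, 110]]   # low income

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

# Expected frequency of each cell under independence:
# (row total * column total) / n
chi2 = 0.0
for i, row in enumerate(table):
    for j, obs in enumerate(row):
        exp = row_totals[i] * col_totals[j] / n
        chi2 += (obs - exp) ** 2 / exp

df = (len(table) - 1) * (len(table[0]) - 1)   # (r-1)(c-1) = 1
print(round(chi2, 3))                          # 3.333
# compare with the chi-square critical value for df = 1 (3.841 at 5%)
```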
G) ANOVA (Analysis of Variance):
Analysis of variance is abbreviated as ANOVA. It is an important technique for researchers
conducting research in the fields of economics, psychology, sociology, biology, education,
business or industry, and so on. When multiple samples are involved in the research, this
technique is extremely useful.
The basic principles of ANOVA:
The basic principle of ANOVA is to test the significance of differences among the means of
populations by examining the amount of variation within each of the samples. It is
assumed that each of the samples is drawn from a normal population and that each of these
populations has the same variance. Likewise, it is assumed that all factors other
than the one or more being tested are effectively controlled; that is, we assume
the absence of many factors that might affect our conclusions concerning the factors to be
studied.
Thus, in ANOVA we have to make two estimates of population variance, based on:
1) The amount of variation within each of the samples; and
2) The amount of variation between samples.
Then these two estimates of population variance are compared with the F-test, wherein we
work out:

F = (estimate of population variance based on between-samples variance) /
    (estimate of population variance based on within-samples variance)

This value of F is compared with the F-limit for the given degrees of freedom. If the F value
equals or exceeds the F-limit value, we may say that there are significant differences between
the sample means.
a)One Way Technique of ANOVA:
The following steps are involved in this technique:
1) Obtain the mean of each sample, i.e. obtain x̄1, x̄2, x̄3, ..., x̄k
when there are k samples.
2) Work out the mean of the sample means as follows:

x̄ = (x̄1 + x̄2 + x̄3 + ... + x̄k) / k

where k = number of samples.
3) Take the deviations of the sample means from the mean of the sample means and square
them. These squared deviations are multiplied by the number of items in the
corresponding sample and then totalled. This is known as the sum of the squares for
variance between the samples (i.e. SS between). Symbolically, this can be written as follows:

SS between = n1(x̄1 − x̄)² + n2(x̄2 − x̄)² + ... + nk(x̄k − x̄)²
4) Divide the result of the 3rd step by the degrees of freedom between the samples to obtain
the variance or mean square (MS) between samples. Symbolically, this can be written as follows:

MS between = SS between / (k − 1)

Where (k − 1) represents degrees of freedom (d.f.) between samples.
5) Take the deviations of the individual items in all samples from the mean values of their
respective samples, square these deviations, and then obtain their total. This total is known
as the sum of squares for variance within samples (or SS within). This can be written
symbolically as follows:

SS within = Σ(x1i − x̄1)² + Σ(x2i − x̄2)² + ... + Σ(xki − x̄k)²,  i = 1, 2, 3, ...
6) Divide the result of the 5th step by the degrees of freedom within samples to obtain the
variance or mean square (MS) within samples. This can be symbolically written as follows:

MS within = SS within / (n − k)

Where (n − k) represents degrees of freedom within samples,
n = total number of items in all the samples, i.e. n1 + n2 + ... + nk
k = number of samples.
7) For a check, the sum of squares of deviations for total variance can also be worked out by
adding the squares of deviations when the deviations for the individual items in all samples
have been taken from the mean of the sample means. This can be symbolically written as
follows:
SS for total variance = ΣΣ(xij − x̄)²,  i = 1, 2, 3, ...; j = 1, 2, 3, ...
This total should be equal to the total of the result of the 3rd and 5th steps explained above
i.e. SS for total variance = SS between + SS within
The degrees of freedom for total variance will be equal to the number of items in all samples
minus one i.e. (n-1). The degrees of freedom for between and within must add up to the
degrees of freedom for total variance i.e.
(n-1) = (k-1) + (n-k)
This fact explains the additive property of the ANOVA technique.
8) Finally, calculate the F-ratio as follows:
F-ratio = MS between / MS within
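The one-way steps above can be sketched in Python. The three samples below are hypothetical, assumed only for illustration:

```python
# Hypothetical data: k = 3 samples of 4 items each (values assumed).
samples = [[6, 7, 3, 8], [5, 5, 3, 7], [5, 4, 3, 4]]

k = len(samples)
n = sum(len(s) for s in samples)
means = [sum(s) / len(s) for s in samples]        # step 1: sample means
grand = sum(sum(s) for s in samples) / n          # step 2: grand mean

# step 3: SS between = sum of ni * (sample mean - grand mean)^2
ss_between = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
# step 5: SS within = sum over all items of (item - own sample mean)^2
ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

ms_between = ss_between / (k - 1)                 # step 4
ms_within = ss_within / (n - k)                   # step 6
f_ratio = ms_between / ms_within                  # step 8

# additive check (step 7): total SS = SS between + SS within
ss_total = sum((x - grand) ** 2 for s in samples for x in s)
assert abs(ss_total - (ss_between + ss_within)) < 1e-9

print(round(f_ratio, 3))   # 1.5
```

The calculated F would then be compared with the F-limit for (k − 1, n − k) = (2, 9) degrees of freedom.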
G) ANOVA (Analysis of Variance):
b) Two Way Technique of ANOVA:
In research, data are sometimes classified on the basis of two factors. In such a case the
two-way ANOVA technique is very useful. In a firm, for example, sales data can be classified
on the basis of salesmen and also on the basis of regions.
The various steps in two-way ANOVA technique can be given as follows:
1) Use the coding device if the same simplifies the task.
2) Take the total of the values of the individual items in all the samples and call it T.
3) Work out the correction factor as under:

Correction factor = T² / n

where n = total number of items.
4) Take the square of each item one by one and obtain their total. Subtract the
correction factor from this total to obtain the sum of squares of deviations for total variance.
We can write it symbolically as follows:

Total SS = Σxij² − (T² / n)
5) Obtain the square of each column total. Divide such value by the number of items in the
concerning column and then take total of the result thus obtained. Finally, subtract the
correction factor from this total to obtain the sum of squares of deviations for variance
between columns or (SS between columns).
It can be written as follows:

SS between columns = Σ(Tj² / nj) − (T² / n)

where Tj = total of the jth column and nj = number of items in that column.
6) Next, obtain the sum of squares of deviations for variance between rows (or SS between
rows). It is given by:

SS between rows = Σ(Ti² / ni) − (T² / n)

where Ti = total of the ith row and ni = number of items in that row.
7) Sum of squares of deviations for residual or error variance can be worked out by subtracting
the result of the sum of 5th and 6th steps from the result of 4th step stated above.
Thus, total SS - (SS between column+SS between rows)
= SS for residual or error variance.
8) Degrees of freedom (d.f.) can be calculated as follows:
d.f. for total variance = (c·r − 1)
d.f. for variance between columns = (c − 1)
d.f. for variance between rows = (r − 1)
d.f. for residual variance = (c − 1)(r − 1)
where c = number of columns,
r = number of rows.
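The two-way steps above can be sketched in Python. The 3 x 3 sales table below is hypothetical, assumed only for illustration:

```python
# Hypothetical sales data (values assumed): rows = salesmen, columns = regions.
data = [[50, 40, 48],
        [46, 48, 50],
        [39, 44, 40]]

r = len(data)          # number of rows
c = len(data[0])       # number of columns
n = r * c

T = sum(sum(row) for row in data)    # step 2: grand total of all items
cf = T ** 2 / n                      # step 3: correction factor T^2 / n

# step 4: total SS = sum of squares of all items minus the correction factor
total_ss = sum(x ** 2 for row in data for x in row) - cf

# step 5: SS between columns = sum of (column total)^2 / r, minus cf
col_totals = [sum(col) for col in zip(*data)]
ss_cols = sum(t ** 2 / r for t in col_totals) - cf

# step 6: SS between rows = sum of (row total)^2 / c, minus cf
row_totals = [sum(row) for row in data]
ss_rows = sum(t ** 2 / c for t in row_totals) - cf

# step 7: residual (error) SS
ss_resid = total_ss - (ss_cols + ss_rows)

# step 8: degrees of freedom
df_cols, df_rows, df_resid = c - 1, r - 1, (c - 1) * (r - 1)
print(total_ss, ss_cols, ss_rows, ss_resid)   # 156.0 6.0 78.0 72.0
```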
A) Meaning:
Conjoint analysis is a set of statistical tools used by market researchers to assess the value
consumers place on products and their specific features or attributes. The ultimate goal of
conjoint analysis is to quantify each product attribute (and the various attribute options) to
assist in the development of better products and a more sound pricing strategy.
B) Conceptual Basis:
While specific research objectives will dictate the direction of conjoint research, there are
several components that are common to all conjoint engagements. These steps include:
selection of attributes; specification of attribute levels; specific combinations of attributes;
selection of stimuli; aggregation of judgments; selection of analysis technique.
Step 1: Selection of Attributes:
The first step in conjoint analysis involves the identification of the relevant product or
service attributes. In order to identify product attributes, several approaches are
available to the researcher. He may interview a number of consumers directly.
Alternatively, he may conduct focus group interviews with consumers. Yet another
option available to the researcher is to contact product managers and retailers who
are well informed in that particular field.
Step 2: Specification of Attribute Levels:
Having identified the attributes, the next step is to specify the actual levels of each
attribute. Here the researcher should be aware of the relationship between the number
of levels used to measure an attribute and the respondent's preference for that
attribute. If a large number of levels is chosen, it will put a great burden on the
respondents.
Step 3: Specific Combinations of Attributes:
The next step in the process of conjoint analysis involves deciding the specific
combinations of attributes that will be used. The number of possible combinations is
given by the product of the numbers of levels of the attributes; for example, three
attributes with three levels each yield 3 × 3 × 3 = 27 combinations.
Step 4: Selection of Form of Stimuli:
For data collection, either the 'trade-off' approach or the full-profile approach
may be used. The first approach involves the consideration of only two attributes at a
time by the respondents. They are asked to rank each combination of levels of attributes
from the most preferred to the least preferred. Respondents are directly given cards with
an example of how to complete them.
Step 5: Aggregation of Judgments:
This step in the conjoint analysis process involves deciding how the responses from
individual consumers should be aggregated. Conjoint studies produce part-worth utilities
for each respondent for each level of each attribute. However, these should not be
averaged across individuals to determine the average utility for each level of each
attribute.
Step 6: Selection of Analysis Technique:
This is the final step, concerned with the analysis of input data. Here the question is:
which technique should be used for analysis? Although a variety of approaches are
available for analyzing conjoint data, regression analysis is very frequently used.
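As a sketch of how part-worths can be estimated by regression: the profiles, dummy coding, and ratings below are all hypothetical, assumed only for illustration:

```python
import numpy as np

# Hypothetical conjoint sketch: 2 attributes with 2 levels each -> 4 profiles.
# A respondent rates the profiles, and part-worth utilities are estimated by
# least-squares regression on dummy-coded attribute levels.

# Columns: intercept, brand = B (vs. A), price = low (vs. high)
X = np.array([
    [1, 0, 0],   # brand A, high price
    [1, 0, 1],   # brand A, low price
    [1, 1, 0],   # brand B, high price
    [1, 1, 1],   # brand B, low price
], dtype=float)
ratings = np.array([2.0, 5.0, 4.0, 7.0])   # respondent's preference ratings

# Least-squares estimates: intercept and the two part-worth utilities
coefs, *_ = np.linalg.lstsq(X, ratings, rcond=None)
partworth_brand_b, partworth_low_price = coefs[1], coefs[2]
print(round(partworth_brand_b, 2), round(partworth_low_price, 2))  # 2.0 3.0
```

In this illustration the respondent values the low price (part-worth 3.0) more than brand B (part-worth 2.0).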
A) Meaning:
Factor analysis identifies unobserved variables that explain patterns of correlations within a set
of observed variables. It is often used to identify a small number of factors that explain most of
the variance embedded in a larger number of variables. Thus, factor analysis is about data
reduction.
B) Types of Factor Analysis:
1) Exploratory factor analysis (EFA)
2) Confirmatory factor analysis (CFA)
3) Structural equation modeling (SEM)
1) Exploratory factor analysis (EFA):
Exploratory factor analysis (EFA) is used to uncover the underlying structure of a relatively
large set of variables. The researcher’s a priori assumption is that any indicator may be
associated with any factor.
2) Confirmatory factor analysis (CFA):
Confirmatory factor analysis (CFA) seeks to determine if the number of factors and the
loadings of measured (indicator) variables on them conform to what is expected on the basis
of pre-established theory.
3) Structural Equation Modeling (SEM):
Structural equation modeling hypothesizes relationships between a set of variables and
factors and tests these causal relationships within a linear equation model. Structural
equation modeling can be used for exploratory and confirmatory modeling alike, and hence
it can be used for confirming results as well as testing hypotheses.
C) Interpreting a Factor Matrix:
The task of interpreting a factor loading matrix to identify the structure among the variables
can at first seem overwhelming. The researcher must sort through all the factor loadings
(remember, each variable has a loading on each factor) to identify those most indicative of
the underlying structure. Following the five-step procedure outlined next, the process can be
simplified considerably.
1) Examine the factor matrix of loadings
2) Identify the significant loading(s) for each variable
3) Assess the communalities of the variables
4) Respecify the factor model if needed
5) Label the factors
Step 1: Examine the Factor Matrix of Loading:
The factor loading matrix contains the factor loading of each variable on each factor.
They may be either rotated or unrotated loadings. But, rotated loadings are usually used
in factor interpretation unless data reduction is the sole objective.
Step 2: Identify the Significant Loading(s) for Each Variable:
The interpretation should start with the first variable on the first factor and move
horizontally from left to right looking for the highest loading for that variable on any
factor. When the highest loading (largest absolute factor loading) is identified, it should
be underlined if significant as determined by the criteria discussed earlier.
Step 3: Assess the Communalities of the Variables:
Once all the significant loadings have been identified, the researcher should look for any
variables that are not adequately accounted for by the factor solution. One simple
approach is to identify any variable(s) lacking at least one significant loading. Another
approach is to examine each variable’s communality representing the amount of
variance accounted for by the factor solution for each variable.
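Steps 2 and 3 can be sketched in Python. The rotated loading matrix below is hypothetical, assumed only for illustration:

```python
# Hypothetical rotated factor-loading matrix (values assumed):
# 4 variables, 2 factors.
loadings = [
    [0.82,  0.11],   # V1
    [0.76, -0.08],   # V2
    [0.15,  0.88],   # V3
    [0.40,  0.45],   # V4 - a potential cross-loading
]

for i, row in enumerate(loadings, start=1):
    # step 2: highest absolute loading for this variable across the factors
    j = max(range(len(row)), key=lambda f: abs(row[f]))
    # step 3: communality = sum of squared loadings across all factors
    communality = sum(l ** 2 for l in row)
    print(f"V{i}: factor {j + 1}, loading {row[j]:+.2f}, "
          f"communality {communality:.2f}")
```

In this illustration V4 has a low communality (0.36) and loads weakly on both factors, flagging it for possible respecification in Step 4.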
Step 4: Respecify the Factor Model If Needed:
Once all the significant loadings have been identified and the communalities examined
the researcher may find any one of several problems:
a) A variable has no significant loadings.
b) Even with a significant loading, a variable's communality is deemed too low; or
c) A variable has a cross-loading.
Step 5: Label the Factors:
When an acceptable factor solution has been obtained in which all variables have a
significant loading on a factor, the researcher attempts to assign some meaning to the
pattern of factor loadings. Variables with higher loadings are considered more important
and have greater influence on the name or label selected to represent a factor.
D) Criteria for the Number of Factors to Extract:
Factor analysis methods seek the best linear combination of variables: best in the
sense that the particular combination of original variables accounts for more of the variance
in the data as a whole than any other linear combination of variables. Therefore, the first
factor may be viewed as the single best summary of linear relationships exhibited in the data.
The second factor is defined as the second best linear combination of the variables, subject to
the constraint that it is orthogonal to the first factor.
However, the following stopping criteria for the number of factors to extract are currently
being utilized.
1) Latent Root Criterion:
The most commonly used technique is the latent root criterion. This technique is simple to
apply to either components analysis or common factor analysis. The rationale for the latent
root criterion is that any individual factor should account for the variance of at least a single
variable if it is to be retained for interpretation, i.e. its latent root (eigenvalue) should be
greater than 1.
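The latent root criterion can be sketched in Python. The correlation matrix below is hypothetical, assumed only for illustration:

```python
import numpy as np

# Hypothetical correlation matrix of 4 observed variables (values assumed):
# variables 1-2 and variables 3-4 form two correlated pairs.
R = np.array([
    [1.00, 0.65, 0.10, 0.05],
    [0.65, 1.00, 0.08, 0.02],
    [0.10, 0.08, 1.00, 0.70],
    [0.05, 0.02, 0.70, 1.00],
])

# Latent roots (eigenvalues) of the correlation matrix, largest first
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]

# Latent root criterion: retain factors whose eigenvalue exceeds 1, i.e.
# factors that account for at least one variable's worth of variance.
n_factors = int(np.sum(eigvals > 1))
print(n_factors)   # 2: one factor per correlated pair
```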
2) Scree Test Criterion:
Recall that with the component analysis factor model, the later factors extracted contain
both common and unique variance. Although all factors contain at least some unique
variance, the proportion of unique variance is substantially higher in later factors.
Fig: Eigenvalue plot for the scree test criterion