
Q1. A statistical survey is a scientific process of collection and analysis of numerical
data. Explain the stages of statistical survey. Describe the various methods for
collecting data in a statistical survey. (Meaning of statistical survey, Stages of statistical
survey (Listing and Explanation), Methods for collecting data) 2, 5, 3
Meaning of Statistical Survey
A Statistical Survey is a scientific process of collection and analysis of numerical data. Statistical
surveys are used to collect information about units in a population, and this involves asking questions to
individuals. Surveys of human populations are common in government, health, social science and marketing.
Stages of Statistical Survey
Statistical surveys involve two stages namely – Planning and Execution. Figure shows the two broad
stages of Statistical Survey.
1. Planning a Statistical Survey
The relevance and accuracy of data obtained in a survey depend upon the care taken in planning. A
properly planned investigation can lead to the best results with least cost and time. Figure gives the
explanation of steps involved in the planning stage.
2. Execution of statistical survey
Controlled methods should be adopted at every stage of carrying out the investigation to check the
accuracy, coverage, methods of measurements, analysis and interpretation. The collected data should
be edited, classified, tabulated and presented in the form of diagrams and graphs. The data should be
carefully and systematically analysed and interpreted.
Methods for collecting data
Collection of data is done by a suitable method as per the following:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through a schedule filled by investigators
Q2. a) Explain the approaches to define probability.
b) In a bolt factory machines A, B, C manufacture 25, 35 and 40 percent of the total
output. Of their total output 5, 4 and 2 percent are defective respectively. A bolt is
drawn at random and is found to be defective. What are the probabilities that it was
manufactured by machines A, B and C? (Applying Bayes theorem and calculating the
probabilities) 6
Approaches to define Probability
There are four approaches to define Probability. They are as follows:
1) Classical / mathematical / a priori approach
2) Statistical / relative frequency / empirical / a posteriori approach
3) Subjective approach
4) Axiomatic approach
1) Classical / mathematical / a priori approach
Under this approach the probability of an event is known before conducting the experiment. In this
case, each of the possible outcomes is associated with an equal probability of occurrence and the number of
outcomes favourable to the concerned event is known.
2) Statistical / relative frequency / empirical / a posteriori approach
Under this approach the probability of an event is arrived at after conducting an experiment. If we
want to know the probability that a particular household in an area will have two earning members,
then we have to gather data on all households in that area and then arrive at the probability. The greater
the number of households surveyed, the greater will be the accuracy of the probability arrived at.
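The contrast between the first two approaches can be sketched in a few lines of Python. The die-rolling experiment below is a hypothetical illustration, not part of the original answer: the classical probability of a six is known in advance as 1/6, while the relative frequency estimate is obtained only after the (simulated) experiment and tends to get closer to 1/6 as the number of trials grows.

```python
import random

def relative_frequency(trials, seed=0):
    """Estimate P(rolling a six) as (number of sixes) / (number of rolls)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

# Classical (a priori) value: one favourable outcome out of six equally likely outcomes.
classical = 1 / 6

# Empirical (a posteriori) values for increasing numbers of trials.
for n in (100, 10_000, 100_000):
    print(n, round(relative_frequency(n), 4), round(classical, 4))
```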
3) Subjective approach
Under this approach the investigator or researcher assigns probability to the events either from his
experience or from past records. It is more suitable when the sample size is ten or less than ten. The
investigator has full knowledge about the characteristics of each and every individual. However, there
is a chance of personal bias being introduced in such probability.
4) Axiomatic approach
Under this approach probability is not defined through an experiment or through personal judgement.
Instead, it sets out axioms, i.e. properties, that any assignment of probabilities to events must
satisfy. For a sample space S and an event A, the axioms are:
i) P(A) ≥ 0 for every event A
ii) P(S) = 1
iii) If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B)
All other rules of probability are derived from these axioms.
Solution. Let E1, E2, E3 be the events that a bolt selected at random is manufactured by the
machines A, B, C respectively and let E denote the event of its being defective. Now P(E1) = 0.25,
P(E2) = 0.35, P(E3) = 0.40
The probability of drawing a defective bolt manufactured by machine A is P(E/E1) = 0.05.
Similarly we have
P(E/E2) = 0.04, P(E/E3) = 0.02
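Hence the probability that a defective bolt selected at random is manufactured by machine A is given
by Bayes' theorem:
P(E1/E) = P(E1) P(E/E1) / [P(E1) P(E/E1) + P(E2) P(E/E2) + P(E3) P(E/E3)]
= (0.25 × 0.05) / (0.25 × 0.05 + 0.35 × 0.04 + 0.40 × 0.02)
= 0.0125 / 0.0345 = 25/69
Similarly we get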
P(E2/E) =28/69, P(E3/E) =16/69
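The Bayes' theorem calculation above can be checked with a short Python sketch. It uses exact fractions so the results can be compared directly with 25/69, 28/69 and 16/69; the function and variable names are illustrative choices, not part of the original solution.

```python
from fractions import Fraction

# Prior probabilities P(Ei): share of total output per machine.
priors = {"A": Fraction(25, 100), "B": Fraction(35, 100), "C": Fraction(40, 100)}
# Likelihoods P(E/Ei): share of each machine's output that is defective.
defect_rates = {"A": Fraction(5, 100), "B": Fraction(4, 100), "C": Fraction(2, 100)}

def posterior(priors, likelihoods):
    """Return P(Ei/E) for every machine using Bayes' theorem."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}  # P(Ei) * P(E/Ei)
    total = sum(joint.values())                              # P(E), total probability
    return {m: joint[m] / total for m in joint}

post = posterior(priors, defect_rates)
for machine, p in post.items():
    print(machine, p)
```

The three posterior probabilities add up to 1, which is a quick sanity check on the arithmetic.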
Q3. a) The procedure of testing hypothesis requires a researcher to adopt several steps.
Describe in brief all such steps.
b) A sample of 400 items is taken from a normal population whose mean as well as
variance is 4. If the sample mean is 4.5, can the sample be regarded as a truly random sample?
[a) Hypothesis testing procedure b) Calculation and solution to the problem] 5, 5
Steps for procedure of testing hypothesis
Five Steps in Hypothesis Testing:
1. Specify the Null Hypothesis
2. Specify the Alternative Hypothesis
3. Set the Significance Level (α)
4. Calculate the Test Statistic and Corresponding P-Value
5. Drawing a Conclusion
Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more
groups or factors. In research studies, a researcher is usually interested in disproving the null
hypothesis.
Step 2: Specify the Alternative Hypothesis
The alternative hypothesis (H1) is the statement that there is an effect or difference. This is usually the
hypothesis the researcher is interested in proving. The alternative hypothesis can be one-sided (only
provides one direction, e.g., lower) or two-sided. We often use two-sided tests even when our true
hypothesis is one-sided because it requires more evidence against the null hypothesis to accept the
alternative hypothesis.
Step 3: Set the Significance Level (α)
The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that
there is a 5% chance that you will accept your alternative hypothesis when your null hypothesis is
actually true. The smaller the significance level, the greater the burden of proof needed to reject the
null hypothesis, or in other words, to support the alternative hypothesis.
Step 4: Calculate the Test Statistic and Corresponding P-Value
In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing
generally uses a test statistic that compares groups or examines associations between variables. When
describing a single sample without establishing relationships between variables, a confidence interval
is commonly used.
The p-value describes the probability of obtaining a sample statistic as or more extreme by chance
alone if your null hypothesis is true. This p-value is determined based on the result of your test
statistic. Your conclusions about the hypothesis are based on your p-value and your significance
level.
Step 5: Drawing a Conclusion
1. P-value <= significance level (α) => Reject your null hypothesis in favor of your alternative
hypothesis. Your result is statistically significant.
2. P-value > significance level (α) => Fail to reject your null hypothesis. Your result is not
statistically significant.
Hypothesis testing is not set up so that you can absolutely prove a null hypothesis. Therefore, when
you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you
do find strong enough evidence against the null hypothesis, you reject the null hypothesis. Your
conclusions also translate into a statement about your alternative hypothesis. When presenting the
results of a hypothesis test, include the descriptive statistics in your conclusions as well. Report exact
p-values rather than a certain range. For example, "The intubation rate differed significantly by
patient age, with younger patients having a lower rate of successful intubation (p=0.02)." Here are two
more examples with the conclusion stated in several different ways.
Reject the null hypothesis in favor of the alternative hypothesis.
The difference in survival between the intervention and control group was statistically significant.
There was a 20% increase in survival for the intervention group compared to control
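Step 1
H0: µ = 4, H1: µ ≠ 4
Step 2
The population variance is 4, so σ = 2. With n = 400 and sample mean x̄ = 4.5, the test statistic is
z = (x̄ − µ) / (σ/√n) = (4.5 − 4) / (2/√400) = 0.5 / 0.1 = 5
Note: since the sample size is large, the normal test is applicable.
Step 3
Since the calculated value of z is greater than even the 1% tabulated value of z, i.e. 2.58, the null
hypothesis is rejected. The sample cannot be regarded as a truly random sample.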
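As a rough numerical check of this solution, the sketch below computes the test statistic z = (x̄ − µ) / (σ/√n) from the figures given in the question and also derives a two-sided p-value from the standard normal distribution. The helper names are illustrative; only the Python standard library is used.

```python
import math

def z_statistic(sample_mean, pop_mean, pop_variance, n):
    """z = (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = math.sqrt(pop_variance / n)
    return (sample_mean - pop_mean) / standard_error

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))

# Figures from the question: n = 400, mean = variance = 4, sample mean = 4.5.
z = z_statistic(sample_mean=4.5, pop_mean=4.0, pop_variance=4.0, n=400)
p = two_sided_p_value(z)

print("z =", round(z, 4))
print("p-value =", p)
print("reject H0 at 1% level:", abs(z) > 2.58)
```

Both routes lead to the same decision: the statistic exceeds the 1% tabulated value of 2.58, and the p-value is far below 0.01.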
Q4. a. What is a Chi-square test? Point out its applications. Under what conditions is
this test applicable? (Meaning of Chi-square test, Applications, Conditions)
b) What are the components of time series? Enumerate the methods of determining
trend in time series. (Components of time series and methods of determining trend in
time series) 6
The Chi-square test is one of the most commonly used non-parametric tests in statistical work. The
Greek letter χ² is used to denote this test. χ² describes the magnitude of discrepancy between the
observed and the expected frequencies. The value of χ² is calculated as:
χ² = Σ (Oi − Ei)² / Ei = (O1 − E1)²/E1 + (O2 − E2)²/E2 + … + (On − En)²/En
Where O1, O2, O3, …, On are the observed frequencies and E1, E2, E3, …, En are the corresponding
expected or theoretical frequencies.
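The statistic is the sum, over all classes, of (observed − expected)² / expected. A minimal Python sketch follows; the die-rolling frequencies are hypothetical data invented for illustration and do not come from the original answer.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical goodness-of-fit data: a die rolled 60 times.
observed = [8, 12, 9, 11, 6, 14]   # observed frequency of each face
expected = [10] * 6                # a fair die gives 60 / 6 = 10 per face

stat = chi_square(observed, expected)
df = len(observed) - 1             # degrees of freedom for goodness of fit

print("chi-square =", round(stat, 4), "with", df, "degrees of freedom")
```

The computed value would then be compared with the tabulated χ² value for the same degrees of freedom at the chosen significance level.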
Practical applications of Chi-Square test
In inferential statistics, the Chi-Square test can also be applied to discrete distributions. The
applications of the Chi-Square test include testing:
i) the significance of sample variances
ii) the goodness of fit of a theoretical distribution
iii) the independence in a contingency table, whether the observed results are consistent with the
expected segregations in breeding experiments of genetics
Where the first is a parametric test and the other two are non-parametric tests.
Uses of Chi-Square test
The χ² test is used broadly to:
i) Test goodness of fit for one-way classification or for one variable only
ii) Test independence or interaction for more than one row or column in the form of a
contingency table concerning several attributes
iii) Test population variance ‘σ²’ through confidence intervals suggested by the χ² test
Conditions for applying the Chi-Square test
1. The frequencies used in Chi-Square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. Each of the observations which make up the sample of this test must be independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population
distribution. In other words, it is a non-parametric test.
5. The χ² test is wholly dependent on degrees of freedom. As the degrees of freedom increase, the
Chi-Square distribution curve becomes symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the frequencies of
adjacent items or cells should be pooled together so that the pooled expected frequency is not less than 5.
7. The data should be expressed in original units for convenience of comparison and the given
distribution should not be replaced by relative frequencies or proportions.
8. This test is used only for drawing inferences through test of the hypothesis, so it cannot be used for
estimation of parameter value.
Components of Time Series
i) Long term trend or secular trend
ii) Seasonal variations
iii) Cyclic variations
iv) Random variations
Methods of measuring the trend of a time series:
i. Free hand or graphic methods
ii. Semi averages method
iii. Moving average method
iv. Method of least squares
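Two of the methods listed above, the moving average method and the method of least squares, can be sketched in Python as follows. The yearly sales series is hypothetical, and coding time as 0, 1, 2, … is an assumption made for simplicity; the least squares function fits the straight-line trend y = a + b·t.

```python
def moving_average(values, window):
    """Simple moving averages of the given window length."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def least_squares_trend(values):
    """Fit y = a + b*t by least squares, with time coded as t = 0, 1, 2, ..."""
    n = len(values)
    t = list(range(n))
    t_mean = sum(t) / n
    y_mean = sum(values) / n
    slope = (sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values))
             / sum((ti - t_mean) ** 2 for ti in t))
    intercept = y_mean - slope * t_mean
    return intercept, slope

sales = [10, 12, 11, 14, 16, 15, 18]  # hypothetical yearly figures

print("3-year moving averages:", [round(m, 2) for m in moving_average(sales, 3)])
a, b = least_squares_trend(sales)
print("trend line: y =", round(a, 3), "+", round(b, 3), "* t")
```

The moving averages smooth out short-term fluctuations, while the fitted slope b gives the average change per period.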
Q5. What do you mean by cost of living index? Discuss the methods of construction of
cost of living index with an example for each. (Meaning of cost of living index, Methods
of constructing cost of living index with an example for each) 2, 8
Answer: Cost of Living Index or Consumer Price Index
The ‘Cost of living index’, also known as ‘consumer price index’ or ‘Cost of living price index’ is the
country’s principal measure of price change. The Consumer price index helps us in determining the
effect of rise and fall in prices on different classes of consumers living in different areas.
Methods of Constructing Consumer Price Index
There are two methods for constructing consumer price index number. They are:
I. Aggregate expenditure method
II. Family budget method or method of weighted average of price relatives.
I. Aggregate Expenditure Method
This method is based on Laspeyres’ method, where the base year quantities are taken as weights (w =
Q0).
II. Family budget method
Family budget method or the method of weighted relatives is the method where the weights are the values
(P0Q0) in the base year, often denoted by W.
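Both methods can be illustrated with a small Python sketch. It assumes the standard formulas: for the aggregate expenditure method, index = ΣP1Q0 / ΣP0Q0 × 100, and for the family budget method, index = ΣPW / ΣW, where P = (P1/P0) × 100 is the price relative and W = P0Q0. The basket of items and all prices and quantities are hypothetical figures chosen for illustration.

```python
# Each entry: (item, base-year price P0, base-year quantity Q0, current-year price P1).
# All figures are hypothetical.
basket = [
    ("Food", 10.0, 20.0, 12.0),
    ("Fuel", 5.0, 10.0, 8.0),
    ("Clothing", 20.0, 5.0, 25.0),
]

def aggregate_expenditure_index(items):
    """Sum(P1*Q0) / Sum(P0*Q0) * 100, with base-year quantities as weights."""
    current_cost = sum(p1 * q0 for _, p0, q0, p1 in items)
    base_cost = sum(p0 * q0 for _, p0, q0, p1 in items)
    return current_cost / base_cost * 100

def family_budget_index(items):
    """Sum(P*W) / Sum(W), with P = P1/P0*100 and W = P0*Q0."""
    weighted_relatives = sum((p1 / p0 * 100) * (p0 * q0) for _, p0, q0, p1 in items)
    total_weight = sum(p0 * q0 for _, p0, q0, p1 in items)
    return weighted_relatives / total_weight

agg = aggregate_expenditure_index(basket)
fam = family_budget_index(basket)
print("Aggregate expenditure method:", round(agg, 2))
print("Family budget method:", round(fam, 2))
```

With W = P0Q0 the two methods give the same index, since (P1/P0) × P0Q0 reduces to P1Q0.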
Q6 a. What is analysis of variance? What are the assumptions of the technique?
b. Three samples below have been obtained from normal populations with equal
variances. Test the hypothesis at 5% level that the population means are equal.
(Meaning of Analysis of Variance, Assumptions, Formulas/Calculation/Solution to the
problem) 2, 1, 7
Analysis of variance (ANOVA)
It is a collection of statistical models used to analyze the differences between group means and their
associated procedures (such as "variation" among and between groups). In ANOVA setting, the
observed variance in a particular variable is partitioned into components attributable to different
sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the
means of several groups are equal, and therefore generalizes t-test to more than two groups. Doing
multiple two-sample t-tests would result in an increased chance of committing a type I error. For this
reason, ANOVAs are useful in comparing (testing) three or more means (groups or variables) for
statistical significance.
Assumptions for study of ANOVA
The underlying assumptions for the study of ANOVA are:
i) Each of the samples is a simple random sample
ii) Population from which the samples are selected are normally distributed
iii) Each of the samples is independent of the other samples
iv) Each of the populations has the same variance and identical means
v) The effects of the various components are additive
b) Answer: Let H0: There is no significant difference in the means of three samples
∑A = 50, ∑B = 40, ∑C = 60
T = Sum of all observations = 150
Correction factor = T²/N = 150²/15 = 1500
SST (Total Sum of the Squares) = Sum of squares of all observations − Correction factor
= (8² + 7² + 12² + 10² + .......... + 14²) − 1500 = 1600 − 1500 = 100
Sum of the squares between columns (samples): SSC = (ΣA)²/5 + (ΣB)²/5 + (ΣC)²/5 − Correction factor
= 50²/5 + 40²/5 + 60²/5 − 1500 = 1540 − 1500 = 40
Sum of the squares of the Error within columns (samples): SSE = SST – SSC =
100 – 40 = 60
Variance between samples:
MSC = SSC/(k – 1) = 40/2 = 20
Variance within the samples:
MSE = SSE/(n – k) = 60/(15 – 3) = 5
The degree of freedom = (k – 1, n – k) = (2, 12).
[ k is the number of columns and n is the total number of observations]
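The ANOVA arithmetic can be re-traced from the summary figures alone, as in the Python sketch below. It takes the column totals (50, 40, 60) and the total sum of squares of all observations (1600) from the working above, and it assumes five observations per sample, which is inferred from the 15 observations split across three samples rather than stated explicitly. The function name and dictionary keys are illustrative.

```python
def one_way_anova_from_sums(column_sums, column_sizes, sum_of_squares):
    """One-way ANOVA table computed from column totals and the raw sum of squares."""
    n = sum(column_sizes)              # total number of observations
    k = len(column_sums)               # number of columns (samples)
    total = sum(column_sums)           # T, sum of all observations
    correction = total ** 2 / n        # correction factor T^2 / N
    sst = sum_of_squares - correction  # total sum of squares
    ssc = sum(s ** 2 / size for s, size in zip(column_sums, column_sizes)) - correction
    sse = sst - ssc                    # error (within samples) sum of squares
    msc = ssc / (k - 1)                # variance between samples
    mse = sse / (n - k)                # variance within samples
    return {"CF": correction, "SST": sst, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse, "df": (k - 1, n - k)}

# Assumption: 5 observations in each of the three samples.
result = one_way_anova_from_sums([50, 40, 60], [5, 5, 5], 1600)
for name, value in result.items():
    print(name, "=", value)
```

The F ratio obtained this way is then compared with the tabulated F value for (2, 12) degrees of freedom at the 5% level to decide whether H0 is rejected.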