ASSIGNMENT
DRIVE: SPRING 2014
PROGRAM: MBADS / MBAHCSN3 / MBAN2 / PGDBAN2 / MBAFLEX
SEMESTER: I
SUBJECT CODE & NAME: MB0040 STATISTICS FOR MANAGEMENT

Q1. A statistical survey is a scientific process of collection and analysis of numerical data. Explain the stages of statistical survey. Describe the various methods for collecting data in a statistical survey.
(Meaning of statistical survey, Stages of statistical survey (Listing and Explanation), Methods for collecting data) 2, 5, 3

Answer:

Meaning of Statistical Survey
A statistical survey is a scientific process of collection and analysis of numerical data. Statistical surveys are used to collect information about units in a population, and they involve asking questions of individuals. Surveys of human populations are common in the government, health, social science and marketing sectors.

Stages of Statistical Survey
Statistical surveys involve two stages, namely Planning and Execution. Figure shows the two broad stages of Statistical Survey.

1. Planning a Statistical Survey
The relevance and accuracy of the data obtained in a survey depend upon the care taken in planning. A properly planned investigation can lead to the best results with the least cost and time. Figure gives the explanation of steps involved in the planning stage.

2. Execution of Statistical Survey
Controlled methods should be adopted at every stage of carrying out the investigation to check the accuracy, coverage, methods of measurement, analysis and interpretation. The collected data should be edited, classified, tabulated and presented in the form of diagrams and graphs. The data should then be carefully and systematically analysed and interpreted.

Methods for collecting data
Data are collected by a suitable method chosen from the following:
1. Direct personal observation
2. Indirect oral interview
3. Information through agencies
4. Information through mailed questionnaires
5. Information through a schedule filled by investigators

Q2.
a) Explain the approaches to define probability.
b) In a bolt factory machines A, B, C manufacture 25, 35 and 40 percent of the total output. Of their total output 5, 4 and 2 percent are defective respectively. A bolt is drawn at random and is found to be defective. What are the probabilities that it was manufactured by machines A, B and C?
(Applying Bayes theorem and calculating the probabilities) 6

Answer:

a. Approaches to define Probability
There are four approaches to defining probability:
1) Classical / mathematical / a priori approach
2) Statistical / relative frequency / empirical / a posteriori approach
3) Subjective approach
4) Axiomatic approach

1) Classical / mathematical / a priori approach
Under this approach the probability of an event is known before the experiment is conducted. Each of the possible outcomes is assumed to be equally likely, and the number of outcomes favourable to the event concerned is known.

2) Statistical / relative frequency / empirical / a posteriori approach
Under this approach the probability of an event is arrived at after conducting an experiment. If we want to know the probability that a particular household in an area will have two earning members, we have to gather data on the households in that area and then arrive at the probability. The greater the number of households surveyed, the more accurate the resulting probability.

3) Subjective approach
Under this approach the investigator or researcher assigns probabilities to the events either from experience or from past records. It is more suitable when the sample size is ten or fewer, where the investigator has full knowledge of the characteristics of each individual. However, there is a chance of personal bias being introduced into such probabilities.
4) Axiomatic approach
Under this approach probability is not defined by a rule for computing it. Instead, it is defined through a set of axioms, i.e. properties that any assignment of probabilities to events must satisfy. For a sample space S and events A and B:
(i) P(A) ≥ 0 for every event A;
(ii) P(S) = 1;
(iii) if A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).

b. Solution.
Let E1, E2, E3 be the events that a bolt selected at random is manufactured by the machines A, B, C respectively, and let E denote the event of its being defective.
Now P(E1) = 0.25, P(E2) = 0.35, P(E3) = 0.40.
The probability that a bolt is defective, given that it was manufactured by machine A, is P(E/E1) = 0.05. Similarly, P(E/E2) = 0.04 and P(E/E3) = 0.02.
Hence, by Bayes' theorem, the probability that a defective bolt selected at random was manufactured by machine A is given by

$P(E_1/E) = \dfrac{P(E_1)\,P(E/E_1)}{P(E_1)\,P(E/E_1) + P(E_2)\,P(E/E_2) + P(E_3)\,P(E/E_3)} = \dfrac{0.25 \times 0.05}{0.25 \times 0.05 + 0.35 \times 0.04 + 0.40 \times 0.02} = \dfrac{0.0125}{0.0345} = \dfrac{25}{69}$

Similarly we get P(E2/E) = 28/69 and P(E3/E) = 16/69.

Q3.
a) The procedure of testing hypothesis requires a researcher to adopt several steps. Describe in brief all such steps.
b) A sample of 400 items is taken from a normal population whose mean as well as variance is 4. If the sample mean is 4.5, can the sample be regarded as a truly random sample?
[a) Hypothesis testing procedure b) Calculation and solution to the problem] 5, 5

Answer.

a. Steps for procedure of testing hypothesis
Five Steps in Hypothesis Testing:
1. Specify the Null Hypothesis
2. Specify the Alternative Hypothesis
3. Set the Significance Level (α)
4. Calculate the Test Statistic and Corresponding P-Value
5. Drawing a Conclusion

Step 1: Specify the Null Hypothesis
The null hypothesis (H0) is a statement of no effect, relationship, or difference between two or more groups or factors.
In research studies, a researcher is usually interested in disproving the null hypothesis.

Step 2: Specify the Alternative Hypothesis
The alternative hypothesis (H1) is the statement that there is an effect or difference. This is usually the hypothesis the researcher is interested in proving. The alternative hypothesis can be one-sided (it specifies only one direction, e.g., lower) or two-sided. We often use two-sided tests even when our true hypothesis is one-sided, because a two-sided test requires more evidence against the null hypothesis before the alternative hypothesis is accepted.

Step 3: Set the Significance Level (α)
The significance level (denoted by the Greek letter alpha, α) is generally set at 0.05. This means that there is a 5% chance that you will accept your alternative hypothesis when your null hypothesis is actually true. The smaller the significance level, the greater the burden of proof needed to reject the null hypothesis, or in other words, to support the alternative hypothesis.

Step 4: Calculate the Test Statistic and Corresponding P-Value
In another section we present some basic test statistics to evaluate a hypothesis. Hypothesis testing generally uses a test statistic that compares groups or examines associations between variables. When describing a single sample without establishing relationships between variables, a confidence interval is commonly used. The p-value is the probability of obtaining a sample statistic as extreme as, or more extreme than, the one observed by chance alone if your null hypothesis is true. The p-value is determined from the value of your test statistic. Your conclusions about the hypothesis are based on your p-value and your significance level.

Step 5: Drawing a Conclusion
1. P-value <= significance level (α) => Reject your null hypothesis in favor of your alternative hypothesis. Your result is statistically significant.
2. P-value > significance level (α) => Fail to reject your null hypothesis. Your result is not statistically significant.
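The decision rule in Step 5 can be sketched in a few lines of Python. This is only an illustration: it assumes the test statistic is a standard normal z (so the two-sided p-value comes from the normal distribution), and the function names and z values are hypothetical, not taken from the assignment.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|Z| >= |z|) for Z ~ N(0, 1), via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

def decide(p_value, alpha=0.05):
    """Apply the Step 5 rule: reject H0 when p-value <= alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

# Hypothetical test statistics, chosen only for illustration.
for z in (0.8, 2.5):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f} -> {decide(p)}")
```

A smaller alpha passed to decide makes the rule stricter, which matches the remark in Step 3 about the burden of proof.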
Hypothesis testing is not set up so that you can absolutely prove a null hypothesis. Therefore, when you do not find evidence against the null hypothesis, you fail to reject the null hypothesis. When you do find strong enough evidence against the null hypothesis, you reject the null hypothesis. Your conclusions also translate into a statement about your alternative hypothesis. When presenting the results of a hypothesis test, include the descriptive statistics in your conclusions as well. Report exact p-values rather than a range. For example, "The intubation rate differed significantly by patient age, with younger patients having a lower rate of successful intubation (p=0.02)." Here are two more examples with the conclusion stated in several different ways.
Conclusion: Reject the null hypothesis in favor of the alternative hypothesis. The difference in survival between the intervention and control group was statistically significant. There was a 20% increase in survival for the intervention group compared to control (p=0.001).

b.
Step 1
H0: µ = 4, H1: µ ≠ 4

Step 2
The population variance is 4, so σ = 2, and n = 400:

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{4.5 - 4}{2/\sqrt{400}} = \dfrac{0.5}{0.1} = 5$

Note: since the sample size is large, the normal test is applicable.

Step 3
Since the calculated value of z is greater than even the 1% tabulated value of z, i.e. 2.58, the null hypothesis is rejected. The sample cannot be regarded as a truly random sample.

Q4.
a) What is a Chi-square test? Point out its applications. Under what conditions is this test applicable?
(Meaning of Chi-square test, Applications, Conditions)
b) What are the components of time series? Enumerate the methods of determining trend in time series.
(Components of time series and methods of determining trend in time series) 6

Answer:

a. The Chi-square test is one of the most commonly used non-parametric tests in statistical work. The symbol χ² (the Greek letter chi, squared) is used to denote this test. χ² describes the magnitude of the discrepancy between the observed and the expected frequencies.
The value of χ² is calculated as:

$\chi^2 = \sum_{i=1}^{n} \dfrac{(O_i - E_i)^2}{E_i}$

where O1, O2, O3, …, On are the observed frequencies and E1, E2, E3, …, En are the corresponding expected or theoretical frequencies.

Practical applications of Chi-Square test
In inferential statistics, the Chi-Square test can also be applied to discrete distributions. The applications of the Chi-Square test include testing:
the significance of sample variances
the goodness of fit of a theoretical distribution
the independence in a contingency table
whether the observed results are consistent with the expected segregations in breeding experiments in genetics
The first of these is a parametric test; the others are non-parametric tests.

Uses of Chi-Square test
The χ² test is used broadly to:
Test goodness of fit for a one-way classification or for one variable only
Test independence or interaction for more than one row or column in the form of a contingency table concerning several attributes
Test the population variance σ² through confidence intervals suggested by the χ² test

Conditions for applying the Chi-Square test
1. The frequencies used in the Chi-Square test must be absolute and not in relative terms.
2. The total number of observations collected for this test must be large.
3. The observations that make up the sample must be independent of each other.
4. As the χ² test is based wholly on sample data, no assumption is made concerning the population distribution. In other words, it is a non-parametric test.
5. The χ² test is wholly dependent on the degrees of freedom. As the degrees of freedom increase, the Chi-Square distribution curve becomes more symmetrical.
6. The expected frequency of any item or cell must not be less than 5. If it is, the frequencies of adjacent items or cells should be pooled together to bring the expected frequency up to at least 5.
7. The data should be expressed in original units for convenience of comparison, and the given distribution should not be replaced by relative frequencies or proportions.
8.
This test is used only for drawing inferences through tests of hypotheses, so it cannot be used for estimating parameter values.

b. Components of Time Series
i) Long term trend or secular trend
ii) Seasonal variations
iii) Cyclic variations
iv) Random variations

Methods of measuring the trend of a time series:
i. Free hand or graphic method
ii. Semi averages method
iii. Moving average method
iv. Method of least squares

Q5. What do you mean by cost of living index? Discuss the methods of construction of cost of living index with an example for each.
(Meaning of cost of living index, Methods of constructing cost of living index with an example for each) 2, 8

Answer:

Cost of Living Index or Consumer Price Index
The 'Cost of living index', also known as the 'consumer price index' or 'cost of living price index', is the country's principal measure of price change. The consumer price index helps us determine the effect of a rise or fall in prices on different classes of consumers living in different areas.

Methods of Constructing Consumer Price Index
There are two methods of constructing a consumer price index number:
I. Aggregate expenditure method
II. Family budget method or method of weighted average of price relatives

I. Aggregate Expenditure Method
This method is based on Laspeyres' method, where the base year quantities are taken as weights (w = Q0). With P1 denoting current year prices and P0, Q0 the base year prices and quantities, the index is

$\text{Index} = \dfrac{\sum P_1 Q_0}{\sum P_0 Q_0} \times 100$

II. Family Budget Method
The family budget method, or the method of weighted relatives, is the method where the weights are the values (P0Q0) in the base year, often denoted by W. With the price relative P = (P1/P0) × 100, the index is

$\text{Index} = \dfrac{\sum P W}{\sum W}$

Q6.
a. What is analysis of variance? What are the assumptions of the technique?
b. Three samples below have been obtained from normal populations with equal variances. Test the hypothesis at 5% level that the population means are equal.
(Meaning of Analysis of Variance, Assumptions, Formulas/Calculation/Solution to the problem) 2, 1, 7

Answer:

Analysis of variance (ANOVA)
ANOVA is a collection of statistical models, and their associated procedures (such as "variation" among and between groups), used to analyze the differences between group means. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. Doing multiple two-sample t-tests would result in an increased chance of committing a type I error. For this reason, ANOVA is useful in comparing (testing) three or more means (groups or variables) for statistical significance.

Assumptions for study of ANOVA
The underlying assumptions for the study of ANOVA are:
i) Each of the samples is a simple random sample
ii) The populations from which the samples are selected are normally distributed
iii) Each of the samples is independent of the other samples
iv) Each of the populations has the same variance
v) The effects of the various components are additive

b) Answer:
Let H0: There is no significant difference in the means of the three samples.

A     B     C
8     7     12
10    5     9
7     10    13
14    9     12
11    9     14

ΣA = 50, ΣB = 40, ΣC = 60

T = Sum of all observations = 150

Correction factor:
$\dfrac{T^2}{N} = \dfrac{150^2}{15} = 1500$, where N = n = 15 is the total number of observations.

SST (Total Sum of Squares) = Sum of squares of all observations minus the correction factor:
$SST = (8^2 + 7^2 + 12^2 + 10^2 + \dots + 14^2) - \dfrac{T^2}{N} = 1600 - 1500 = 100$

Sum of squares between columns (samples):
$SSC = \dfrac{(\sum A)^2}{n_1} + \dfrac{(\sum B)^2}{n_2} + \dfrac{(\sum C)^2}{n_3} - \dfrac{T^2}{N} = \dfrac{50^2}{5} + \dfrac{40^2}{5} + \dfrac{60^2}{5} - 1500 = 1540 - 1500 = 40$

Sum of the squares of the error within columns (samples):
SSE = SST - SSC = 100 - 40 = 60

Variance between samples:
$MSC = \dfrac{SSC}{k - 1} = \dfrac{40}{3 - 1} = \dfrac{40}{2} = 20$

Variance within the samples:
$MSE = \dfrac{SSE}{n - k} = \dfrac{60}{15 - 3} = 5$

The degrees of freedom = (k - 1, n - k) = (2, 12).
[ k is the number of columns and n is the total number of observations]
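The sums of squares above can be checked with a short Python sketch. It uses only the three samples from the table and the same correction-factor route as the hand calculation. The function name and the dictionary layout are illustrative choices, not part of the assignment.

```python
# The three samples A, B, C from the table in the answer.
samples = {
    "A": [8, 10, 7, 14, 11],
    "B": [7, 5, 10, 9, 9],
    "C": [12, 9, 13, 12, 14],
}

def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method used in the text."""
    columns = list(groups.values())
    k = len(columns)                       # number of columns (samples)
    n = sum(len(col) for col in columns)   # total number of observations
    total = sum(sum(col) for col in columns)
    cf = total ** 2 / n                    # correction factor T^2 / N
    # Total sum of squares: squares of all observations minus CF.
    sst = sum(x * x for col in columns for x in col) - cf
    # Between-column sum of squares: squared column totals over sizes, minus CF.
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - cf
    sse = sst - ssc                        # within-column (error) sum of squares
    msc = ssc / (k - 1)                    # variance between samples
    mse = sse / (n - k)                    # variance within samples
    return {"SST": sst, "SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse,
            "F": msc / mse, "df": (k - 1, n - k)}

result = one_way_anova(samples)
for name, value in result.items():
    print(name, value)
```

The ratio F = MSC / MSE is the statistic that would then be compared with the tabulated 5% value of F for (2, 12) degrees of freedom to decide whether H0 is rejected.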