Unit 2: Organization of Data

Statistical Enquiry

A statistical enquiry is a process of transforming raw data into useful information that tells us more about a subject, allows us to make recommendations and, possibly, lets us predict future outcomes. It consists of two stages:

1. Planning and Preparation
   1. Object of the enquiry
   2. Scope of the enquiry
   3. Units used for collection and measurement
   4. Sources of data
   5. Method of collection of data
   6. Framing a format
   7. Accuracy level
   8. Type of enquiry

2. Execution and Survey
   a) Setting up a team of administrators
   b) Designing the questionnaire
   c) Selection and training of enumerators
   d) Field work by enumerators and supervision
   e) Follow-up work in the case of non-response
   f) Analysis of collected data
   g) Preparation of the final report

Collecting Data

Collection of data is the first and most important stage in any statistical survey. The method of collection depends upon considerations such as the objective, scope and nature of the investigation and the availability of resources. Direct personal interviews, third-party agencies and questionnaires are some of the ways in which data are collected.

Primary Data

Data collected for the first time, keeping in view the objective of the survey, are known as primary data. They are likely to be more reliable. However, the cost of collecting such data is much higher. Primary data may be collected by the census method, in which information on each and every individual of the population is observed, or from a sample of the population.

Collection of primary data can be done by any of the following methods:
a) Direct personal observation
b) Indirect oral interview
c) Information through agencies
d) Information through mailed questionnaires
e) Information through schedules filled by investigators

Direct Personal Observation

In the direct personal observation method, as illustrated in figure 2.4, the investigator collects data by having direct contact with the units of investigation.
The accuracy of the data depends upon the ability, training and attitude of the investigator.

Merits
1. We get original data, which are more accurate and reliable.
2. Satisfactory information can be extracted by the investigator through indirect questions.
3. Data are homogeneous and comparable.

Demerits
1. This method costs more.
2. This method takes more time.
3. This method cannot be used when the scope of the investigation is wide.

Indirect Oral Interview

The indirect oral interview is used when the area to be covered is large. The investigator collects the data from a third party, a witness or the head of an institution. This method is generally used by police departments in enquiries into the causes of fires, thefts or murders.

Merits
1. Economical in terms of time, cost and manpower
2. Confidential information can be collected
3. Information is likely to be unbiased and reliable

Demerits
1. The degree of accuracy of the information is lower.

Collecting Information Through Agencies

Collecting information through local agencies or correspondents is a method generally adopted by newspapers and television channels. Local agents are appointed in different parts of the area under investigation.

Merits
1. Very cheap and economical
2. Useful where information is needed regularly

Demerits
1. Information may be biased
2. It is difficult to maintain the degree of accuracy and uniformity

Through Mailed Questionnaires

Often, information is collected through questionnaires containing questions pertaining to the investigation. They are sent to the respondents (the people who answer the questions in the questionnaire) with a covering letter soliciting their cooperation.

Merits
1. Most economical
2. Saves manpower
3. Can be widely used

Demerits
1. Cannot be used if informants are illiterate
2. Many informants will not respond
3. In case of non-response, follow-up work is essential
Information Through Schedules Filled by Investigators

Information can be collected through schedules filled in by investigators through personal contact. In order to get reliable information, the investigator should be well trained, tactful, unbiased and hard working.

Merits
1. Useful when informants are illiterate
2. Rate of non-response is lower

Demerits
1. Training of investigators is essential
2. Time consuming
3. Personal bias of investigators may lead to failure of the enquiry

Secondary Data

Any information that is used for the current investigation, but was collected and used by some other agency or person in a separate investigation or survey, is known as secondary data. Such data are available in published or unpublished form. The various sources of published data are:
1. Reports and official publications of international and national organizations as well as central and state governments
2. Publications of local bodies such as municipal corporations and district boards
3. Financial and economic journals
4. Annual reports of various companies
5. Publications brought out by research agencies and research scholars

Questionnaire

A questionnaire is a research instrument consisting of a series of questions and other prompts for the purpose of gathering information from respondents. Although questionnaires are often designed for statistical analysis of the responses, this is not always the case.

Guidelines for Construction of a Questionnaire

The following principles are to be considered:
1. The number of questions should be as small as possible.
2. Questions must be simple to understand.
3. Questions should be arranged logically.
4. Answers to questions must be short.
5. As far as possible, questions on personal matters must be avoided.
6. Any clarifications on questions must be provided in a footnote.
7. Necessary instructions must be given to informants.
8. The questionnaire must be attractive.
9. Information supplied must be kept confidential.
Census

A census is the procedure of systematically acquiring and recording information about the members of a given population. It is a regularly occurring and official count of a particular population.

Merits
1. Results are accurate and reliable
2. Data are collected from each and every unit of the population
3. Provides a detailed study of all units in the population
4. Free from sampling errors

Demerits
1. Non-sampling errors are likely to be more
2. Requires money, labour and time
3. It is not possible in some circumstances, for example when the population is vast
4. Census enumeration is not suitable if units are damaged while the data are being procured

Sample Survey

In statistics, survey sampling describes the process of selecting a sample of elements from a target population to conduct a survey. The term "survey" may refer to many different types or techniques of observation. In survey sampling it most often involves a questionnaire used to measure the characteristics and/or attitudes of people.

Merits
1. Requires less labour and time, and is economical
2. A sample survey is more scientific
3. Can be applied where testing destroys the units
4. Non-sampling errors are likely to be fewer than in a census

Demerits
1. Requires adoption of appropriate sampling methods and appropriate analysis
2. If the population is too heterogeneous in nature, use of a sampling procedure is impossible
3. Sampling errors are always present

Difference between Census and Sample Survey

1. Coverage: in a census, each and every unit of the population is studied; in a sample survey, only a few units of the population are studied.
2. Resources: a census requires a large amount of finance, time and labour; a sample survey requires relatively less finance, time and labour.
3. Reliability: census results are quite reliable; sample survey results are less reliable.
4. Nature of population: a census is more suitable if the population is heterogeneous in nature; a sample survey is more suitable if the population is homogeneous in nature.
5. Missing units: a census cannot be used when part of the population is missing; a sample survey can be used if part of the population is missing.
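To make the contrast between a census and a sample survey concrete, here is a minimal Python sketch. The population of 1,000 household incomes is made up for illustration, and drawing the sample with `random.sample` is just one simple way of picking units at random; neither comes from the notes above.

```python
import random
import statistics

# Hypothetical population: monthly incomes of 1,000 households (made-up values).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(10_000, 50_000) for _ in range(1000)]

# Census: every unit of the population is observed.
census_mean = statistics.mean(population)

# Sample survey: only a few units (here 50) are observed.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print("Census mean:", census_mean, "from", len(population), "units")
print("Sample mean:", sample_mean, "from", len(sample), "units")
```

The sample mean will usually be close to, but not equal to, the census mean. The gap is the price paid for observing 50 units instead of 1,000.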
The following are some of the methods of sampling:

Simple Random Sampling

In a simple random sample (SRS) of a given size, all subsets of the frame of that size are given an equal probability of selection. Furthermore, any given pair of elements has the same chance of selection as any other such pair (and similarly for triples, and so on). This minimizes bias and simplifies analysis of results. In particular, the variance between individual results within the sample is a good indicator of variance in the overall population, which makes it relatively easy to estimate the accuracy of results.

Systematic Sampling

Systematic sampling (also known as interval sampling) relies on arranging the study population according to some ordering scheme and then selecting elements at regular intervals through that ordered list. It involves a random start and then proceeds with the selection of every kth element from then onwards, where k = (population size / sample size). It is important that the starting point is not automatically the first in the list, but is instead randomly chosen from among the first k elements in the list. A simple example would be to select every 10th name from the telephone directory (an "every 10th" sample, also referred to as "sampling with a skip of 10").

Stratified Sampling

Where the population embraces a number of distinct categories, the frame can be organized by these categories into separate "strata". Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected. There are several potential benefits to stratified sampling.

Statistical Error (Sampling Errors)

Statistical error is the difference between the estimated value and the actual value.

Causes of Errors
1. Error of origin: due to improper definition of statistical units
2. Error of inadequacy: due to incomplete data
3. Error of manipulation: error that occurs during analysis

Biased and Unbiased Errors

Errors which occur with the knowledge of the investigator are called biased errors. They are prejudiced errors. Errors which occur without the knowledge of the investigator are called unbiased errors. They arise due to chance and cannot be controlled.

Measurement of Errors

There are two measures of error:
1. Absolute error: the arithmetic difference between the actual value and the estimated value.
   Absolute Error = Actual Value - Estimated Value, or AE = a - e
2. Relative error: the ratio of the absolute error to the estimated value.
   Relative Error = Absolute Error / Estimated Value, or RE = (a - e) / e
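The systematic sampling procedure and the two error measures above can be sketched in Python as follows. This is only an illustration: the function names, the directory of 100 names and the values 250 and 240 are hypothetical, and the interval k is rounded down to a whole number, which the notes do not specify.

```python
import random


def systematic_sample(units, sample_size, rng=random):
    """Pick every k-th unit after a random start (k = population size // sample size)."""
    if not 1 <= sample_size <= len(units):
        raise ValueError("sample_size must be between 1 and the number of units")
    k = len(units) // sample_size   # sampling interval, rounded down (assumption)
    start = rng.randrange(k)        # random start among the first k units
    return units[start::k][:sample_size]


def absolute_error(actual, estimated):
    """AE = a - e"""
    return actual - estimated


def relative_error(actual, estimated):
    """RE = (a - e) / e"""
    return absolute_error(actual, estimated) / estimated


# Hypothetical directory of 100 names; a sample of 10 gives k = 10.
names = [f"Name {i}" for i in range(1, 101)]
chosen = systematic_sample(names, 10, random.Random(7))
print(chosen)

# Hypothetical values: actual value 250, estimated value 240.
ae = absolute_error(250, 240)       # 10
re_ = relative_error(250, 240)      # 10 / 240, about 0.0417
print(ae, round(re_, 4))
```

With an actual value of 250 and an estimate of 240, the absolute error is 10 and the relative error is about 0.0417, or roughly 4.17 percent of the estimated value.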