Professor Vipin 2015
www.VipinMKS.com

Classification and Tabulation of Data

Collection of Data – Census and Sampling

A census is a study of every unit, everyone or everything, in a population. It is known as a complete enumeration, which means a complete count. A sample is a subset of units in a population, selected to represent all units in a population of interest. It is a partial enumeration because it is a count from part of the population. Information from the sampled units is used to estimate the characteristics of the entire population of interest.

Sampling Techniques

A sample must be robust in its design and large enough to provide a reliable representation of the whole population. Aspects to be considered when designing a sample include the level of accuracy required, the cost, and the timing. Sampling can be random or non-random.

1. In a random (or probability) sample each unit in the population has a non-zero chance of being selected, and this probability can be accurately determined.
2. Probability or random sampling includes, but is not limited to, simple random sampling, systematic sampling, and stratified sampling. Random sampling makes it possible to produce population estimates from the data obtained from the units included in the sample.
3. Simple random sample: All members of the sample are chosen at random, and every member of the population has the same chance of being in the sample. A lottery draw is a good example of simple random sampling: the numbers are randomly generated from a defined range (e.g. 1 to 45), with each number having an equal chance of being selected.
4. Systematic random sample: The first member of the sample is chosen at random, then the other members of the sample are taken at fixed intervals (e.g. every 4th unit).
5. Stratified random sample: Relevant subgroups (strata) within the population are identified, and random samples are selected from within each stratum.
6.
In a non-random (or non-probability) sample some units of the population have no chance of selection, the selection is non-random, or the probability of their selection cannot be determined.
7. In this method the sampling error cannot be estimated, making it difficult to infer population estimates from the sample. Non-random sampling includes convenience sampling, purposive sampling, quota sampling, and volunteer sampling.
8. Convenience sampling: Units are chosen based on their ease of access.
9. Purposive sampling: The sample is chosen based on what the researcher thinks is appropriate for the study.
10. Quota sampling: The researcher can select units as they choose, as long as they reach a defined quota.
11. Volunteer sampling: Participants volunteer to be part of the survey (a common method for internet-based opinion surveys, where there is no control over how many people vote or who they are).

Classification of Data

1. Quantitative data are those that can be quantified in definite units of measurement. These refer to characteristics whose successive measurements yield quantifiable observations. Depending on the nature of the variable observed for measurement, quantitative data can be further categorized as continuous and discrete data.

a) Continuous data represent the numerical values of a continuous variable. A continuous variable is one that can assume any value between any two points on a line segment, thus representing an interval of values. The values are quite precise and close to each other, yet distinguishably different. Characteristics such as weight, length, height, thickness, velocity, temperature, and tensile strength represent continuous variables. Thus, the data recorded on these and similar characteristics are called continuous data. It may be noted that a continuous variable assumes the finest unit of measurement.
Here "finest" means the unit that enables measurement to the maximum degree of precision.

b) Discrete data are the values assumed by a discrete variable. A discrete variable is one whose outcomes can take only fixed, separate values, typically whole numbers. Such data are essentially count data. They are derived from a process of counting, such as counting the number of items possessing or not possessing a certain characteristic. The number of customers visiting a departmental store every day, the number of incoming flights at an airport, and the number of defective items in a consignment received for sale are all examples of discrete data.

2. Qualitative data refer to qualitative characteristics of a subject or an object. A characteristic is qualitative in nature when its observations are defined and noted in terms of the presence or absence of a certain attribute in discrete numbers. These data are further classified as nominal and rank data.

a) Nominal data are the outcome of classifying the items or units of a sample or a population into two or more categories according to some quality characteristic. Given any such basis of classification, it is always possible to assign each item to a particular class and to total the items belonging to each class. The count data so obtained are called nominal data.

b) Rank data, on the other hand, are the result of assigning ranks to specify order in terms of the integers 1, 2, 3, ..., n. Ranks may be assigned according to the level of performance in a test, a contest, a competition, an interview, or a show. The candidates appearing in an interview, for example, may be assigned integer ranks depending on their performance in the interview. Ranks so assigned can be viewed as the ordered values of a variable involving performance as the quality characteristic.

3. Data sources are of two types, secondary and primary.
a) Secondary data: These already exist in some form, published or unpublished, in an identifiable secondary source. They are generally available from published source(s), though not necessarily in the form actually required.

b) Primary data: These do not already exist in any form and thus have to be collected for the first time from the primary source(s). By their very nature, these data require fresh, first-time collection covering the whole population or a sample drawn from it.

Preparing a Frequency Distribution Table

Frequency is how often something occurs. By counting frequencies we can make a frequency distribution table.

(1) Find the range of the data: The range is the difference between the largest and the smallest values.
(2) Decide the approximate number of classes into which the data are to be grouped: There are no hard and fast rules for the number of classes. In most cases we have 5 to 20 classes.
(3) Determine the approximate class interval size: The size of the class interval is obtained by dividing the range of the data by the number of classes. If the result is fractional, the next higher whole number is taken as the size of the class interval.
(4) Decide the starting point: The lower class limit (or class boundary) of the first class should cover the smallest value in the raw data. It is taken as a multiple of the class interval size.
(5) Determine the remaining class limits (boundaries): Once the lower class boundary of the lowest class has been decided, add the class interval size to it to compute the upper class boundary. The remaining lower and upper class limits are determined by adding the class interval size repeatedly until the largest value of the data falls within a class.
(6) Distribute the data into respective classes: All the observations are marked into their respective classes using tally bars (tally marks), a method well suited to tabulating observations into classes.
The number of tally bars is counted to get the frequency of each class. The frequencies of all the classes are noted to get the grouped data, or frequency distribution, of the data. The total of the frequency column must be equal to the number of observations.

Tabulation of Data

The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements whereas columns are vertical arrangements. A tabulation may be simple, double or complex depending upon the type of classification.

(1) Simple Tabulation or One-way Tabulation: When the data are tabulated according to one characteristic, it is said to be simple tabulation or one-way tabulation. For example, tabulation of data on the population of the world classified by one characteristic, such as religion, is simple tabulation.
(2) Double Tabulation or Two-way Tabulation: When the data are tabulated according to two characteristics at a time, it is said to be double tabulation or two-way tabulation. For example, tabulation of data on the population of the world classified by two characteristics, such as religion and sex, is double tabulation.
(3) Complex Tabulation: When the data are tabulated according to many characteristics, it is said to be complex tabulation. For example, tabulation of data on the population of the world classified by three or more characteristics, such as religion, sex and literacy, is complex tabulation.
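
The six steps for preparing a frequency distribution table, described earlier, can be sketched in Python. This is only a minimal illustration and not part of the original notes: the function name frequency_distribution, the sample list marks, and the convention that each class includes its lower limit and excludes its upper limit are all assumptions made for the example. Because the interval size is rounded up and the starting point is moved down to a multiple of it, the table may end up with one class more than the number requested.

```python
import math
from collections import Counter


def frequency_distribution(data, num_classes):
    """Group numeric observations into equal-width classes.

    Returns a list of (lower_limit, upper_limit, frequency) tuples.
    Assumption: a class covers lower_limit <= value < upper_limit.
    """
    smallest, largest = min(data), max(data)

    # Step 1: the range is the largest value minus the smallest value.
    data_range = largest - smallest

    # Steps 2-3: divide the range by the chosen number of classes and
    # round a fractional result up to the next whole number.
    width = max(1, math.ceil(data_range / num_classes))

    # Step 4: start at a multiple of the interval size that still
    # covers the smallest observation.
    start = (smallest // width) * width

    # Step 5: add the interval size repeatedly until the largest
    # observation falls inside a class.
    limits = []
    lower = start
    while lower <= largest:
        limits.append((lower, lower + width))
        lower += width

    # Step 6: tally each observation against the index of its class.
    tally = Counter((value - start) // width for value in data)
    table = [(lo, hi, tally.get(i, 0)) for i, (lo, hi) in enumerate(limits)]

    # The frequencies must add up to the number of observations.
    assert sum(freq for _, _, freq in table) == len(data)
    return table


# Hypothetical raw data: marks of 20 students.
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38,
         15, 27, 31, 44, 19, 36, 28, 40, 23, 34]

for lo, hi, freq in frequency_distribution(marks, 5):
    # Print one row per class, with tally bars next to the count.
    print(f"{lo:>3} to under {hi:<3} {'|' * freq:<8} {freq}")
```

The half-open class convention is one way to make sure every observation falls into exactly one class; if inclusive upper limits are preferred, the class boundaries would need to be adjusted accordingly.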