Unit 3: Classification and Tabulation

Classification of Data

Classification is the first stage in simplification. It can be defined as the systematic grouping of units according to their common characteristics. Each such group is called a class. For example, in a survey of industrial workers of a particular industry, workers can be classified as unskilled, semi-skilled and skilled, each of which forms a class.

Objectives of Classifying Data

The principal objectives of classifying data are:
1. To condense the mass of data in such a manner that similarities and dissimilarities can be readily apprehended. Millions of figures can thus be arranged in a few classes having common features.
2. To facilitate comparison.
3. To pinpoint the most significant features of the data at a glance.
4. To give prominence to the important information gathered while dropping the unnecessary elements.
5. To enable statistical treatment of the material collected.

Types of Classification

1. Chronological Classification: This means an orderly arrangement of statistical data according to the time or date at which an event occurred. If the data are arranged over different periods of time, the classification is called chronological.

   Year    Population
   1990    279
   1991    279
   1992    358
   1993    450
   1994    455

2. Geographical Classification: If the data are classified on the basis of geography, location or area, the classification is called geographical or spatial.

   Place        Literacy
   UP           25
   Bihar        19
   Rajasthan    78
   Karnataka    82
   TN           65

3. Qualitative Classification: If the data are classified on the basis of qualitative characteristics or attributes, the classification is called qualitative. Classification on a single attribute is called simple (one-way) classification; when that attribute divides the units into two classes, it is called dichotomous classification. Classification on two or more attributes is called manifold classification.

4. Quantitative Classification: Classification based on quantitative characteristics or variables is called quantitative classification.

   Weight     No. of Students
   40-60      25
   60-80      19
   80-100     78
   100-120    82
   120-140    65

Tabulation

Tabulation follows classification. It is a logical or systematic listing of related data in rows and columns. A row of a table is a horizontal arrangement of data and a column is a vertical arrangement of data. The presentation of data in tables should be simple, systematic and unambiguous.

Objectives of Tabulation

The objectives of tabulation are to:
1. Simplify complex data
2. Highlight important characteristics
3. Present data in minimum space
4. Facilitate comparison
5. Bring out trends and tendencies
6. Facilitate further analysis

Difference Between Classification and Tabulation

Classification:
1. It is the basis for tabulation.
2. It is the basis for simplification.
3. Data are divided into groups and subgroups on the basis of similarities and dissimilarities.

Tabulation:
1. It is the basis for further analysis.
2. It is the basis for presentation.
3. Data are listed according to a logical sequence of related characteristics.

Parts of a Table

Tab 1: Table number: The table number identifies the table for reference. When an analysis contains many tables, table numbers help in identifying them.
Tab 2: Title: The title indicates the scope and nature of the contents in concise form. In other words, the title of a table gives information about the data contained in the body of the table. The title should not be lengthy.
Tab 3 and Tab 4: Captions: Captions are the headings and subheadings describing the data present in the columns.
Tab 5 and Tab 6: Stubs: Stubs are the headings and subheadings of rows.
Tab 7: Body of the table: The body of the table contains the numerical information.
Tab 8: Ruling and Spacing: Ruling and spacing separate columns and rows. Totals, however, are separated from the main body by thick lines.
Tab 9: Head Note: The head note is given below the title of the table to indicate the units of measurement of the data and is enclosed in brackets.
Tab 10: Source Note: The source note indicates the source from which the data are taken. It is placed at the bottom left-hand corner of the table.

Frequency Distributions

The number of units associated with each value of the variable is called the frequency of that value. Suppose the variable takes the value 15 and the value 15 occurs 3 times; then 3 is called the frequency of the value 15. A systematic presentation of the values taken by a variable together with their corresponding frequencies is called a frequency distribution of the variable. It is presented in tabular form, called a frequency table.

If class intervals are not used, the table lists each individual value with its frequency and is called a discrete frequency distribution. A frequency distribution formed with class intervals is called a continuous frequency distribution. In a continuous frequency distribution, the range of the variable is divided into mutually exclusive sub-ranges called class intervals. Class intervals have lower and upper limits, known as the lower class limit and the upper class limit respectively. The difference between the upper class limit and the lower class limit is termed the class width. The middle value of a class interval is called the mid-value of the class. It is the average of the class limits.

Discrete frequency distribution

   Number of Children    No. of Families
   0                     15
   1                     20
   2                     22
   3                     16
   4                     7
   Total                 80

Continuous frequency distribution

   Marks       No. of Students
   0-20        15
   20-40       20
   40-60       28
   60-80       22
   80-100      15
   Total       100

Class intervals are of two types: exclusive and inclusive. A class interval that does not include its upper class limit is called an exclusive class interval. A class interval that includes its upper class limit is called an inclusive class interval. For example, the classes 0-20, 20-40 are of the exclusive type, whereas 0-19, 20-39 are of the inclusive type.

Derived Frequency Distributions

From a given frequency distribution, we can form five derived frequency distributions.
They are:
1. Relative frequency distribution: If 'f' is the class frequency and 'N' is the total frequency, the relative frequency distribution is formed by calculating f/N for each class. The relative frequencies always add up to one.
2. Percentage frequency distribution: The percentage frequency distribution is formed by multiplying the ratio f/N by 100.
3. Frequency density distribution: If 'c' is the width of the class interval and 'f' is the frequency of the class, the frequency density distribution is formed by calculating f/c.
4. Less than cumulative frequency distribution: The less than cumulative frequency distribution is formed from the number of observations that are less than a given value.
5. More than cumulative frequency distribution: The more than cumulative frequency distribution is formed from the number of observations that are more than a given value.

Bivariate and Multivariate Frequency Distribution

A frequency distribution of more than two variables is known as a multivariate frequency distribution. If the number of variables is only two, it is called a bivariate frequency distribution. A bivariate frequency distribution has two marginal distributions and "m+n" conditional distributions, where m and n are the numbers of classes of the two variables.

Construction of Frequency Distribution

The steps followed to construct a frequency distribution table are:
1. Determine the range: Range = Highest value - Lowest value.
2. The number of class intervals is given by Sturges' rule, K = 1 + 3.322 log10 N, where N is the total number of observations.
3. The width of the class interval is given by Range/K.

In practice, choose a class width of 2, 5, 10 or a multiple of 10 such that the number of class intervals is between 7 and 15. Avoid open-end class intervals. Make sure that the class intervals do not overlap.
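
The construction steps and the five derived distributions can be sketched in Python as follows. This is only an illustrative sketch, not part of the original notes. The function names sturges_classes and derived_distributions are hypothetical. The sketch assumes the standard form of Sturges' rule, K = 1 + 3.322 log10 N, rounds K up to a whole number (an assumption, since the rule itself gives a fraction), takes the class width as Range/K, and treats the classes as exclusive. The data are the Marks table from the continuous frequency distribution example.

```python
import math


def sturges_classes(n):
    """Number of class intervals by Sturges' rule, K = 1 + 3.322 * log10(N).

    Rounding up to a whole number is an assumption of this sketch.
    """
    return math.ceil(1 + 3.322 * math.log10(n))


def derived_distributions(limits, freqs):
    """Build the five derived distributions for exclusive class intervals.

    limits: list of (lower, upper) class limits
    freqs:  list of class frequencies, in the same order
    """
    total = sum(freqs)          # N, the total frequency
    rows = []
    running = 0                 # cumulative frequency so far
    for (lower, upper), f in zip(limits, freqs):
        running += f
        rows.append({
            "class": (lower, upper),
            "relative": f / total,              # f / N
            "percentage": 100 * f / total,      # (f / N) * 100
            "density": f / (upper - lower),     # f / c
            "less_than": running,               # observations below the upper limit
            "more_than": total - running + f,   # observations at or above the lower limit
        })
    return rows


# Marks table from the notes: class limits and number of students.
limits = [(0, 20), (20, 40), (40, 60), (60, 80), (80, 100)]
freqs = [15, 20, 28, 22, 15]

# Construction steps for N = 100 observations with marks ranging from 0 to 100.
n = sum(freqs)
k = sturges_classes(n)          # step 2: number of class intervals
value_range = 100 - 0           # step 1: highest value - lowest value
width = value_range / k         # step 3: class width = Range / K
print("N =", n, "K =", k, "width =", width)

for row in derived_distributions(limits, freqs):
    print(row)
```

For the Marks table the rule suggests 8 classes of width 12.5, whereas the table itself uses 5 classes of width 20, which shows that the rule is a guide to be adjusted to a convenient width rather than a fixed requirement.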