Methods of Data Collection, Processing and Analysis

The concept of Sampling
A sample is "a smaller (but hopefully representative) collection of units from a population used to determine truths about that population."
• Why sample?
  • It saves resources (time and money) and reduces workload.
  • It gives results with a known accuracy that can be calculated mathematically.
• The sampling frame is the list from which the potential respondents are drawn, for example:
  • Registrar's office records
  • Class rosters
  • HR records

SAMPLING
• What is your population of interest?
• To whom do you want to generalize your results? For example:
  • All doctors
  • School children
  • Ethiopians
  • Women aged 15-45 years
• Can you sample the entire population?
• Factors that influence sample representativeness:
  • Sampling procedure
  • Sample size
  • Participation (response)
• When might you study the entire population?
  • When your population is very small
  • When you have extensive resources
  • When you don't expect a very high response rate
• Studying the entire population is called a census.

[Figure: Sampling breakdown (target population, study population, sample)]

Types of Sampling
• Probability (random) samples
  • Simple random sample
  • Systematic random sample
  • Stratified random sample
  • Cluster sample
• Non-probability samples
  • Convenience sample
  • Purposive sample
  • Quota sample

The Sampling Process
The sampling process comprises several stages:
• Defining the population of concern
• Specifying a sampling frame, a set of items or events that can be measured
• Specifying a sampling method for selecting items or events from the frame
• Determining the sample size
• Implementing the sampling plan
• Sampling and collecting data
• Reviewing the sampling process

Population definition
• A population can be defined as including all people or items with the characteristic one wishes to understand.
• Because there is very rarely enough time or money to gather information from everyone or everything in a population, the goal becomes finding a representative sample (or subset) of that population.

SAMPLING FRAME
• A sampling frame consists of a list of items from which the sample is to be drawn.

PROBABILITY SAMPLING
• A probability sampling scheme is one in which every unit in the population has a chance (greater than zero) of being selected in the sample, and this probability can be accurately determined.
• When every element in the population has the same probability of selection, this is known as an "equal probability of selection" design.
• Probability sampling includes:
  • Simple random sampling
  • Systematic sampling
  • Stratified random sampling
  • Cluster sampling

NON-PROBABILITY SAMPLING
• Any sampling method in which some elements of the population have no chance of selection (these are sometimes referred to as "out of coverage" or "undercovered"), or in which the probability of selection cannot be accurately determined.
• It involves selecting elements based on assumptions about the population of interest, which form the criteria for selection.
• Non-probability sampling includes:
  • Accidental (convenience) sampling
  • Quota sampling
  • Purposive sampling

SIMPLE RANDOM SAMPLING
• Applicable when the population is small, homogeneous and readily available.
• All subsets of the frame of a given size have an equal probability of selection, so each element of the frame also has an equal probability of selection.
• It allows the greatest number of possible samples.
• Each unit in the sampling frame is assigned a number, and a table of random numbers or a lottery system is used to determine which units are selected.
• Disadvantages:
  • If the sampling frame is large, this method is impracticable.
  • Minority subgroups of interest in the population may not be present in the sample in sufficient numbers for study.
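Simple random selection can be sketched in a few lines. This is a minimal illustration, assuming the sampling frame is a Python list whose positions act as the assigned unit numbers; Python's `random.sample` stands in for the table of random numbers or the lottery. The roster and the `simple_random_sample` name are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame, every unit equally likely.

    random.sample selects without replacement, so every subset of size n
    has the same chance of being chosen (the lottery idea from the text).
    """
    if not 0 <= n <= len(frame):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    # Number the units 1..N, then draw n of those numbers at random.
    numbers = rng.sample(range(1, len(frame) + 1), n)
    return [frame[k - 1] for k in sorted(numbers)]

# Hypothetical sampling frame: a class roster of 30 students.
roster = [f"Student {i:02d}" for i in range(1, 31)]
sample = simple_random_sample(roster, 5, seed=42)
print(sample)
```

Recording the seed lets the selection be reproduced and reviewed later, which fits the last stage of the sampling process ("reviewing the sampling process").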
SYSTEMATIC SAMPLING
• Relies on arranging the target population according to some ordering scheme and then selecting elements at regular intervals through that ordered list.
• Systematic sampling involves a random start and then proceeds with the selection of every kth element from then onwards, where k = N/n (population size divided by sample size).
• Advantages:
  • The sample is easy to select.
  • A suitable sampling frame can be identified easily.
  • The sample is evenly spread over the entire reference population.
• Disadvantages:
  • The sample may be biased if a hidden periodicity in the population coincides with the selection interval.

STRATIFIED SAMPLING
• Where the population embraces a number of distinct categories, the frame can be organized into separate "strata." Each stratum is then sampled as an independent sub-population, out of which individual elements can be randomly selected.
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification and by varying the sampling fraction between strata as required.
• Finally, since each stratum is treated as an independent population, different sampling approaches can be applied to different strata.
• Drawbacks:
  • A sampling frame has to be prepared separately for each stratum.
  • In some cases, stratified sampling can require a larger sample than other methods would.

CLUSTER SAMPLING
• Cluster sampling is often carried out as "two-stage sampling":
  • In the first stage, a sample of areas is chosen;
  • In the second stage, a sample of respondents within those areas is selected.
• The population is divided into clusters, usually based on geographical contiguity; ideally, each cluster is as varied internally as the population itself.
• Sampling units are groups rather than individuals.
• A sample of such clusters is then selected.
• In one-stage cluster sampling, all units from the selected clusters are studied.
• Advantages:
  • It cuts down on the cost of preparing a sampling frame.
  • It can reduce travel and other administrative costs.
• Disadvantages:
  • Sampling error is higher than for a simple random sample of the same size.

MULTISTAGE SAMPLING
• A complex form of cluster sampling in which two or more levels of units are embedded one in the other.
• For example, in the first stage a random sample of districts is chosen in every state.
• In the second stage, a random sample of villages is chosen within the selected districts.
• In the third stage, the units are houses.
• All ultimate units (houses, for instance) selected at the last step are surveyed.

QUOTA SAMPLING
• Judgment is used to select subjects or units from each segment based on a specified proportion. For example, an interviewer may be told to sample 200 females and 300 males aged between 45 and 60.
• In quota sampling the selection of the sample is non-random; for example, interviewers might be tempted to interview those who look most helpful.
• The problem is that these samples may be biased because not everyone gets a chance of selection.

CONVENIENCE SAMPLING
• Sometimes known as grab, opportunity, accidental or haphazard sampling.
• A type of non-probability sampling in which the sample is drawn from the part of the population that is close to hand, that is, readily available and convenient.
• A researcher using such a sample cannot scientifically generalize to the total population, because the sample would not be representative enough.
• For example, if an interviewer conducts a survey at a shopping center early in the morning on a given day, the people he or she can interview are limited to those present at that time. Their views would not represent those of other members of society in the area, which a survey conducted at different times of day and several times per week could capture.
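The selection rules for the designs described in this section can be sketched side by side. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the function names, the numbered frame, the urban/rural strata, the villages and the arrival order are all hypothetical, and k is rounded down when N is not a multiple of n. The quota function deliberately takes units in arrival order, to show why quota selection is non-random even though the group sizes are fixed.

```python
import random
from collections import defaultdict

def systematic_sample(frame, n, rng):
    """Random start, then every kth unit, with k = N // n (N = frame size)."""
    if not 0 < n <= len(frame):
        raise ValueError("sample size must be between 1 and the frame size")
    k = len(frame) // n                  # sampling interval k = N/n, rounded down
    start = rng.randrange(k)             # random start within the first interval
    return [frame[i] for i in range(start, len(frame), k)][:n]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Proportionate stratified sampling: the same sampling fraction in every
    stratum, with a simple random sample drawn independently inside each one."""
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(fraction * len(units))))
    return sample

def cluster_sample(clusters, m, rng):
    """One-stage cluster sampling: pick m whole clusters, keep all their units."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]

def quota_sample(arrivals, group_of, quotas):
    """Quota sampling (non-random): take whoever comes first until each quota is full."""
    filled = defaultdict(int)
    sample = []
    for unit in arrivals:
        group = group_of(unit)
        if filled[group] < quotas.get(group, 0):
            sample.append(unit)
            filled[group] += 1
    return sample

rng = random.Random(1)
frame = list(range(1, 101))  # hypothetical frame: units numbered 1..100
sys_s = systematic_sample(frame, 10, rng)
# Hypothetical strata: units 1..60 are "urban", 61..100 are "rural".
strat_s = stratified_sample(frame, lambda u: "urban" if u <= 60 else "rural", 0.1, rng)
# Hypothetical clusters: 10 villages of 10 houses each.
villages = {f"village {v}": [f"v{v}-house {h}" for h in range(1, 11)] for v in range(1, 11)}
clus_s = cluster_sample(villages, 2, rng)
# Hypothetical arrival order at a shopping center, with a quota of 2 per sex.
people = [("F", 1), ("F", 2), ("M", 3), ("F", 4), ("M", 5), ("M", 6)]
quota_s = quota_sample(people, lambda p: p[0], {"F": 2, "M": 2})
print(sys_s, strat_s, clus_s, quota_s, sep="\n")
```

The quota result depends only on who arrives first, so person ("F", 4) can never be chosen once the female quota is full; the stratified draw, by contrast, gives every unit in a stratum the same chance.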
• Snowball sampling, used in social science research, is a related non-probability technique in which existing study subjects recruit further subjects into the sample.

TYPES OF DATA
• Primary data are those collected afresh and for the first time, and thus original in character.
• Secondary data, on the other hand, are those already collected by someone else and already passed through the statistical process.

Collection of Primary Data
• There are several methods of collecting primary data, particularly in surveys and descriptive research:
  • Observation method
  • Interview method
  • Questionnaires
  • Schedules

Measurement and Scaling
• Measurement is the process of describing some property of a phenomenon under study and assigning a numerical value to it.
• For example, to find the male-to-female attendance ratio in a study of people who attend a show, we may tabulate the attendees according to sex.
• Measurement is considered the foundation of scientific inquiry.
• In daily life, many things are measured continuously, in different ways and for different purposes.
• The most widely used classification of measurement scales is:
  • nominal scale
  • ordinal scale
  • interval scale; and
  • ratio scale.

Nominal Scale
• A nominal scale is simply a system of assigning number symbols to events in order to label them.
• These numbers have no quantitative value; they only represent the category.
• So we cannot apply arithmetic operations to this type of scale.
• We can only count the number of items in each category, and a frequency distribution table is used to represent nominal data.
• The nominal scale is the least powerful level of measurement.
• It indicates no order or distance relationship and has no arithmetic origin.
• A nominal scale simply describes differences between things by assigning them to categories.
• Nominal data are, thus, counted data.
• In spite of all this, nominal scales are still very useful and are widely used in survey research when data are being classified by major subgroups of the population.

Nominal Scale: example questions
A. Specify your gender: (A) Male (B) Female
B. Are you married? (A) Yes (B) No
C. You are from: (A) Urban (B) Rural
D. Specify your working department: (A) Marketing (B) HR (C) Finance (D) Sales (E) Production (F) Operations
E. Specify your food habit: (A) Vegetarian (B) Non-vegetarian
• Here we can assign a number to each option, such as 1 to Male and 2 to Female; 1 to Yes and 2 to No; 1 to Urban and 2 to Rural; 1 to Marketing, 2 to HR, 3 to Finance, and so on.

Ordinal Scale
• Here "ordinal" indicates "order": we measure according to the rank order of the data without considering the degree of difference between data points.
• In ordinal measurement, we assign a numerical value to each variable based on its relative ranking or position in comparison with other data in the group.
• An ordinal scale thus indicates the logical hierarchy among the variables under observation.

Example: Ordinal measurement
• Suppose that in a 100-meter race Tirunesh finished first, Meseret second, Hiwot third and Ayalnesh fourth.
• Here we express the data on a ranking scale, arranging them according to their relative position in the data set.
• We do not consider the magnitude of the difference between Tirunesh and Meseret, Meseret and Hiwot, or Hiwot and Ayalnesh, but only the order of the finishing positions.

Interval Scales
• Whereas the nominal scale only allows us to distinguish groups qualitatively by categorizing them, and the ordinal scale allows us to rank-order preferences, the interval scale lets us measure the distance between any two points on the scale.
• For example, on the Celsius temperature scale the difference between 10 °C and 20 °C equals the difference between 20 °C and 30 °C, but 0 °C does not mean "no temperature," so 20 °C is not twice as warm as 10 °C.
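The difference between what nominal and ordinal data allow can be sketched in code. This is a minimal illustration: the coded gender answers are hypothetical, while the finishing order comes from the race example. Nominal codes support only counting (a frequency distribution), and ranks support order comparisons but carry no information about the size of the gaps.

```python
from collections import Counter

# Hypothetical coded answers to "Specify your gender" (1 = Male, 2 = Female).
GENDER_CODES = {1: "Male", 2: "Female"}
answers = [1, 2, 2, 1, 2, 2, 1]

# Nominal data: the codes are only labels, so the meaningful operation is counting.
frequency = Counter(GENDER_CODES[code] for code in answers)
for label, count in sorted(frequency.items()):
    print(f"{label:<8}{count}")

# Ordinal data: finishing positions from the race example.
finish_order = ["Tirunesh", "Meseret", "Hiwot", "Ayalnesh"]
rank = {name: position for position, name in enumerate(finish_order, start=1)}

# Order comparisons are meaningful on an ordinal scale ...
print(rank["Tirunesh"] < rank["Hiwot"])  # True: Tirunesh finished ahead of Hiwot
# ... but consecutive ranks always differ by 1, whatever the actual time gaps were,
# so the ranks say nothing about the magnitude of the differences.
print([rank[b] - rank[a] for a, b in zip(finish_order, finish_order[1:])])
```

Averaging the codes 1 and 2 would produce a number, but not a meaningful one; that is why the frequency table, rather than arithmetic, is the natural summary for nominal data.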
Ratio Scales
• A ratio scale represents the actual amounts of variables, for example measures of physical dimensions such as weight, height and distance.
• Example question: What is your age?
• Generally, all statistical techniques are usable with ratio scales, and all manipulations that one can carry out with real numbers can also be carried out with ratio-scale values.
• Multiplication and division can be used with this scale but not with the other scales mentioned above.
• Thus, proceeding from the nominal scale (the least precise type of scale) to the ratio scale (the most precise), increasingly more information is obtained.

[Figures: Ratio scale; The differences between scales]

Processing and Analysis of Data
• After collection, the data have to be processed and analysed in accordance with the outline laid down for the purpose when the research plan was developed.
• Technically speaking, processing implies the editing, coding, classification and tabulation of collected data so that they are amenable to analysis.
• The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data groups.
• Thus, "in the process of analysis, relationships or differences supporting or conflicting with original or new hypotheses should be subjected to statistical tests of significance to determine with what validity data can be said to indicate any conclusions."

Editing
• Editing of data is a process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct these when possible.
• As a matter of fact, editing involves a careful scrutiny of the completed questionnaires.
• Editing is done to ensure that the data are accurate, consistent with other facts gathered, uniformly entered, as complete as possible, and well arranged to facilitate coding and tabulation.
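Part of editing can be automated once questionnaires are entered. The sketch below flags omissions, out-of-range entries and inconsistent answers; it assumes hypothetical field names, allowed values and a hypothetical consistency rule (a respondent under 15 reported as married). The checks only flag problems: correcting them still means going back to the completed questionnaire.

```python
# Hypothetical completed questionnaires after data entry.
records = [
    {"id": 1, "sex": "Male", "age": 34, "married": "Yes"},
    {"id": 2, "sex": "Female", "age": None, "married": "No"},
    {"id": 3, "sex": "Femal", "age": 29, "married": "No"},
    {"id": 4, "sex": "Male", "age": 7, "married": "Yes"},
]

# Hypothetical set of valid answers for each categorical question.
ALLOWED = {"sex": {"Male", "Female"}, "married": {"Yes", "No"}}

def edit_checks(record):
    """Return a list of problems found in one questionnaire."""
    problems = []
    # Omissions: questions left unanswered.
    for field in ("sex", "age", "married"):
        if record.get(field) is None:
            problems.append(f"{field}: missing (omission)")
    # Errors: answers that are not among the valid options.
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    # Consistency: answers that contradict each other (hypothetical rule).
    age = record.get("age")
    if age is not None and age < 15 and record.get("married") == "Yes":
        problems.append("age/married: inconsistent answers")
    return problems

for r in records:
    for problem in edit_checks(r):
        print(f"record {r['id']}: {problem}")
```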
Coding
• Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.
• E.g., Male = 1, Female = 2; Married = 1, Unmarried = 2.

Classification
• Most research studies result in a large volume of raw data, which must be reduced into homogeneous groups if we are to get meaningful relationships.
• This necessitates classification of data, which is the process of arranging data in groups or classes on the basis of common characteristics.
• Data having a common characteristic are placed in one class.

(a) Classification according to attributes
• Data are classified on the basis of common characteristics which can be descriptive (such as literacy, sex, honesty, etc.).
• Descriptive characteristics refer to qualitative phenomena which cannot be measured quantitatively; only their presence or absence in an individual item can be noticed.
• Data obtained this way on the basis of certain attributes are known as statistics of attributes, and their classification is said to be classification according to attributes.

(b) Classification according to class intervals
• Unlike descriptive characteristics, numerical characteristics refer to quantitative phenomena which can be measured through some statistical units.
• Data relating to income, production, age, weight, etc. come under this category.
• Such data are known as statistics of variables and are classified on the basis of class intervals.
• E.g., age: under 18, 18-25, 26-45, 46-60; income: ETB 2,000-5,999, 6,000-10,000.
• In this way the entire data set may be divided into a number of groups or classes, usually called "class intervals."
• Each class interval has an upper limit as well as a lower limit, which are known as class limits.
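Coding and classification by class intervals can be combined in one step. This sketch uses the coding scheme from the text (Male = 1, Female = 2; Married = 1, Unmarried = 2) and assumes the age intervals are read as "under 18, 18-25, 26-45, 46-60", so that no whole-number age between 0 and 60 falls between two intervals. The function names and the sample response are hypothetical; `bisect_right` finds the interval whose lower class limit is the largest one not exceeding the age.

```python
from bisect import bisect_right

# Coding scheme from the text.
SEX_CODES = {"Male": 1, "Female": 2}
MARITAL_CODES = {"Married": 1, "Unmarried": 2}

# Age class intervals: under 18, 18-25, 26-45, 46-60.
# Each value is the lower class limit of the next interval.
LOWER_LIMITS = [18, 26, 46]
LABELS = ["under 18", "18-25", "26-45", "46-60"]

def age_class(age):
    """Place a whole-number age (0-60) into its class interval."""
    if not 0 <= age <= 60:
        raise ValueError(f"age {age} is outside the class intervals")
    return LABELS[bisect_right(LOWER_LIMITS, age)]

def code_response(response):
    """Turn one raw response into coded values plus an age class."""
    return {
        "sex": SEX_CODES[response["sex"]],
        "marital": MARITAL_CODES[response["marital"]],
        "age_class": age_class(response["age"]),
    }

# Hypothetical raw response.
print(code_response({"sex": "Female", "marital": "Married", "age": 25}))
```

Raising an error for ages outside every interval, instead of silently dropping them, makes gaps in the class intervals visible during processing.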
Tabulation
• When a mass of data has been assembled, it becomes necessary for the researcher to arrange it in some kind of concise and logical order.
• Tabulation is the process of summarising raw data and displaying them in compact form (i.e., in the form of statistical tables) for further analysis.
• In a broader sense, tabulation is an orderly arrangement of data in columns and rows.
• Tabulation is essential for the following reasons:
  1. It conserves space and reduces explanatory and descriptive statements to a minimum.
  2. It facilitates the process of comparison.
  3. It facilitates the summation of items and the detection of errors and omissions.
  4. It provides a basis for various statistical computations.
• Tabulation can be done by hand or by mechanical or electronic devices.
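Electronic tabulation of coded data can be sketched as a two-way table of counts with row and column totals. This is a minimal illustration with hypothetical records (sex by residence, two of the nominal questions used earlier); the totals show how tabulation supports summation and the detection of omissions, since the grand total must equal the number of records.

```python
from collections import Counter

# Hypothetical coded survey records: (sex, residence).
records = [
    ("Male", "Urban"), ("Female", "Rural"), ("Female", "Urban"),
    ("Male", "Rural"), ("Female", "Urban"), ("Male", "Urban"),
]

def cross_tabulate(pairs):
    """Count each (row, column) combination; return row labels, column labels, counts."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, counts

def print_table(rows, cols, counts):
    """Print the counts in rows and columns with row and column totals."""
    print(f"{'':<8}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
    for r in rows:
        cells = [counts[(r, c)] for c in cols]
        print(f"{r:<8}" + "".join(f"{n:>8}" for n in cells) + f"{sum(cells):>8}")
    col_totals = [sum(counts[(r, c)] for r in rows) for c in cols]
    print(f"{'Total':<8}" + "".join(f"{n:>8}" for n in col_totals) + f"{sum(col_totals):>8}")

rows, cols, counts = cross_tabulate(records)
print_table(rows, cols, counts)
# The grand total must equal the number of records; a mismatch signals an error or omission.
assert sum(counts.values()) == len(records)
```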