STATISTICS: INTRODUCTION AND DEFINITIONS
Dr Faris Al Lami, MB ChB, PhD

LEARNING OBJECTIVES
By the end of this lecture the student should be able to:
1. Define biostatistics.
2. Identify the advantages, applications and purposes of biostatistics.
3. Define the types and scales of variables.
4. Construct grouped data.

STATISTICS
A field of study concerned with methods and procedures for:
- Collection, organization, classification and summarization of data (descriptive statistics).
- Analysis of data, and drawing inferences about a body of data when only part of the data is observed (analytic, or inferential, statistics).

BIOSTATISTICS
When the data being analyzed are derived from the biological and medical sciences, the term "biostatistics" is used.

ADVANTAGES
1. Carrying out research: statistical analysis should be considered in the planning phase of the study.
2. Evaluating published articles: statistical errors are common in clinical research and may invalidate the conclusions.
3. Ethical considerations: it is unethical to use erroneous statistics, especially in scientific publications. If the statistics are wrong, a harmful or ineffective treatment may be used, or a useful treatment may be avoided.
4. Professional and personal satisfaction.

PURPOSES
1. Data reduction: condensing data to manageable proportions, thus facilitating interpretation.
2. Evaluating the role of chance: determining whether the effect of a certain event is real or arises from chance fluctuation due to the particular sample of subjects studied.
3. Sampling and generalization: what proportion of discharged patients required readmission? What are their characteristics? Answering such questions requires generalizing the sample's results.

APPLICATIONS
- Are the differences between groups significant?
- Are two measures related or associated?
- Can the value of one variable (the outcome) be predicted from knowledge of the values of other variables?

VARIABLE
A characteristic that takes on different values in different persons, places or things, e.g.:
- Heights of adult males.
- Weights of preschool children.
- Ages of patients seen in a dental clinic.

VARIABLES
Quantitative variables: variables that can be measured in the usual sense of measurement, such as age, weight and height.
Qualitative variables: variables that cannot be measured in the usual sense but can be described or categorized, such as socio-economic status.

QUALITATIVE VARIABLES
Examples:
- Socio-economic groups.
- An ill person's medical diagnosis.
- Whether an object possesses or does not possess some characteristic of interest.
For qualitative variables we count the number of individuals falling into each category, e.g. each socio-economic status or diagnostic category.

QUANTITATIVE VARIABLES
DISCRETE VARIABLE
Characterized by gaps or interruptions in the values it can assume, e.g.:
- The number of daily admissions.
- The number of decayed, missing or filled teeth per child.
CONTINUOUS VARIABLE
Has no such gaps or interruptions; it can assume any value within a specified interval, e.g.:
- Weight.
- Height.
- Mid-arm circumference.

VARIABLE SCALES
1. NOMINAL SCALE: uses names, numbers or other symbols. Each measurement is assigned to one of a limited number of unordered categories and falls into only one category, e.g. male and female.
2. ORDINAL SCALE: each measurement is assigned to one of a limited number of categories that are ranked in a graded order (1st, 2nd, 3rd, ...). The differences between categories are not necessarily equal and often cannot be measured.
3. INTERVAL SCALE: each measurement is assigned to one of an unlimited number of categories that are equally spaced, with NO true zero point, e.g. temperature in degrees Celsius.
4. RATIO SCALE: measurement begins at a true zero point and the scale has equal intervals, e.g. weight.

POPULATION
- Population of entities: the largest collection of entities sharing a common characteristic in which we are interested at a particular time.
- Population of variables: the largest collection of values of a random variable in which we are interested at a particular time.
SAMPLE
A sample is a part, or subset, of the population:
- Sample of entities: a subset of a population of entities.
- Sample of variables: a subset of a population of variables.

GROUPED DATA
To group a set of observations, we select a set of contiguous, non-overlapping intervals such that each value in the set of observations can be placed in one, and only one, of the intervals, and no observation is left out. Each interval is called a CLASS INTERVAL.

NUMBER OF CLASS INTERVALS
The number of class intervals should be:
- Not too few, because important information is lost.
- Not too many, because the needed summarization is lost.
When an a priori classification exists for the observations (e.g. in annual tabulations), that classification can be followed. When there is no such classification, Sturges' rule can be used:

k = 1 + 3.322 log n

where
- k = number of class intervals
- n = number of observations in the set
- log = common (base-10) logarithm
The result should not be regarded as final; it may be modified.

WIDTH OF CLASS INTERVAL
If possible, all class intervals should have the same width:

W = R / k

where
- W = width of the class interval
- R = range (largest value - smallest value)
- k = number of class intervals

FREQUENCY DISTRIBUTION
A frequency distribution gives the number of observations falling into each class interval.

Fasting blood glucose level    Frequency
< 60                           10
60-62                          23
63-65                          33
66-68                          22
69-71                          34
72+                            33
Total                          155

RELATIVE FREQUENCY DISTRIBUTION
A relative frequency distribution gives the proportion of observations in each class interval relative to the total number of observations in the set.

Fasting blood glucose level    Frequency    Relative frequency (%)
< 60                           10             6.45
60-62                          23            14.84
63-65                          33            21.29
66-68                          22            14.19
69-71                          34            21.94
72+                            33            21.29
Total                          155          100.00

CUMULATIVE FREQUENCY DISTRIBUTION
The cumulative frequency of a class interval is obtained by adding its frequency to the cumulative frequency of the class interval above it, starting from the second class interval onward (the cumulative frequency of the first class interval is simply its own frequency).
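As a minimal sketch of the calculations just described, the Python below derives the relative and cumulative frequencies of the fasting blood glucose data from its class frequencies. The helper name `frequency_table` is hypothetical, chosen for illustration. The cumulative frequency is a running total, so each class adds its own frequency to the cumulative frequency of the class above it.

```python
from itertools import accumulate


def frequency_table(labels, freqs):
    """Build a grouped frequency table from class labels and class frequencies.

    Returns rows of (class, frequency, relative %, cumulative frequency,
    cumulative relative %), with percentages rounded to two decimals.
    """
    n = sum(freqs)
    # Running total: each class's frequency is added to the cumulative
    # frequency of the class above it.
    cum = list(accumulate(freqs))
    return [
        (label, f, round(100 * f / n, 2), c, round(100 * c / n, 2))
        for label, f, c in zip(labels, freqs, cum)
    ]


classes = ["< 60", "60-62", "63-65", "66-68", "69-71", "72+"]
frequencies = [10, 23, 33, 22, 34, 33]

for row in frequency_table(classes, frequencies):
    print(*row)
```

Computing the cumulative relative frequency from the cumulative counts, rather than by adding already-rounded percentages, keeps the final value at exactly 100.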
Fasting blood glucose level    Frequency    Cumulative frequency
< 60                           10           10
60-62                          23           33
63-65                          33           66
66-68                          22           88
69-71                          34           122
72+                            33           155
Total                          155

CUMULATIVE RELATIVE FREQUENCY DISTRIBUTION
This is calculated by adding the relative frequency of each class interval to the cumulative relative frequency of the class interval above it, again starting from the second class interval onward.

Fasting blood glucose levels: frequency (F), cumulative frequency (CF), relative frequency (RF, %) and cumulative relative frequency (CRF, %)

Level     F      CF     RF (%)    CRF (%)
< 60      10     10      6.45       6.45
60-62     23     33     14.84      21.29
63-65     33     66     21.29      42.58
66-68     22     88     14.19      56.77
69-71     34     122    21.94      78.71
72+       33     155    21.29     100.00
Total     155          100.00

CUMULATIVE DISTRIBUTION
Cumulative frequency and cumulative relative frequency distributions make it easy to obtain the frequency or relative frequency within two or more contiguous class intervals. For example, the number of observations from 63 to 68 is 88 - 33 = 55, i.e. 56.77% - 21.29% = 35.48% of the total.

EXERCISE
The following are the weights (kg) of 45 adult males attending a primary health care center:

Individuals 1-9:    76 55 62 73 88 90 41 52 72
Individuals 10-18:  86 73 65 69 58 99 63 63 65
Individuals 19-27:  70 49 77 72 68 55 54 48 83
Individuals 28-36:  85 79 78 77 59 64 68 90 80
Individuals 37-45:  66 56 71 47 73 85 66 85 71

Construct a table showing:
- Frequency
- Relative frequency
- Cumulative frequency
- Cumulative relative frequency distribution

Number of class intervals (Sturges' rule):
k = 1 + 3.322 log n = 1 + 3.322 log 45 = 1 + 3.322 × 1.653 = 6.49 ≈ 6

Width of class interval:
W = R / k = (99 - 41) / 6 = 58 / 6 = 9.7 ≈ 10

Class (kg)   F     RF (%)    CF    CRF (%)
40-49         4     8.9       4      8.9
50-59         7    15.6      11     24.4
60-69        11    24.4      22     48.9
70-79        13    28.9      35     77.8
80-89         7    15.6      42     93.3
90-99         3     6.7      45    100.0
Total        45   100.1*

* The relative frequencies sum to 100.1 rather than 100 only because of rounding. The cumulative relative frequencies are computed from the cumulative frequencies (e.g. 11/45 = 24.4%); adding the rounded relative frequencies instead gives 24.5, 93.4 and 100.1 because the rounding errors accumulate.

EXERCISE
The following are the numbers of babies born during a year in 60 public hospitals:

Hospitals 1-10:   30 37 32 39 52 55 55 26 56 57
Hospitals 11-20:  27 52 40 59 43 45 34 28 58 46
Hospitals 21-30:  56 54 53 49 54 48 42 54 53 31
Hospitals 31-40:  45 32 29 30 22 49 59 42 53 31
Hospitals 41-50:  32 35 42 21 24 57 46 54 34 24
Hospitals 51-60:  47 24 53 28 57 56 57 59 50 29

Construct a table showing:
- Frequency
- Relative frequency
- Cumulative frequency
- Cumulative relative frequency

EXERCISE
For the following data, construct a table showing the age and gender distribution.

             Number of cases by gender
Age group    Male    Female    Total
0-9           0       2         2
10-19         5       1         6
20-29         7       4        11
30-39         6       4        10
40-49         2       2         4
50+           0       1         1
Total        20      14        34

- Complete the table showing the relative frequency distribution.
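The worked solution to the weights exercise can be checked with a short Python sketch. It applies Sturges' rule to choose k, divides the range by k to get the class width, then counts the observations in each class and accumulates the frequencies. The helper names `sturges_k` and `grouped_table` are hypothetical. Two choices are assumptions, not rules stated in the lecture: the width is rounded up with `math.ceil` (the lecture rounds 9.7 to 10 without naming a rule), and the first class starts at 40, taken from the lecture's table.

```python
import math
from itertools import accumulate

# Weights (kg) of the 45 adult males in the exercise, in the order listed.
weights = [
    76, 55, 62, 73, 88, 90, 41, 52, 72,
    86, 73, 65, 69, 58, 99, 63, 63, 65,
    70, 49, 77, 72, 68, 55, 54, 48, 83,
    85, 79, 78, 77, 59, 64, 68, 90, 80,
    66, 56, 71, 47, 73, 85, 66, 85, 71,
]


def sturges_k(n: int) -> float:
    """Sturges' rule: k = 1 + 3.322 * log10(n), before rounding."""
    return 1 + 3.322 * math.log10(n)


def grouped_table(data, start, k, width):
    """Group integer-valued data into k classes of equal width.

    Returns rows of (class label, frequency, relative %, cumulative frequency,
    cumulative relative %), with percentages rounded to one decimal place.
    """
    n = len(data)
    lowers = [start + i * width for i in range(k)]
    # Integer data: the class 40-49 holds every x with 40 <= x <= 49.
    freqs = [sum(lo <= x <= lo + width - 1 for x in data) for lo in lowers]
    cum = list(accumulate(freqs))  # running total of the class frequencies
    return [
        (f"{lo}-{lo + width - 1}", f, round(100 * f / n, 1), c, round(100 * c / n, 1))
        for lo, f, c in zip(lowers, freqs, cum)
    ]


n = len(weights)
k = round(sturges_k(n))                    # 1 + 3.322 * 1.653 = 6.49, rounds to 6
value_range = max(weights) - min(weights)  # R = 99 - 41 = 58
width = math.ceil(value_range / k)         # 58 / 6 = 9.7, rounded up to 10

for row in grouped_table(weights, start=40, k=k, width=width):
    print(*row)
```

The same functions could be reused for the 60-hospital exercise by replacing `weights` with that data and choosing a suitable starting value; as the lecture notes, the number of classes suggested by Sturges' rule may still be adjusted.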