Chapter I
Basic Concepts in Statistics

Introduction:
Statistics affects many facets of our lives. In everyday life, whether at home or at work, we keep records and read reports. An item in a record or report is a fact that is expressed as a numerical value or described by its quality or kind. A single item or fact is referred to as a datum. The color of leaves, the number of students in a class, height and width, and the number of bacterial colonies are all examples of data. How to deal with data is the major concern of statistics.

Objectives:
1. Define biostatistics and identify its importance.
2. Explain the methods of collecting statistical data and the types of variables.
3. Discuss different sampling techniques.

Lesson 1.1. Biostatistics and Its Importance

Overview:
For most people, the word “statistics” is a scary thing that must be avoided as much as possible. They think of statistics as a collection of numbers and formulas that have vague meanings. In fact, people often apply statistics in their everyday life without noticing it. When a clinician records the result of a physical examination of a patient, the clinician is collecting data to help diagnose the patient's illness and to determine the appropriate medical treatment to prescribe.

Objectives:
The students should be able to:
1. Define statistics and biostatistics.
2. Discuss inductive and deductive reasoning in medical diagnosis.
3. Explain the scientific method employed in medical research.

Content:
Statistics is a science that deals with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically.

Major areas of Statistics:
1. Descriptive Statistics - this includes anything done to the data that is designed to summarize or describe, without going any further; that is, without attempting to infer anything that goes beyond the data themselves.
2. Statistical Inference - comprises the methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire set of data. Such analysis requires generalizations that go beyond the data.

Biostatistics is statistics applied to the biological sciences.

Perhaps the most difficult aspect of statistics is the logic associated with inductive inference, yet all scientific evidence is based on this type of statistical inference. The same logic is used, though not always explicitly, when a physician practices medicine: what is observed for a large group of patients is used to make a specific decision about a particular patient.

When taking a clinical history, conducting a physical examination, or requesting laboratory analyses, radiographic evaluations, or other tests, a physician is collecting information (data) to help choose diagnostic and therapeutic actions. The decisions reached are based on knowledge obtained from training, from the literature, from experience, or from similar sources. General principles are applied to the specific situation at hand in order to reach the best decision possible for a particular patient. This type of reasoning, from the general to the specific, is called deductive reasoning. Much of basic medical training centers around deductive reasoning.

We conduct experiments and comparative studies to focus on questions that arise from our work. We study a few patients (or experimental animals), and from what we observe we try to make rational inferences about what happens in general. This type of reasoning, from the specific subject(s) at hand to the general, is called inductive reasoning.

This approach to medical research, pushing back the bounds of knowledge concerning human health, follows what is known as the Scientific Method, which has four basic steps:
1. Making observations, i.e., gathering data.
2. Generating a hypothesis: the underlying law and order suggested by the data.
3. Deciding how to test the hypothesis: what critical data are required?
4. Experimenting (or observing): this leads to an inference that either rejects or affirms the hypothesis.

If the hypothesis is rejected, we go back to step 2. If it is affirmed, this does not necessarily mean it is true, only that in the light of current knowledge and methods it appears to be so. The hypothesis is constantly refined and tested as more knowledge becomes available.

All data collected from biological systems have variability. The statistician is concerned with summarizing trends in data and drawing conclusions in spite of the uncertainty created by variability in the data. An understanding of statistics will enhance your ability to interpret data, whether for the purpose of treating a particular patient or for drawing general conclusions from a research study, and will enable you to distinguish fact from fancy in everyday life.

Summary:
Biostatistics deals with the collection, organization, presentation, analysis, and interpretation of biological information that can be stated numerically.

Activity:
A. Suppose that a set of measurements representing the total rainfall in the province of Sultan Kudarat during the month of July has been recorded for the past 15 years. Write "descriptive" or "inferential" for each of the following statements based on these data.
1. The average rainfall over the 15 years is 3.0 cm.
2. In each of the 15 years, it rained in July.
3. Next July we expect rain.
4. This July 2021 we expect between 3.2 and 3.4 cm of rain.

B. Decide what type of reasoning must be employed in each situation below in order to arrive at a diagnosis and treatment. Discuss why.
a. Stroke patient
b. Yellowing of the leaves of your potted plant
c. Swelling of gums and a painful tooth

C. Differentiate the following:
1. Statistics and Biostatistics
2. Deductive and Inductive reasoning
3. Descriptive and Inferential statistics

Study 1.2.
Statistical Data and Variables

Overview:
The basic unit of statistical analysis is data. There are generally two types of data, and there is no formula for selecting the best method for gathering them. The choice depends on the researcher's design of the study, the type of data, the time available to complete the study, and the financial capacity.

Objectives:
1. Identify the types and kinds of data.
2. Explain the methods of collecting data.
3. Discuss the types of variables.
4. Determine the scales of measurement.

Content:
Classification of Data
1. Quantitative Data - data that can be expressed in numbers. These are things that can be measured or counted, such as weight, length, number of colonies, and mortality rate.
2. Qualitative Data - facts for which no numerical measure exists. They are usually expressed in categories or kinds. Examples are the color of the skin, which could be black, brown, or white; a person's sex, which is male or female; and the presence or absence of a metallic sheen in a bacterial colony.

In order to assure the accuracy of data, one must know the right sources and methods of collecting them.

Types of data according to source
1. Primary Data - information gathered directly from an original source, or based on direct or first-hand experience.
2. Secondary Data - information taken from published data previously gathered by other individuals or agencies, or data that come from sources other than the respondents.

Methods of Collecting Data
1. Interview Method - a person-to-person exchange between the interviewer and the interviewee.
2. Questionnaire Method - written responses are given to prepared questions. A questionnaire is a list of questions intended to elicit answers to the problem of a study. Questionnaires may be mailed, sent online, or hand-carried.
3. Registration Method - the gathering of information is enforced by certain laws. Examples are the registration of births, deaths, motor vehicles, marriages, and licenses.
4. Observation Method - the investigator observes the behavior of persons or organisms and their outcomes. This is usually used when the subjects cannot talk or write.
5. Experimental Method - this method is used when the objective is to determine the cause-and-effect relationship of certain phenomena under controlled conditions. Scientific researchers usually use the experimental method.

Collected data must be organized in order to show significant characteristics. They can be presented in three forms:
1. Textual - data are presented in paragraph form.
2. Tabular - data are presented in rows and columns.
3. Graphical - data are presented in visual form.
   Kinds of graphs:
   a. Bar graph
   b. Pie graph
   c. Line graph

A variable is a numerical characteristic or an attribute associated with the population being studied.

Types of Variables
1. Categorical or qualitative variables are classified according to some attribute or category.
   Ex. Gender, religion, blood type, civil status
   Categories may be ordered, and may or may not be assigned specific numerical values, such as performance rating (poor, fair, good, very good, excellent) or IQ level (low, average, high).
2. Numerical-valued or quantitative variables are classified according to numerical characteristics such as height, age, pulse rate, number of children, and speed. Numerical-valued variables are often grouped into class intervals.
   Ex. Age in years: 5-9, 10-14, 15-19, and 20 and above
       Height in cm: 100-149, 150-199, 200-249

Numerical-valued variables are classified as:
1. Discrete - a variable whose values are obtained by counting.
   Ex. Number of children, number of persons with blue eyes, number of patients with T.B., number of males and females in a Statistics class.
2. Continuous - a variable whose values are obtained by measuring, such as temperature, distance, area, density, age, and height. These values cannot be put into a list because they can take any value in some interval of real numbers.

Scales of Measurement
In selecting the statistical tool to be used for drawing inferences from a random sample, the measurement scale of the data must be carefully identified. Measurements are classified into four scales.

1. Nominal Scale - a measurement scale that classifies elements into two or more categories or classes, the numbers indicating that the elements are different but not according to order or magnitude.

Ex. Table 1. Distribution of Medical Students of the University of the Philippines Grouped According to Race and Civil Status

Race        Single   Married   Widow/er   Separated   Total
American      10        5         0           1         16
Chinese       29        8         5          10         52
Japanese      18       11         1           3         33
Filipino      32        3         4          20         59
Total         89       27        10          34        160

The medical students are classified according to race and civil status.

2. Ordinal Scale - a measurement scale that ranks individuals in terms of the degree to which they possess a characteristic of interest.

Ex. Table 2. Anxiety Level of Patients with Mental Disorder in Hospital Q

Sex        0     1     2     3    Total
Male       9    16     2     1      28
Female    21    10     4     7      42
Total     30    26     6     8      70

Legend: 0 = not anxious
        1 = low anxiety level
        2 = moderate anxiety level
        3 = high anxiety level

3. Interval Scale - a measurement scale that, in addition to ordering scores from high to low, establishes a uniform unit in the scale so that any equal distance between two scores is of equal magnitude. For example, the difference between aptitude scores of 80 and 90 is the same as the difference between aptitude scores of 90 and 100 (both being equal to 10).

4. Ratio Scale - a measurement scale that, in addition to being an interval scale, also has an absolute zero in the scale.

Summary:

Scale of       Each number    Greater than and     Units of      Absolute
Measurement    represents     less than            measurement   zero
               a category     relationships
Nominal        Yes            No                   No            No
Ordinal        Yes            Yes                  No            No
Interval       Yes            Yes                  Yes           No
Ratio          Yes            Yes                  Yes           Yes

Application:
A. Evaluate the data below. Write qualitative or quantitative.
1. 25 ft.
2. Medium size
3. 30%
4. 6 meters
5. 4 colonies
6. Male
7. Absent
8. 100 seeds
9. Blue eyes
10. 500 acres

B. Write primary or secondary data.
1. Number of public vehicles in the city of Tacurong
2. Enrollees of SKSU from 2010 to 2020
3. Information from a diary
4. Data from the Daily Inquirer
5. TB patients of St. Louise Hospital from Jan. to June 2020
6. Information from a police investigator
7. Information from the victim of an accident
8. Response from your respondent
9. Data from the State of the Nation Address of the Philippines
10. Data from a research journal

C. Write D if discrete and C if continuous.
1. Number of foreigners migrating to the Philippines
2. Length of hair
3. Boiling point of water, 100°C
4. John's height is 160 cm
5. Number of children in Brgy. Sebu with a missing tooth
6. Average speed of UB Express along national highways
7. Number of online students present in a Zoom meeting
8. Number of leaves affected by leaf rot
9. Leaf width and length
10. Number of vaccinated Filipinos

D. Write the advantages and disadvantages of each method of collecting data.

Study 1.3 Sampling Techniques

Overview:
Analysis of data in research work requires that the size of the population be determined and specified if possible, so that the required sample size can easily be calculated based on sampling techniques and research designs. If the population is small, it is sometimes convenient to obtain the information by collecting data for the whole population (total enumeration). However, if the population is large, time and money can be saved by measuring only a sample drawn from the population. When the measurement is destructive, sampling is unavoidable.

Objectives:
At the end of the lesson, you should be able to:
1. compute the sample size;
2. enumerate the different sampling methods;
3. identify the use of different sampling methods in data collection.
Content:
Population - the group of all study units about which a particular investigation may provide information. The population size is denoted by “N”.
Target population - the whole group of study units to which we are interested in applying our conclusions.
Study population - the group of study units to which we can legitimately apply our conclusions.
Sample - a subset or a representative part of the population; hence, the sample must possess the same characteristics as the population. Sample size is denoted by “n”.

Sampling
[Figure labels: Population, Sample, Inference]

Types of Sampling:
1. Non-Probability or Judgment Sampling - sampling is based on a judgment selection of “typical” or representative elements of the population under study, considering arbitrarily set criteria.
   1.1. Purposive Sampling - a sample is drawn from the population where what constitutes a representative element is already a preconceived idea.
   1.2. Quota Sampling - the sample is drawn for convenience and on the basis of a quota.
   1.3. Haphazard Sampling - sampling is done haphazardly.
   1.4. Volunteer Sampling - sampling that involves volunteers.
   1.5. Convenience or Accidental Sampling - the elements of the sample are those that are readily accessible to the sampler.

2. Probability Sampling - sampling in which a definite set of rules and procedures for drawing the sample is followed. It allows one to evaluate the probability that each element will be part of the sample, even before the actual sample is drawn. Probability samples are suitable for statistical analysis and scientific research.
   2.1. Simple Random Sampling - the sample is drawn from the whole population, without replacement and with equal probability of selection for every possible sample. Methods of simple random sampling are:
        a. The box method
        b. Use of a table of random numbers
        c. Use of computer software that generates random numbers
   2.2. Systematic Sampling - a method of sampling wherein a sample is drawn by taking, say, every k-th unit in the population, starting from the i-th unit drawn at random. This is used when there is a ready list of the total population. It is the most practical way of sampling.
   2.3. Stratified Sampling - a sampling procedure wherein the population is divided into non-overlapping strata. These strata are homogeneous, and a random sample is drawn independently from each stratum. This scheme is used so that different groups of a population are adequately represented in the sample.
   2.4. Cluster Sampling - the total population is divided into a number of relatively small subdivisions, and some of these subdivisions or clusters are randomly selected for inclusion in the overall sample.
   2.5. Multi-stage Sampling - the technique uses several stages or phases in getting the sample from the general population. However, selection of the sample is still done at random. It is useful in conducting nationwide surveys involving a large universe.

Determination of Sample Size (n)
Important criteria in determining the sample size (n):
1. Size (N) and variability of the population.
2. Error that will be tolerated or accepted. This is the desired precision.
3. Degree of confidence attached to the estimate of the parameter. That is, one needs to specify the desired confidence coefficient, (1 - α) × 100%.
4. Resources available to obtain the data and the time available to produce output.
5. Safety or risk of the enumerators.

Sampling is advisable if the population is 100 or more, but it is inapplicable to a population of fewer than 100. Total enumeration (census) is advisable for a population of fewer than 100, for categorization purposes.
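Three of the probability sampling schemes described above (simple random, systematic, and stratified) can be sketched in Python as follows. This is only an illustration: the function names, the population of ID numbers 1 to 100, and the use of proportional allocation in the stratified case are assumptions made for the example and are not part of the lesson.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def systematic_sample(population, n, seed=None):
    """Take every k-th unit, starting from a unit chosen at random among the first k."""
    units = list(population)
    k = len(units) // n                 # sampling interval k
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start (0-based position of the i-th unit)
    return units[start::k][:n]


def stratified_sample(strata, fraction, seed=None):
    """Draw an independent simple random sample from each stratum.

    strata: dict mapping a stratum name to the list of its units.
    fraction: share of each stratum to sample (proportional allocation,
              an assumption made for this sketch).
    """
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        size = min(len(units), max(1, round(len(units) * fraction)))
        sample[name] = rng.sample(units, size)
    return sample


# Hypothetical population: a ready list of 100 ID numbers.
ids = list(range(1, 101))
print(simple_random_sample(ids, 10, seed=1))
print(systematic_sample(ids, 10, seed=1))
print(stratified_sample({"male": ids[:60], "female": ids[60:]}, 0.10, seed=1))
```

Systematic sampling needs the ready list mentioned in 2.2, while stratified sampling needs to know in advance which stratum each unit belongs to.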
For a scientific determination of sample size, the formula below was suggested by Calmorin and Calmorin (1997):

$$Ss = \frac{NV + [Se^2 (1 - p)]}{NSe + [V^2 \, p(1 - p)]}$$

Where:
Ss = sample size
N = total number of population
V = the standard value (2.58) at the 1 percent level of probability with 0.99 reliability
Se = sampling error (0.01)
p = the largest possible proportion (0.50)

For instance, if the total population is 500, the standard value at the 1% level of probability is 2.58 with 99% reliability, the sampling error is 1% or 0.01, and the proportion of the target population is 50% or 0.50, then the sample size is computed as follows:

Given: N = 500, V = 2.58, Se = 0.01, p = 0.50

$$Ss = \frac{500(2.58) + [(0.01)^2 (1 - 0.50)]}{500(0.01) + [(2.58)^2 (0.50)(1 - 0.50)]}$$

$$Ss = \frac{1290 + (0.0001)(0.50)}{5 + (6.6564)(0.50)(0.50)} = \frac{1290.00005}{6.6641}$$

$$Ss = 193.57 \text{ or } 194$$

For a population of 500, the sample size is 194, which represents the subjects of the study.

Summary:
In gathering statistical information for data analysis, the researcher must:
1. identify first the subject of the study;
2. delimit or determine the scope and coverage of the subject of the study;
3. determine the population and the sample size;
4. determine the sampling methods or techniques to be utilized;
5. prepare the necessary data-gathering instruments for purposes of investigation.
There are two types of samples: the probability sample and the non-probability sample.

Activity:
I. Choose the best answer among the choices.
1. The best random sampling design, because every individual in the population has an equal chance of inclusion in the sample, is
   a. Stratified random sampling  b. Simple random sampling  c. Restricted random sampling
2. The sampling design in which all individuals in the population are arranged in a methodical manner and the nth name may be chosen in the construction of the sample is
   a. Systematic sampling  b. Stratified random sampling  c. Unrestricted random sampling
3. The sampling design based on selecting individuals as samples according to the criteria of the researcher, which serve as controls, is
   a. Quota sampling  b. Incidental sampling  c. Purposive sampling  d. Cluster sampling
4. The sampling design which is intended to improve the validity of the sample and is applicable when the population being studied is homogeneous is
   a. Cluster sampling  b. Simple random sampling  c. Stratified sampling
5. A population of 900 has a sample size of
   a. 218  b. 217  c. 219  d. 220
6. Sampling is inapplicable to a population of
   a. 100  b. 110  c. 99
7. Which of the following does not belong to the group?
   a. Quota sampling  b. Incidental sampling  c. Cluster sampling
8. The sample size of a population of 750 is
   a. 210  b. 211  c. 208
9. A population of 2000 has a sample size of
   a. 236  b. 238  c. 232
10. The sampling design in which the population is grouped into small units such as blocks or districts is
   a. Purposive sampling  b. Quota sampling  c. Cluster sampling
11. Which of the following does not belong to the group?
   a. Purposive sampling  b. Multi-stage sampling  c. Cluster sampling
12. The sampling design in which the researcher simply takes the closest individuals as subjects of the study because they are most available is
   a. Quota sampling  b. Purposive sampling  c. Cluster sampling
13. A population of 300 has a sample size of
   a. 181  b. 166  c. 165
14. The sampling design which is popular in the field of opinion research is
   a. Incidental sampling  b. Cluster sampling  c. Quota sampling
15. The sample size of a population of 550 is
   a. 196  b. 194  c. 192

II. Compute the sample size of the following populations.
1. 230
2. 340
3. 570
4. 890
5. 2,300
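
For checking hand computations such as those in Part II, the Calmorin and Calmorin formula can be sketched in Python. The sketch follows the arithmetic of the worked example in this lesson: the numerator is NV + Se²(1 - p) and the denominator is NSe + V²p(1 - p). The function name `calmorin_sample_size` is hypothetical, and rounding to the nearest whole number is an assumption based on the worked example (193.57 becomes 194).

```python
def calmorin_sample_size(N, V=2.58, Se=0.01, p=0.50):
    """Sample size Ss from the Calmorin and Calmorin (1997) formula.

    N  : total number of population
    V  : standard value (2.58 at the 1% level, 0.99 reliability)
    Se : sampling error
    p  : largest possible proportion
    """
    numerator = N * V + Se**2 * (1 - p)
    denominator = N * Se + V**2 * p * (1 - p)
    return numerator / denominator


# Worked example from the lesson: N = 500
ss = calmorin_sample_size(500)
print(round(ss, 2))   # 193.57
print(round(ss))      # 194
```

The default arguments are the values given in the lesson, so only the population size N needs to be supplied for the exercises.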