Statistical Methods: Math 153
By Wilhemina Adoma Pels
Department of Statistics and Actuarial Science, KNUST
FIRST LECTURE
March 4, 2022

Weekly Content

Week 1:
1. Course introduction, provision of course outline and recommended textbooks
2. Introduction to Statistics
3. Uses of Statistics
4. Basic terms in Statistics
5. Variable and Data
6. Measurement scales
7. Stages of statistical investigation
8. Data Collection (Primary and Secondary data)

Week 2:
1. Questionnaire Design
2. Quiz 1
3. Summarizing and describing data

Weeks 3 and 4:
1. Using numerical summaries to characterize sample data

Weeks 5 and 6:
1. Using graphical summaries to characterize sample data

Week 7: Midsem Exams

Week 8:
1. Introduction to Probability
2. Axioms, Sets, Sample space
3. Measure of probability of events
4. Mutually exclusive and independent events
5. Conditional probability, Bayes' theorem

Week 9: Counting techniques: combinations and permutations
Week 10: Random variables and some discrete probability distributions
Week 11: Some continuous probability distributions
Week 12: Revision

Week 1

What is Statistics?
Statistics is the science concerned with developing and studying methods for collecting, organizing, analyzing, interpreting and presenting empirical data. In short, statistics is the science of learning from data.

Types of Statistics
1. Descriptive statistics: summarizing and describing the data, using numerical and graphical summaries to characterize sample data.
2. Inferential statistics: using sample data to draw conclusions about a broader group of individuals (a population) than just those who are observed (a sample).

Types of Inferential Statistics
Inductive: generalization about the population based on knowledge of the sample.
Deductive: generalization about the sample based on knowledge of the population.
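The contrast between descriptive statistics and inductive inferential statistics can be sketched with a short simulation. This is a minimal illustration, assuming a hypothetical population of 10,000 student heights generated at random (not course data); the variable names, the sample size of 50 and the rough 95% interval based on 1.96 standard errors are illustrative choices, not part of the lecture.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 10,000 students, simulated for illustration only
random.seed(153)
population = [random.gauss(165, 8) for _ in range(10_000)]

# Draw a simple random sample of 50 students without replacement
sample = random.sample(population, 50)

# Descriptive statistics: summarize the sample we actually observed
sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)
sample_sd = statistics.stdev(sample)
print(f"Sample mean:   {sample_mean:.1f} cm")
print(f"Sample median: {sample_median:.1f} cm")
print(f"Sample SD:     {sample_sd:.1f} cm")

# Inductive inference: use the sample to estimate the population mean.
# Rough 95% interval via the normal approximation: mean +/- 1.96 * SD / sqrt(n)
n = len(sample)
margin = 1.96 * sample_sd / n ** 0.5
print(f"Estimated population mean: {sample_mean:.1f} +/- {margin:.1f} cm")

# Only possible because the population was simulated; in practice it is unknown
print(f"True population mean:      {statistics.mean(population):.1f} cm")
```

Because the population here is simulated, the estimate can be checked against the true population mean. In a real study the population value is unknown, which is exactly why inference from the sample is needed.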
USES OF STATISTICS
1. To present facts in definite form
2. Comparisons
3. Policy making
4. Forecasting
5. It enlarges knowledge

BASIC TERMS
Population and Sample
A population is the collection of all possible individual units whose characteristics are to be studied.
A sample is a subset of the population that is studied in order to make inferences about the population.

What is a variable?
A variable is any attribute, characteristic, or measurable property that can vary from one observation to another.
Examples: height, hair color, gender, age of patient, weight

TYPES OF VARIABLES
1. Qualitative or Categorical Variables
Take on values that are names or labels.
Allow for classification of individuals based on some attribute or characteristic.
Examples: religion, regions in Ghana, gender
2. Quantitative Variables
Numeric; represent a measurable quantity of an individual.
Examples: volume, height, prices, number of students in the class

Types of Quantitative Variables
Discrete variable: a quantitative variable that has either a finite number of possible values or a countable number of possible values. The term countable means that the values result from counting, such as 0, 1, 2, 3, and so on.
Examples: number of children in a family, number of heads or tails in a series of coin tosses, income
Continuous variable: a variable that can take on any value within some range or interval (i.e., within a specified lower and upper limit).
Examples: height, temperature, weight

Figure: Illustration of the relationship among qualitative, quantitative, discrete, and continuous variables.

DATA VERSUS VARIABLE
The list of observed values for a variable is data. For example, gender is a variable; the observations male or female are data.
Qualitative data are observations corresponding to a qualitative variable.
Quantitative data are observations corresponding to a quantitative variable.
Discrete data are observations corresponding to a discrete variable.
Continuous data are observations corresponding to a continuous variable.

Univariate vs Bivariate data
Univariate data: only one variable is involved in the study.
Bivariate data: two variables are involved.
Multivariate data: the study has more than two variables.

MEASUREMENT
"If a thing exists, it exists in some amount; and if it exists in some amount, it can be measured." (E. L. Thorndike, 1914)

What is measurement?
Measurement is the application of mathematics to things or events. A system of measurement is a crucial component of research.
Simple example: How tall is Jane?
More complex example: How shy is Jane?

Scales of measurement
Nominal Scale: data that represent categories, names or labels. There is no implied order to the categories of nominal data. Observations are classified into mutually exclusive categories.
Examples: eye color (blue, brown, green), biological sex (male or female), political affiliation (GUM, CPP, NPP, GFP, NDC), marital status (single, married, divorced, widowed)
Sometimes numbers are used to designate category membership. Here, the numbers do not have numeric implications; they are simply convenient labels.
Example: Country of origin: Ghana = 1, Cameroon = 2, Nigeria = 3, Other = 4

Ordinal Scale: this scale has a logical ordering of the categories; it designates an ordering (greater than, less than). It does not assume that the intervals between numbers are equal. Examples:
1. Finishing place in a race (first place, second place)
2. A psychiatrist may, for example, grade patients on an anxiety scale as 'not anxious', 'mildly anxious', 'moderately anxious', or 'severely anxious' and use the numbers 0, 1, 2, and 3 to label the categories, with lower numbers indicating less anxiety.
3. Adverse effect (no AE = 0, mild AE = 1, severe AE = 2)

Interval Scale: designates an equal-interval ordering. An important point about interval scales is that the zero point is simply another point on the scale; it does not represent the starting point of the scale or the total absence of the characteristic being measured.
For example, temperature in Fahrenheit or Celsius is an interval-scale measurement. The difference in temperature between 20 degrees F and 25 degrees F is the same as the difference between 76 degrees F and 81 degrees F. Likert-scale responses are also often treated as interval-scale measurements.
Figure: Example: temperature

Ratio Scale: designates an equal-interval ordering with a true zero point (i.e., zero implies an absence of the thing being measured).
Examples: temperature in Kelvin (zero is the absence of heat; nothing can get colder) and the heights of students in this class (zero means a complete lack of height).

Summary of Measurement Scales
Measurement scales differ by order, equal intervals between adjacent units, and an absolute zero point.
Nominal: none
Ordinal: order
Interval: order + equal intervals
Ratio: order + equal intervals + true zero
Nominal or ordinal scaled data: use bar charts (simple, multiple, compound, etc.) or pie charts.
Interval or ratio scaled data: use histograms, polygons, ogives, etc.
Use a scatter plot to assess association between quantitative variables.
Note: no inference is drawn at this point. The objective is to convey information.

STAGES OF STATISTICAL INVESTIGATIONS
If the investigation is to make optimal use of the available resources, expertise and time, it is essential to examine carefully all aspects of the design and application of statistical investigations (experiments and surveys) at the planning stage.
STEPS
1. Statement of problem and objectives: we must identify the cause for concern and state explicitly what the problem is, the characteristics to be measured, and the methods of collection, processing and publication.
2. Target population and the use of a sample or the entire population: define the population of interest in clear, unambiguous terms; define the sampling units so that they are distinct, non-overlapping and recognizable; and select an appropriate sampling design.
3. Design of questionnaire or schedule: construction of the questionnaire or schedule is extremely important, since both the respondent and the data collector must interpret it.
4. Method of data collection: you have to decide whether data will be collected by personal interview, online, physical observation or some other method. Cost is a major factor here. Personnel must be thoroughly trained to locate sampling units correctly and take measurements.
5. Required data: the data to be collected should be guided by the objective of the investigation.
6. List of available resources: a wide variety of resources is likely to be required to run the investigation and analyze the results. These include:
Physical resources: sampling frame, maps, etc.
Human resources: data collectors, data analysts
Financial resources
7. Conducting a pilot survey: this must be carried out before the main survey.
8. Collection, editing, storage and organization of data
9. Interpretation and presentation of results

DATA COLLECTION METHODS
Pros and Cons of Primary and Secondary Data
Where do data come from? We have often seen our data nicely collated in database form:
Results of product and process improvement experiments
Firms/institutions (demographic data, student enrollment, productivity data, etc.)
Take a step back: if we were starting from scratch, how would we collect or find data?
Secondary data
Primary data

Secondary Data
Secondary data are data someone else has collected.
EXAMPLES OF SOURCES
Vital statistics: birth and death certificates
Hospital, clinic and school nurse records
Private and foundation databases
City and regional governments
Surveillance data from state government programs
Federal agency statistics: Census, NHIS, etc.

Secondary Data - LIMITATIONS
Finding secondary data can sometimes be frustrating.
When was it collected? For how long? It may be out of date for what you want to analyze, or it may not have been collected long enough to detect trends.
Is the data set complete? There may be missing information on some observations. Unless such missing information is detected and corrected for, the analysis will be biased.
Are there confounding problems? Sample selection bias? Source choice bias? In time series, did some observations drop out over time?
Are the data consistent/reliable? Did variables drop out over time? Did variables change in definition over time? For example, number of years of education versus highest degree obtained.
Is the information exactly what you need? In some cases, you may have to use proxy variables: variables that approximate something you really wanted to measure. Are they reliable? Are they correlated with what you actually want to measure?

USES OF SECONDARY DATA
As an alternative to a survey
As a source of supplementary information
As a check on possible survey biases
As a means of improving survey estimates

Secondary Data - ADVANTAGES
No need to reinvent the wheel. If someone has already found the data, take advantage of it.
It will save you money.
Even if you have to pay for access, it is often cheaper than collecting your own data.
It will save you time. Primary data collection is very time consuming.
It may be very accurate. Especially when a government agency has collected the data, a great deal of time and money went into it, so it is probably highly accurate.
It has great exploratory value: exploring research questions and formulating hypotheses to test.

PRIMARY DATA
Primary data are data you collect.

PRIMARY DATA - EXAMPLES
Surveys
Focus groups
Questionnaires
Personal interviews
Experiments and observational studies

PRIMARY DATA - LIMITATIONS
Do you have the time and money for:
Designing your collection instrument?
Selecting your population or sample?
Pretesting/piloting the instrument to work out sources of bias?
Administering the instrument?
Entering/collating the data?
Uniqueness: the results may not be comparable to other populations.
Researcher error (sample bias, other confounding factors)

DATA COLLECTION CHOICE
What you must ask yourself: WILL THE DATA ANSWER MY RESEARCH QUESTION?
To answer that, you must first decide what your research question is.
Then you need to decide what data/variables are needed to answer the question scientifically.
If the data exist in secondary form, use them to the extent you can, keeping their limitations in mind.
But if they do not, and you are able to fund primary collection, then primary collection is the method of choice. For example:
Direct observation/experiments
Telephone
Postal or electronic mail
Documents and reports
Interviewing

Questionnaire design
A survey is only as good as the questions it asks.
What should you ask?
The questions asked are a function of previous decisions.
The questions asked are a function of future decisions (such as statistical analysis).

Key Criteria
Questionnaire relevancy: no unnecessary information is collected, and only the information needed to solve the problem is obtained. Be specific about your data needs; tie each question to an objective.
Questionnaire accuracy: the information is both reliable and valid.

Phrasing Questions
Open-ended response versus fixed-alternative questions
Decision criteria: type of research; time; method of delivery; budget; concerns regarding researcher bias

AVOID
Leading questions
Overly complex questions
Use of jargon
Loaded questions (a counter-biasing statement can be used)
Ambiguity
Double-barreled questions
Making assumptions

DECISIONS
Ranking, sorting, rating or choice
How many categories or response positions
Balanced or unbalanced
Forced choice or non-forced choice

Types of questions
Types of fixed-alternative questions
Single dichotomy or dichotomous-alternative questions: the respondent chooses one of two alternatives (yes/no; male/female).
Example: Are you currently registered in a course at the University of Science and Technology?
Yes
No
What scale would this data create?

Multi-choice alternative questions (the respondent chooses from several alternatives)
1. Determinant choice: choose only one from several possible responses.
Example: Which College are you currently registered in at the University?
Engineering
Science
Arts/Soc. Science
Health Sciences
Planning and Architecture
2. Frequency determination: asks for an answer about frequency of occurrence.
Example: In a typical week, how often do you purchase chocolate chip cookies?
Never
Once
Two or more times
3. Check list: provides multiple answers to a single question; the options should be mutually exclusive and exhaustive.
Example: What brands of chocolate chip cookies have you, to the best of your memory, purchased in the past month? (Check all that apply.)
Golden Tree
Cadbury's
President's Choice
Decadent

Thank You.