UNIVERSITY OF PRETORIA
Theme 1: Introduction to data
Presenter: TM Malatji

Welcome
• Assistant Lecturer: Mr. Henry Schnetler
• 3 lecture periods: 2 online, 1 physical
• Attend according to your group
• All queries to BES220info@gmail.com

Contents
• General process of investigation
• Purpose of statistics
• Case study
• Observations
• Variable types
• Relationship between variables
• Study types

Theme 1: General process of investigation
1. Identify a question or problem.
2. Collect relevant data on the topic.
3. Analyze the data.
4. Form a conclusion.
Statistics focuses on making steps 2, 3 and 4 objective, rigorous and efficient:
1. How best can we collect data?
2. How should we analyze the data?
3. What can we infer from the analysis?

Theme 1: Purpose of statistics
1. Data collection is done objectively
Example: In 2007, Colgate was ordered by the Advertising Standards Authority (ASA) of the U.K. to abandon their claim: “More than 80% of Dentists recommend Colgate.” The slogan appeared on an advertising billboard in the U.K. and was deemed to be in breach of U.K. advertising rules.
What went wrong?
  1. A survey was sent out to dentists and hygienists, and only those who responded were sampled. This is not a random sample and is not representative of all dentists.
  2. The survey allowed respondents to select more than one toothpaste, so a response did not amount to recommending Colgate specifically.

2. Data is analyzed rigorously
Example: A researcher studied ice cream sales and forest fires over the same 5-year period. The researcher observed that ice cream sales and forest fires are positively associated (both increase or decrease together) and concluded that ice cream consumption leads to more forest fires.
What went wrong?
  1. Ignoring important features: the study did not consider confounding variables that could be related to both variables, e.g. the weather is related to both variables and could be the cause of both the high ice cream sales and the increase in forest fires.

3. Conclusions are formed correctly
Example: A professor wants to know which lecture period is most preferred by students at the university. She samples 100 students from her class and asks them to select the lecture period they prefer most, ending up with a list of the 3 most preferred lecture periods. She then reports to the HOD of her department that students generally prefer these 3 lecture periods.
What went wrong?
  1. Generalizing the results to a larger group: her students are not a representative sample of the population and she did not study any other groups, so she cannot draw conclusions about all students in the department.

Theme 1: Case study - Using stents to prevent strokes
• Research question:
  – Does the use of stents reduce the risk of stroke?
• Data on 451 at-risk patients collected
• Each volunteer patient randomly assigned to treatment or control group.
• A summary statistic is a single number that summarizes a large amount of data.
• Proportion who had a stroke in the treatment group: 45/224 = 20%
• Proportion who had a stroke in the control group: 28/227 = 12%
• Difference of 8 percentage points, with fewer strokes in the control group.
• Is this 8% difference significant or due to chance?
  – We will learn how to test the significance of our results.
• Can we generalize these results to all stroke patients and all stents?
  – We will learn when we can make causal statements and when we can only make correlational statements.
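The summary statistics in the stent case study can be reproduced with a few lines of Python. This is a minimal sketch that uses only the counts quoted above; the helper name `proportion` and the variable names are illustrative choices, not part of the course material.

```python
# Counts taken from the case study: strokes and group sizes.
treatment_strokes, treatment_total = 45, 224
control_strokes, control_total = 28, 227


def proportion(events: int, total: int) -> float:
    """Return the share of cases in a group that experienced the event."""
    return events / total


p_treatment = proportion(treatment_strokes, treatment_total)
p_control = proportion(control_strokes, control_total)

# Difference between the two summary statistics, in proportion units.
difference = p_treatment - p_control

print(f"Treatment group: {p_treatment:.0%}")
print(f"Control group:   {p_control:.0%}")
print(f"Difference:      {difference:.0%}")
```

Each proportion is a summary statistic: one number standing in for more than 200 patient records. The code only describes the sample; whether the difference is larger than chance alone would produce is a separate question that needs a significance test.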
Theme 1: Observations
• The data matrix contains the observation data
• A case is a single row in the data matrix
• Variables are the characteristics under study

Theme 1: Variable types
• Numerical variables:
  – Discrete
    • Whole, non-negative numbers (counts)
    • Example: The number of cars on the highway at a point in time
  – Continuous
    • Can take any value within a range, so there are infinitely many possible values
    • Example: The size of a plot of land in square meters
• Categorical variables:
  – Nominal
    • Non-numeric with no specific order
    • Example: Countries on the African continent
  – Ordinal
    • Non-numeric with a natural order
    • Example: Top 5 contestants in a singing competition
    • Example: Satisfaction levels of customers

Theme 1: Relationship between variables
• Relationship:
  – Associated
    [Figure: Sleep hours required as a function of age. x-axis: Age; y-axis: Hours of sleep]
  – Independent
    [Figure: Income as a function of weight. x-axis: Weight (kg); y-axis: Monthly income (R1000)]
• Describing relationship:
  – Direction (Positive/Negative)
  – Form (Linear, non-linear or detail)
  – Strength (Strong, Moderate, Weak)
• Explanatory variable (Independent):
  – Suspected to cause an effect in the response variable.
• Response variable (Dependent):
  – Suspected to respond to changes in the explanatory variable.

Theme 1: Study types
• Observational study:
  – A study in which cases are observed or outcomes are measured without any intervention to affect the outcomes (e.g. no treatment is given).
• Experimental study:
  – A study in which an intervention is introduced and the effects are studied.
• A study took a random sample of people and examined their social media habits. Each person was classified as either a light, moderate, or heavy social media user. The researchers looked at which groups tended to be happier.
  – Observational or Experimental study?
    • Observational
• A study took a group of adults and randomly divided them into two groups. One group was told to drink tea every night for a week, while the other group was told not to drink tea that week. Researchers then compared when each group fell asleep.
  – Observational or Experimental study?
    • Experimental

Theme 1: Week 1 L1 Summary
• General process of investigation
• Purpose of statistics
• Case study
• Observations
• Variable types
• Relationship between variables
• Study types
• PS: Please complete the homework as an exercise

Thank you! Happy studying