Chapter 1: Data and Statistics
STK 110

…and I thought DATA and STATISTICS are the same … will have to Google it!

Mobile/QT clicker questions
• Clicker questions will be asked throughout the lecture.
• Slides with a clicker question can be distinguished from lecture slides by the red bar with the turning point icon.
• Read the study guide for the rules regarding the mobile/QT clicker practice.
• Some questions may be discussed with your immediate peer(s); decide on an answer that is your "group" attempt. Other questions may NOT be discussed. An icon on the slide shows which rule applies.

Terminology: Data and Data Sets
• Data: the facts and figures collected, analysed and summarized for presentation and interpretation.
• Data set: all the data collected in a particular study.

Table 2: Means of transport used by people attending an educational institution by type of institution (numbers), 2001

              On foot  By bicycle  By motor  By mini-bus/taxi   By bus  By train
Pre-school    619 005       3 727     3 020            56 587   18 844     3 169
School      9 897 862      73 092    28 937           605 161  462 151    81 202
College        70 007       2 023     1 663            65 344   23 040    17 350
Technikon      51 425       1 218     1 152            44 401   25 739    13 387
University     70 794       3 402     2 732            42 510   19 517     8 804
Other          20 638         434       352             5 505    3 074     1 114

Source: Census 2001

Slide labels: STUDY marks the table title, DATA marks the figures in the table, and DATA SET marks the table as a whole.

Terminology: Elements, Variable, Observation
• Elements: the entities on which data are collected.
• Variable: a characteristic of interest for the elements.
• Observation: the set of measurements obtained for a particular element.
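The three terms above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: it uses the Technikon, University and Other rows of Table 2, assumes the columns run in the order of the slide's headings (with "By motor" copied as it appears there), and the names VARIABLES, DATA_SET and observation are hypothetical.

```python
# Variables: the characteristics recorded for every element (the column headings).
# Assumption: the column order follows the order of the headings on the slide.
VARIABLES = ("On foot", "By bicycle", "By motor", "By mini-bus/taxi", "By bus", "By train")

# Data set: all the data collected, keyed by element (the type of institution).
# Only three rows of Table 2 are used here to keep the sketch short.
DATA_SET = {
    "Technikon": (51425, 1218, 1152, 44401, 25739, 13387),
    "University": (70794, 3402, 2732, 42510, 19517, 8804),
    "Other": (20638, 434, 352, 5505, 3074, 1114),
}


def observation(element):
    """Return the set of measurements for one element as {variable: value}."""
    return dict(zip(VARIABLES, DATA_SET[element]))


# Elements: the entities on which data are collected.
elements = list(DATA_SET)

print(elements)
print(observation("University"))
```

Each key of DATA_SET is an element, each entry of VARIABLES is a variable, and one call to observation() returns a single observation, that is, one row of the table.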
Table 2: Means of transport used by people attending an educational institution by type of institution (numbers), 2001

              On foot  By bicycle  By motor  By mini-bus/taxi   By bus  By train
Pre-school    619 005       3 727     3 020            56 587   18 844     3 169
School      9 897 862      73 092    28 937           605 161  462 151    81 202
College        70 007       2 023     1 663            65 344   23 040    17 350
Technikon      51 425       1 218     1 152            44 401   25 739    13 387
University     70 794       3 402     2 732            42 510   19 517     8 804
Other          20 638         434       352             5 505    3 074     1 114

Source: Census 2001

Slide labels: VARIABLES marks the column headings (means of transport), ELEMENTS marks the row headings (types of institution), and OBSERVATION marks a single row.

Data types and Measurement scales
Variables define data. What is the difference between the variables in each pair below?

Variable: Time spent watching TV during weekdays (hours)
Data: 5 hours, 3 hours, 5 hours, 0 hours, 2 hours
The values have numerical meaning.

Variable: Level of physical fitness (Very high, High, Low, Very low)
Data: Low, Low, Very low, Very low, High
The values are without numerical meaning.

• Average time spent watching TV on weekdays?
• Average level of physical fitness?

Variable: The weekday on which you have the most leisure time
Data: Tuesday, Tuesday, Tuesday, Friday, Thursday
The values are without numerical meaning.

Variable: The number of push-ups done in one minute
Data: 10, 8, 4, 5, 3
The values have numerical meaning.

• Average weekday on which students have the most leisure time?
• Average number of push-ups done in one minute?

Classification of data: Quantitative and Qualitative Data
• Quantitative data: data with numerical meaning; calculations can be done.
• Qualitative/categorical data: data without numerical meaning; calculations cannot be done.

Classification of data
Variables define data. What is the difference?

Variable: Time spent watching TV during weekdays (hours)
Data: 5,5 or 5½ hours; 3,25 or 3¼ hours; 5 hours; 0,5 or ½ hour; 2,75 or 2¾ hours
The values are decimals or fractions.

Variable: Number of days in a week with at least 30 min. of leisure time
Data: 2, 2, 3, 3, 3
The values are integers or whole numbers.

Classification of data: Scales of measurement
Quantitative data
• Continuous data: fractions or decimals (in theory, between any two values another value exists). Variables such as distance, height, weight, time.
• Discrete data: integers or whole numbers (the interval between values is expressed in terms of fixed values). Variables such as number of push-ups, goals scored in a soccer game.
• Measurement scale: ratio scale or interval scale.

Classification of data: Scales of measurement
What is the difference?

Variable: Preference for spending time at friends (rate from 1 to 5, with 1 = not at all and 5 = very much)
Data: 5, 5, 3, 3, 4
The rank of the data is meaningful.

Variable: Venue where you spend most of your leisure time (At home, At shopping malls, At friends)
Data: At friends, At friends, At friends, At home, At friends
The data are classified into categories.

Classification of data: Scales of measurement
Quantitative data (ratio scale or interval scale)
• Continuous data: fractions or decimals.
• Discrete data: integers or whole numbers.

Qualitative data
• Nominal data: classified into categories. Examples: soft drinks (Pepsi, Coke, Fanta, Sprite); sport (Cricket, Rugby, Tennis, Swimming).
• Ordinal data: rank is meaningful. Examples: rating (Excellent, Average, Poor); sizes of an item (Small, Medium, Large).

Note: The ratio scale requires that a true zero value exists. E.g. 0% for a Math test means no marks.
Note: On the interval scale, a value of zero does not mean that nothing is there. E.g. 0°C does not mean there is no temperature; it means it is very cold.

Cross-sectional vs. Time series
Cross-sectional data: data collected at the same or approximately the same point in time.
For example:

              On foot  By bicycle  By motor  By mini-bus/taxi   By bus  By train
Pre-school    619 005       3 727     3 020            56 587   18 844     3 169
School      9 897 862      73 092    28 937           605 161  462 151    81 202
College        70 007       2 023     1 663            65 344   23 040    17 350
Technikon      51 425       1 218     1 152            44 401   25 739    13 387
University     70 794       3 402     2 732            42 510   19 517     8 804
Other          20 638         434       352             5 505    3 074     1 114

Source: Census 2001

Time series data: data collected over several time periods.
For example: [Graph: South African Rand per 1 US Dollar, Dec 2014 to Dec 2015]

Data and Statistics: Difference?
Statistics makes sense of numbers: more information, more organized, more understandable.
• "The baby weighs 2.5 kg": 2.5 kg is data.
• "The average weight of a newborn baby is 2.5 kg": this is a statistic.

The way forward
Statistics
• Descriptive statistics: describe, express, explain, illustrate, portray.
• Inferential statistics: infer, speculate/deduce/reason, realize/gather/assume.
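The step from data to a statistic, and the earlier questions about which averages make sense, can be sketched with Python's standard statistics module. This is a minimal illustration, not part of the original slides: it reuses the push-up counts and weekday answers from the earlier examples, and the variable names are made up for the sketch.

```python
from statistics import mean, mode

# Quantitative (discrete) data: push-ups done in one minute.
push_ups = [10, 8, 4, 5, 3]

# Qualitative (nominal) data: weekday with the most leisure time.
leisure_day = ["Tuesday", "Tuesday", "Tuesday", "Friday", "Thursday"]

# A descriptive statistic summarises the data: (10 + 8 + 4 + 5 + 3) / 5.
average_push_ups = mean(push_ups)

# For categories, the most frequent value is still a meaningful summary.
most_common_day = mode(leisure_day)

# An "average weekday" is not defined: mean() rejects non-numeric data.
try:
    mean(leisure_day)
    mean_is_defined_for_days = True
except TypeError:
    mean_is_defined_for_days = False

print(average_push_ups, most_common_day, mean_is_defined_for_days)
```

The individual push-up counts are data; the average computed from them is a statistic. The weekday answers can be counted but not averaged, which is the practical difference between quantitative and qualitative data.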