STAT 200
Chapter 1: Stats Starts Here

What is Statistics?

Statistics is a science that involves the design of studies, data collection, summarizing and analyzing the data, interpreting the results, and drawing conclusions. Conclusions are made about specific phenomena on the basis of relatively limited information.

Data and Variables

Questions of interest:
• How big and frequent are earthquakes in Canada?
• How do earthquakes that occur in British Columbia compare with those in the rest of Canada?

Eugenia Yu, UBC Department of Statistics. Not to be copied, used, or revised without explicit written permission from the copyright owner.

Statistical process of investigation:
1. Collect data: What data to collect? How and where do we obtain data?
2. Examine the data: How do we present and obtain useful information from the data? We will learn the techniques in the next few chapters.
3. Interpret the results and draw conclusions.

Solution: We can examine earthquake data from the past year, which are available on the Natural Resources Canada website at:
http://www.earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php

Let’s examine the data on earthquakes that occurred in 2020. The data are displayed in reverse chronological order.

Date      Season  Depth  Magnitude  Region
20201231  Winter  20     1.3        BC
20201231  Winter  20.11  1.9        NT
20201231  Winter  2.71   4          BC
20201231  Winter  19.86  1.8        BC
20201231  Winter  18     2.1        ON
20201230  Winter  14.14  2.1        BC
20201230  Winter  20.32  2.5        BC
20201230  Winter  25.55  2.4        BC
20201230  Winter  2      2.2        ON
20201230  Winter  18     2.1        QC
20201230  Winter  38.75  1.9        BC
20201229  Winter  1      0.8        BC
20201229  Winter  13.57  1.8        BC
20201229  Winter  5      2          NB
:         :       :      :          :

The full data set can be found on the lecture notes page on the course website.
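To make step 2 of the process ("examine the data") concrete, here is a minimal Python sketch that tabulates only the 14 rows displayed above. It is not the course's own method, and the names `ROWS`, `parse_rows`, `quakes`, `region_counts`, `bc`, and `rest` are introduced purely for illustration. Because it uses a small excerpt rather than the full 2020 data set, its output should not be read as a conclusion about Canadian earthquakes.

```python
from collections import Counter
from statistics import mean

# The 14 rows shown in the notes (an excerpt, NOT the full 2020 data set).
# Columns: Date, Season, Depth (km), Magnitude, Region.
ROWS = """\
20201231 Winter 20 1.3 BC
20201231 Winter 20.11 1.9 NT
20201231 Winter 2.71 4 BC
20201231 Winter 19.86 1.8 BC
20201231 Winter 18 2.1 ON
20201230 Winter 14.14 2.1 BC
20201230 Winter 20.32 2.5 BC
20201230 Winter 25.55 2.4 BC
20201230 Winter 2 2.2 ON
20201230 Winter 18 2.1 QC
20201230 Winter 38.75 1.9 BC
20201229 Winter 1 0.8 BC
20201229 Winter 13.57 1.8 BC
20201229 Winter 5 2 NB
"""


def parse_rows(text):
    """Turn whitespace-separated lines into one dict per earthquake."""
    records = []
    for line in text.strip().splitlines():
        date, season, depth, magnitude, region = line.split()
        records.append({
            "date": date,
            "season": season,
            "depth": float(depth),          # quantitative
            "magnitude": float(magnitude),  # quantitative
            "region": region,               # categorical
        })
    return records


quakes = parse_rows(ROWS)

# How frequent: number of displayed earthquakes per region.
region_counts = Counter(q["region"] for q in quakes)

# How big: magnitudes in BC versus everywhere else in the excerpt.
bc = [q["magnitude"] for q in quakes if q["region"] == "BC"]
rest = [q["magnitude"] for q in quakes if q["region"] != "BC"]

print(region_counts)
print("mean magnitude, BC:  ", round(mean(bc), 2))
print("mean magnitude, rest:", round(mean(rest), 2))
```

Each dict corresponds to one row of the table, that is, one earthquake. The same counts and averages could be computed on the full data set once it is downloaded from the course website.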
In a typical data set, each row contains information corresponding to an individual, an object, or an experimental unit.

A variable refers to a characteristic of interest, e.g. the magnitude of an earthquake. A variable can be:

1. qualitative/categorical: An example is marital status, with four categories: single, married, divorced, and widowed. Categorical variables with categories that can be ordered are called ordinal variables. An example is severity of pain: suppose that there are three categories: mild, moderate, and severe. Since the categories can be ranked, it is an ordinal variable.

2. quantitative (measured on a numerical scale): units should be attached. Some examples of quantitative variables include height, weight, and lifetime.

Example: Earthquake data

Variable   Variable type  Unit
Season     Categorical    N/A
Depth      Quantitative   kilometers
Magnitude  Quantitative   dyne-cm (Richter scale)
Region     Categorical    N/A

Understanding the Data

• Who - the subjects we wish to study
• What - the variables of interest
• Where - the location in which the study is conducted / the data are collected
• When - the time point or time period over which the data are collected
• Why - the purpose of doing the study / collecting the data
• How - the method used to collect the data
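The variable table for the earthquake data earlier in this section can also be written down as a small "data dictionary" in code. The Python sketch below is one possible way to record each variable's type and unit, under the assumption that N/A is stored as `None`. The names `Variable`, `EARTHQUAKE_VARIABLES`, `quantitative_names`, and `missing_units` are hypothetical and do not come from the course materials.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    name: str
    kind: str             # "categorical" or "quantitative"
    unit: Optional[str]   # None where the notes list N/A


# Entries copied from the earthquake variable table in the notes.
EARTHQUAKE_VARIABLES = [
    Variable("Season", "categorical", None),
    Variable("Depth", "quantitative", "kilometers"),
    Variable("Magnitude", "quantitative", "dyne-cm (Richter scale)"),
    Variable("Region", "categorical", None),
]


def quantitative_names(variables):
    """Names of the variables measured on a numerical scale."""
    return [v.name for v in variables if v.kind == "quantitative"]


def missing_units(variables):
    """Quantitative variables should have units attached; list any that do not."""
    return [v.name for v in variables
            if v.kind == "quantitative" and v.unit is None]


print(quantitative_names(EARTHQUAKE_VARIABLES))
print(missing_units(EARTHQUAKE_VARIABLES))
```

Writing the types down explicitly makes it easy to check the rule stated above, that every quantitative variable has a unit attached, before any analysis begins.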