STA 013: Elementary Statistics
Spring 2019
Lecture 1: April 3, 2019
Instructor: Maxime Pouokam

These notes are partially based on those of Paromita Dubey.

1.1 Introduction

Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, analyzing, and interpreting information in order to help people make decisions when they are faced with uncertainty.

Sampling is one of the most basic concepts of statistics. In most statistical problems, a set of measurements or data (a sample) is drawn from a much larger body of measurements (the population).

• Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a user-friendly form.
• Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data, the population.

1.2 The Population and The Sample

• An experimental unit is an object about which we collect data.
• A population is a set of units that we are interested in studying. Generally speaking, it is a large collection of individuals or objects that is the main focus of a scientific study.
• A variable is a characteristic or property of an individual population unit.
  – A measurement results when a variable is actually measured on an experimental unit.
  – A set of measurements is called data.
• How many variables have you measured in your study?
  – Univariate data: One variable is measured on a single experimental unit.
  – Bivariate data: Two variables are measured on a single experimental unit.
  – Multivariate data: More than two variables are measured on a single experimental unit.
• A sample is a subset of the units of a population used to make inferences about that population.
• Some Examples:
  – A medical researcher wants to estimate the survival time of a patient after the onset of a particular type of cancer and after a particular regimen of radiotherapy.
    ∗ Variable:
    ∗ Population:
    ∗ Experimental Unit:
  – An educational researcher wants to evaluate the effectiveness of a new method for teaching reading to deaf students. Achievement at the end of a period of teaching is measured by a student's score on a reading test.
    ∗ Variable:
    ∗ Population:
    ∗ Experimental Unit:
  – Identify experimental units:
    ∗ Gender of a student
    ∗ Number of errors on a midterm exam
    ∗ Age of a cancer patient
    ∗ Number of flowers on an azalea plant
    ∗ Color of a car entering a parking lot

1.3 Types of Variables

• Qualitative variables measure a quality or characteristic on each experimental unit. Qualitative variables produce data that can be categorized according to similarities or differences in kind; hence, they are often called categorical data.
• Quantitative variables measure a numerical quantity or amount on each experimental unit. These are numerical data, which can be either discrete or continuous. A discrete variable can assume only a finite or countable number of values. A continuous variable can assume the infinitely many values corresponding to the points on a line interval.
• Some Examples:
  – Identify each variable as quantitative or qualitative:
    1. Amount of time it takes to assemble a simple puzzle
    2. Number of students in a first-grade classroom
    3. Rating of a newly elected politician (excellent, good, fair, poor)
    4. State in which a person lives
  – Identify each quantitative variable as discrete or continuous:
    1. Number of boating accidents along a 50-mile stretch of the Colorado River
    2. Time required to complete a questionnaire
    3. Cost of a head of lettuce
    4. Number of brothers and sisters you have
    5. Yield in kilograms of wheat from a 1-hectare plot in a wheat field
  – A data set consists of the ages at death for each of the 38 past presidents of the United States now deceased.
    1. Is this set of measurements a population or a sample?
    2. What is the variable being measured?
    3. Is the variable in part 2 quantitative or qualitative?

1.4 Graphical Methods

1.4.1 Qualitative/Categorical Variables

The distribution of a qualitative variable is displayed graphically using the list of categories and how often each one occurs.

• A class is one of the categories into which qualitative data can be classified.
• The class frequency is the number of observations in the data set that fall into a particular class.
• The class relative frequency is the class frequency divided by the total number of observations in the data set; that is,
      class relative frequency = class frequency / n,
  where n is the total number of observations.
• The class percentage is the class relative frequency multiplied by 100; that is,
      class percentage = 100 × class relative frequency.
• Bar Graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency, or class percentage.
• Pie Charts: The categories (classes) of the qualitative variable are represented by slices of a pie (circle). The size of each slice is proportional to the class relative frequency. To construct a pie chart, assign one sector of a circle to each category. The angle of each sector should be proportional to the proportion of measurements (or relative frequency) in that category. Since a circle contains 360 degrees, you can use this equation to find the angle:
      Angle = Relative frequency × 360.
• Example: In a survey concerning public education, 800 school administrators were asked to rate the quality of education in the United States.
  Their responses are:
      A: 200    B: 400    C: 100    D: 100
  Construct a pie chart and a bar chart for this set of data.

1.4.2 Quantitative Variables

• Dot Plot: The numerical value of each quantitative measurement in the data set is represented by a dot on a horizontal scale. When data values repeat, the dots are placed above one another vertically.
  Example: The prices (in dollars) of 19 different brands of walking shoes are:
      90 70 70 70 75 70 65 68 60 74 70 95 75 70 68 65 40 65 70
  Construct a dot plot to display the distribution of the data.
• Stem-and-Leaf Display: The numerical value of the quantitative variable is partitioned into a stem and a leaf. The possible stems are listed in order in a column. The leaf for each quantitative measurement in the data set is placed in the corresponding stem row. Leaves for observations with the same stem value are listed in increasing order horizontally.
  Construct a stem-and-leaf plot to display the distribution of the data.
• Histogram: The possible numerical values of the quantitative variable are partitioned into class intervals, each of which has the same width. These intervals form the scale of the horizontal axis. The frequency or relative frequency of observations in each class interval is determined. A vertical bar is placed over each class interval, with the height of the bar equal to either the class frequency or class relative frequency.
  Example: Birth Weights of 30 Full-Term Newborn Babies
      7.2 7.8 6.8 6.2 8.2 8.0 8.2 5.6 8.6 7.1
      8.2 7.7 7.5 7.2 7.7 5.8 6.8 6.8 8.5 7.5
      6.1 7.9 9.4 9.0 7.8 8.5 9.0 7.7 6.7 7.7
  Construct a relative frequency histogram of the data.
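As a check on the formulas of Section 1.4.1, the following Python sketch computes the class relative frequency, class percentage, and pie-chart angle for the school administrators survey (A: 200, B: 400, C: 100, D: 100). The function name `summarize_classes` and the dictionary layout are illustrative choices, not part of the original notes; drawing the charts themselves is left as the exercise.

```python
def summarize_classes(frequencies):
    """Return relative frequency, percentage, and pie angle for each class.

    `frequencies` maps a class label to its class frequency.
    """
    n = sum(frequencies.values())  # total number of observations
    summary = {}
    for label, freq in frequencies.items():
        rel = freq / n  # class relative frequency = class frequency / n
        summary[label] = {
            "frequency": freq,
            "relative_frequency": rel,
            "percentage": 100 * rel,     # class percentage
            "angle_degrees": 360 * rel,  # pie-chart sector angle
        }
    return summary


# Survey responses from the example in Section 1.4.1.
survey = {"A": 200, "B": 400, "C": 100, "D": 100}
summary = summarize_classes(survey)

for label, row in summary.items():
    print(label, row["frequency"], row["relative_frequency"],
          row["percentage"], row["angle_degrees"])
```

For this data the sector angles come out to 90, 180, 45, and 45 degrees, which sum to 360 as they must. The same frequencies, relative frequencies, or percentages can serve as the bar heights in the bar chart.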