Basic Ideas of Statistics Unit 1.1 Basic Ideas of Statistics Corresponds to Chapter 1 in Triola Overview of Section 1.1 • • • • Measurement Robustness Populations and Samples Quantitative vs. Categorical Variables Measurement • Measurement is the fundamental activity that we do in statistics. A statistic is a measurement of a particular attribute that we want to know something about. Any time that we want to collect information about something and use statistics to draw conclusions, we MUST have some way to measure it. Statistics is a way of quantifying and drawing conclusions about the world that we live in. We transform the world into measurable quantities and then we use the practice of statistics to draw conclusions. In this section, we will examine the best ways to conduct measurement and how we can make our measurements as good as possible to make our conclusions as good as possible. Robustness • • • • • Fundamentally, there is no such thing as a fact in science. There are only conclusions and hypotheses that have yet to be disproven. What we are always attempting to do in science is to disprove what we think in order to give us new insights. However, we can’t operate from the position that we know nothing, so we must have some conclusions that are thought to be very well tested and that we treat as-if they were facts. We say that a conclusion that has been well tested and has withstood repeated attempts at disproof is robust. We also say that a finding is robust if it can be applied to a lot of different areas and under a lot of different circumstances. For example, the most robust finding in educational statistics is that socioeconomic status (SES), or the amount of income of a family or individual, factors significantly into educational achievement. This holds true regardless of whether you are looking at Math, English, Science or whatever. Ideally, what we want to create from the practice of science are robust conclusions. Conclusions that can be treated as facts. An example of how robust findings work. • For over 100 years, the conclusion that nothing can travel faster than light has been thought to be extremely robust. So robust that a giant portion of quantum mechanics and all of relativity has been founded on this single principle. However, a group of scientists at CERN, the giant particle accelerator in Switzerland, have found a group of neutrinos that have been measured to travel faster than the speed of light. If this is true, and a lot more tests will have to be conducted to prove the robustness of this claim, then much of physics might have to be seriously reconsidered. But the scientists at CERN were skeptical of their findings. Einstein’s findings of the absolute limit of light speed were so robust that any claim disproving it would be met with skepticism. So, when they presented their findings, they welcomed the opportunity for scrutiny. It turns out that experimental error, even with very carefully conducted checks, was the cause of the unusual findings and physics continues to believe that nothing moves faster than light. Samples and Populations • Population- The group of people or objects about which we want to know something – A parameter- a numerical value that describes a characteristic of the population • Sample- A subset of the population, generally of a given size, that we gather information about in order to draw conclusions about the population – Statistic- a numerical value that describes a characteristic of a sample Samples and Populations • A statistic are a description of a sample • A parameter is a description of a population • We rarely will have the parameter for a population. We always can gather statistics from a sample. Samples vs. Populations • In statistical reasoning, we are primarily drawing conclusions using statistics • We want these conclusions to tell us something about the population • We use statistics in order to help us draw conclusions about parameters and the characteristics of populations Categorical data • Categorical data consists of names or labels that are not numbers representing counts or measurements. – Eg. Colors, Gender, Race, Level of Parental Education • Discrete data- When the number of possible values in the sample space can be counted. – Eg. 14 women in the class, Number of children to a particular couple, number of students who passed a course. • Nominal level of measurement- The data cannot be ordered from highest to lowest. Quantitative Data • Quantitative data consists of numbers representing counts or measurements – Weight at the beginning of September, Weight at the end of August, Body Mass Index, velocity, dollars • Discrete Data- There can be an infinite number of data points, but each value is separate and countable • Continuous Data- There are infinite numbers of values that the data can take – Eg. 1, ½, 3.05, 3,005,141, 6.744 Quantitative Data • Ordinal level of measurement- We can arrange the data in some order, but differences between individual values are meaningless – Rankings of colleges, grades • Interval level of measurement- We can arrange the data in some order, and differences between values have meaning. However, there is no natural zero. – There is no real idea of nothing being present. – Eg. Years in which something occurred or body temperature, BMI, Weights in September Quantitative Data • The ratio level of measurement- The interval level except there is such a thing as having nothing present. There is a natural zero starting point. – Eg. Distances, prices Quantitative vs. Categorical Data • Categorical Data can only be broken up using the nominal level of measurement because we are often not able to order it meaningful ways. We do order categorical data, when the names for the data are orders. Like first place, second place, third place, etc. • Quantitative Data is generally not categorized using the nominal level of measurement, but can be ordered and so can use either the ordinal, the interval, or ratio level of measurement. – It is your understanding of what makes something ordinal or interval that determines if this set of data actually uses the kind of measurement. This is important! • We are going to separate everything in this class into these two categories: categorical and quantitative variables. • Proportions are the statistics that we derive from categorical variables. • Means, standard deviations, variances, and medians are the kinds of statistics that we derive from quantitative variables Review of Section 1.1 • • • • Measurement Robustness Populations and Samples Quantitative vs. Categorical Variables