Unit 1: Statistics and Statistical Thinking

• Statistics is the science of data.
• Statistics involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information.
• Statistics is used in many disciplines (scientific and non-scientific) to make decisions and draw conclusions based on data. For instance:
• In the pharmaceutical industry, it is impossible to test every drug on every person who may need it, so the industry relies on statisticians.
• In business, managers must often decide whom to offer their company’s products to. For example, a credit card company must assess how risky a potential customer is.
• An individual who needs to lose weight for an upcoming film needs to see data on successful diets.

Average weight loss on various diets across 8 weeks (weight recorded each week)

Diet     Week 1  Week 2  Week 3  Week 4  Week 5  Week 6  Week 7  Week 8
Diet 1   310     310     304     300     290     285     280     284
Diet 2   310     312     308     304     300     295     290     289
Diet 3   310     307     306     303     301     299     297     295
Diet 4   310     308     305     303     297     294     290     287

Based on these numbers, which diet should he or she adopt?

Two types of statistics

• Descriptive statistics: uses numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set, and to present that information in a convenient form that individuals can use to make decisions. The main goal of descriptive statistics is to describe a data set. Descriptive statistics include both numerical measures (e.g., mean, median) and graphical displays of data (e.g., charts or graphs).
• Inferential statistics: uses sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.

Descriptive statistics

• Look at the example table of various diets.
• What information does the table provide?
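One way to approach this question is to compute a few simple descriptive summaries from the diet table. The sketch below is a minimal Python illustration and is not part of the original slides: it computes each diet's total loss (week 1 weight minus week 8 weight) and lists any weeks in which weight went up. The names `weights` and `summarize` are assumptions made for this example.

```python
# Weekly weights copied from the diet table (weeks 1 to 8).
weights = {
    "Diet 1": [310, 310, 304, 300, 290, 285, 280, 284],
    "Diet 2": [310, 312, 308, 304, 300, 295, 290, 289],
    "Diet 3": [310, 307, 306, 303, 301, 299, 297, 295],
    "Diet 4": [310, 308, 305, 303, 297, 294, 290, 287],
}


def summarize(series):
    """Return (total loss, list of week numbers in which weight increased)."""
    total_loss = series[0] - series[-1]
    # Week-over-week change; changes[0] is the change from week 1 to week 2.
    changes = [later - earlier for earlier, later in zip(series, series[1:])]
    weeks_with_gain = [i + 2 for i, change in enumerate(changes) if change > 0]
    return total_loss, weeks_with_gain


summary = {diet: summarize(series) for diet, series in weights.items()}

for diet, (loss, gain_weeks) in summary.items():
    print(f"{diet}: total loss = {loss}, weeks with a gain = {gain_weeks}")
```

Numerical summaries like these are descriptive statistics: they describe the data in the table without generalizing beyond it.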
• Diet 1 produces the largest overall weight loss (from 310 to 284).
• However, Diet 1 is not stable: weight rises between weeks 7 and 8 (280 to 284).
• Diet 4 shows a steady decline in weight.
• With this information, one can make an educated decision suited to his or her personal weight-loss goals.

Inferential statistics

• The main goal is to draw a conclusion about a population based on a sample from that population.
• Inferential statistics mostly uses hypothesis testing.
• Key definitions:
• Experimental unit: an object upon which data are collected
• Population: a set of units that is of interest to study
• Variable: a characteristic or property of an individual experimental unit
• Sample: a subset of the units of a population

Statistical hypothesis

• An educated guess about the relationship between two (or more) variables.
• Two main variables: the independent variable (the variable that represents the inputs to the dependent variable, that is, the variable that can be manipulated to see whether it is the cause) and the dependent variable (the variable that represents the effect being tested).

A case of statistical hypothesis

• A literature teacher hypothesizes that students who are required to read one novel a week over 16 meetings will become more self-motivated readers than students who are accustomed to a lecture in every meeting.
• Independent variable: reading a novel per week
• Dependent variable: self-motivation
• Since it is impossible to study every student, the teacher takes a sample and generalizes the result to the entire population.
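The teacher's case can be sketched as a small hypothesis test. The example below is only an illustration under stated assumptions: the self-motivation scores are invented (the slides give no data), the group sizes are arbitrary, and an exact permutation test is used as one possible test, not as the method the slides prescribe. The names `novel_group`, `lecture_group`, and `permutation_test` are hypothetical.

```python
from itertools import combinations
from statistics import mean

# HYPOTHETICAL self-motivation scores, invented purely for illustration.
novel_group = [7, 8, 6, 9, 8]    # sample of students who read one novel per week
lecture_group = [5, 6, 7, 4, 6]  # sample of students who get a lecture every meeting


def permutation_test(treated, control):
    """Exact one-sided permutation test on the difference in group means.

    Returns the observed difference and the share of all possible regroupings
    whose difference is at least as large as the observed one.
    """
    observed = mean(treated) - mean(control)
    pooled = treated + control
    indices = range(len(pooled))
    as_extreme = 0
    total_splits = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            as_extreme += 1
        total_splits += 1
    return observed, as_extreme / total_splits


observed_diff, p_value = permutation_test(novel_group, lecture_group)
print(f"Observed difference in mean score: {observed_diff:.2f}")
print(f"One-sided permutation p-value: {p_value:.3f}")
```

Here the independent variable is the group a student belongs to and the dependent variable is the score. A small p-value would suggest that the observed difference is unlikely to come from random grouping alone, which is the kind of evidence used to generalize from the sample to the population.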
Key steps of a problem

Descriptive
• Define the population (or sample) of interest
• Select the variables that are going to be investigated
• Select the tables, graphs, or numerical summary tools
• Identify patterns in the data

Inferential
• Define the population of interest
• Select the variables that are going to be investigated
• Select a sample of population units
• Run the statistical tests on the sample
• Generalize the results to the population and draw conclusions

Types of data

Qualitative data
• Measurements that cannot be measured on a natural numerical scale
• Measurements can only be classified into one of a group of categories
• Examples: brands of shoes (Nike, Adidas, or K-Swiss), gender (male or female)

Quantitative data
• Measurements that can be recorded on a naturally occurring scale
• Example: a person’s yearly salary

Take-home assignment

1. Discuss the difference between descriptive and inferential statistics.
2. Give an example of a research question that would use an inferential statistics solution.
3. Identify the independent and dependent variables in the following research question: A production manager is interested in knowing whether employees are effective if they work a shorter work week. To answer his question, he proposes the following research question: Do more widgets get made if employees work 4 days a week or 5 days a week?
4. What is the difference between a population and a sample?
5. Write about a decision you once made in your life using descriptive statistics.