STAT2054 Statistics for Engineers
Week 1: Introduction to Statistics

Objectives of Week 1
▫ Learn the basic vocabulary of statistics
▫ Distinguish between sample and population, parameters and statistics
▫ Categorize types of data
▫ Classify data as discrete or continuous
▫ Classify variables as qualitative or quantitative
▫ Identify cases and variables in a research study
▫ Identify explanatory and response variables in a research study
▫ Distinguish between observational and experimental studies
▫ Identify some sampling methods
▫ Understand vocabulary terms associated with statistical studies
▫ Identify experimental research design

Intro
Statistics is the science of collecting, describing, and analyzing data.

Identify Cases and Variables
▫ The subjects/objects that we obtain information about are called the cases or units in a dataset. Each row of the dataset corresponds to a different case.
▫ A variable is any characteristic that is recorded for each case. Each column of the dataset corresponds to a different variable.
▫ The information gathered about a specific variable is collectively called data (the singular form of data is datum).

Data Types
Qualitative (Categorical): Divides the cases into groups, placing each case into exactly one of two or more categories.
▫ Nominal: data consisting of labels or names. Examples: Gender, Smoke, Award
▫ Ordinal: data that can be arranged in a meaningful order, but where calculations don’t make sense. Example: Birth
Quantitative (Numerical): Measures or records a numerical quantity for each case.
▫ Continuous Data: data that can take on any value in a given interval and are usually measurements. Examples: Exercise, TV, GPA
▫ Discrete Data: data that can take on only particular values and are usually counts. Example: Pulse

Statistical Inference
Statistical inference: The process of using data from a sample to gain information about the population.
Example: A machine makes steel rods for use in optical storage devices. The specification for the diameter of the rods is 0.45 ± 0.02 cm. During the last hour, the machine has made 1000 rods. The quality engineer wants to know approximately how many of these rods meet the specification. He does not have time to measure all 1000 rods, so he draws a random sample of 50 rods, measures them, and finds that 46 of them (92%) meet the diameter specification.

Samples & Populations
▫ Population: includes all individuals or objects of interest
▫ Parameter: measure concerning a population (e.g., population mean)
▫ Sample: a subset of the population, ideally representative of it
▫ Statistic: measure concerning a sample (e.g., sample mean)

Example: In a survey, 257 residential college students at Bellevue University were asked if they had eaten lunch in the student center. 72% of the students surveyed said yes. After analyzing the results, the university determines that approximately 70% of residential students have eaten lunch in the student center.
▫ Population: residential college students at Bellevue University
▫ Sample: the 257 residential college students who were surveyed
▫ Parameter: 70% (the university's figure for all residential students)
▫ Statistic: 72% (the proportion observed in the sample)

Statistical Study
Steps of conducting a statistical study:
1. Design the study
   - State the question
   - Determine the population
   - Determine the variables
   - Determine the type of study: observational or experimental
2. Collect the data
3. Organize the data
4. Analyze the data to answer the question

State the question
▫ Describe and analyze a single variable
  ▫ What percentage of students smoke?
  ▫ What is the average number of hours a week spent exercising?
▫ Describe and analyze relationships between two or more variables (relationships might be between two categorical variables, two quantitative variables, or a quantitative and a categorical variable)
  ▫ Do males or females watch more television?
  ▫ Do students who exercise more tend to have lower pulse rates?

Determine the population & variables
A teacher wants to know if students who spend more time reading at home get higher homework and exam grades.
▫ Population: students
▫ Variables: amount of time spent reading at home, homework grades, and exam grades
A researcher wants to know if dogs who are fed only canned food have different body mass indexes (BMI) than dogs who are fed only hard food. They collect BMI data from 50 dogs who eat only canned food and 50 dogs who eat only hard food.
▫ Population: dogs
▫ Variables: type of food and BMI

Type of Study
▫ Observational Study: A study in which the researcher collects data without performing any manipulations.
▫ Experimental Study: A study in which the researcher manipulates the treatments (i.e., the level of the explanatory variable) received by subjects and collects data.

Example: A team of researchers want to know if Advil or Tylenol is more effective.
▫ Observational Study: Researchers survey a sample of adults and ask if they use Advil or Tylenol. They ask them to rate the effectiveness of the one they use.
▫ Experimental Study: Researchers obtain a random sample of adults. They randomly assign half of the participants to take Advil and the other half to take Tylenol. They ask each participant to rate the effectiveness of the one that they were assigned to take.

Sampling
A sample should be selected from a population randomly (random sample); otherwise it may be prone to bias. Our goal is to obtain a sample that is representative of the population.
Sampling bias occurs when the method of selecting a sample causes the sample to differ from the population in some relevant way. If sampling bias exists, then we cannot trust generalizations from the sample to the population.
Convenience sample: Individuals who are easily accessible are more likely to be included in the sample.
Example: A construction engineer has just received a shipment of 1000 concrete blocks, each weighing approximately 50 pounds. The blocks have been delivered in a large pile. The engineer wishes to investigate the crushing strength of the blocks by measuring the strengths in a sample of 10 blocks. To draw a simple random sample would require removing blocks from the center and bottom of the pile, which might be quite difficult. For this reason, the engineer might construct a sample simply by taking 10 blocks off the top of the pile. A sample like this is called a sample of convenience.

Do we need to eat an entire large pot of soup to know what the soup tastes like?
▫ When you taste a spoonful of soup and decide the spoonful you tasted isn't salty enough, that's exploratory analysis.
▫ If you generalize and conclude that your entire soup needs salt, that's an inference.
▫ For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
  - If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
  - If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.

Sampling Methods
There are several techniques we can use to collect sample data.
▫ Simple Random Sampling (SRS)
▫ Stratified Sampling
▫ Cluster Sampling
▫ Multi-stage Sampling

▫ Simple Random Sample (SRS): Each case in the population has an equal chance of being included, and there is no implied connection between the cases in the sample.
Example: Randomly select one of the departments in the engineering faculty.

▫ Stratified Sampling: A divide-and-conquer sampling strategy. The population is divided into groups called strata.
The strata are chosen so that similar cases are grouped together; then a second sampling method, usually simple random sampling, is employed within each stratum.
Example: Randomly select students from the engineering faculty, with the departments as strata: CSE, ME, MSE, IE, ENVE, EE, KMM, BIOE.

▫ Cluster Sampling: The population is divided into many groups, called clusters. Then a fixed number of clusters is sampled, and all observations from each of those clusters are included in the sample.

▫ Multi-stage Sampling: Like cluster sampling, but rather than keeping all observations in each selected cluster, a random sample is collected within each selected cluster.

Experimental Study
▫ Experiments are the only way to show a cause-and-effect relationship.
One variable is used to predict or explain differences in another variable.
▫ Explanatory Variable: Also known as the independent or predictor variable; it explains variations in the response variable.
▫ Response Variable: Also known as the dependent or outcome variable; its value is predicted, or its variation is explained, by the explanatory variable.

Explanatory & Response Variables
Example: A researcher believes that the origin of the beans used to make a cup of coffee affects hyperactivity. He wants to compare coffee from three different regions: Africa, South America, and Mexico.
▫ Explanatory variable: origin of coffee bean
▫ Response variable: hyperactivity level
Example: A group of middle school students wants to know if they can use height to predict age. They take a random sample of 50 people at their school, both students and teachers, and record each individual's height and age.
▫ Explanatory variable: height
▫ Response variable: age

Experimental Study
▫ Association: Two variables are associated if values of one variable tend to be related to the values of the other variable.
▫ Causation: Two variables are causally associated if changing the value of one variable influences the value of the other variable.
Examples:
▫ Studies show that taking a practice exam increases your score on an exam. (causation)
▫ Families with many cars tend to also own many television sets. (association)
▫ Sales are the same even with different levels of spending on advertising. (no association)
▫ Goldfish who live in large ponds are usually larger than goldfish who live in small ponds. (association)
▫ Putting a goldfish into a larger pond will cause it to grow larger. (causation)

▫ Confounding Variable (Lurking Variable or Confounding Factor): a third variable that is associated with both the explanatory variable and the response variable. A confounding variable can offer a plausible explanation for an association between two variables of interest.
Example: The number of vehicles (in millions) registered in the US and the average life expectancy (in years) of babies born in the US, recorded every four years from 1970 to 2014. Confounding variable: year

▫ Treatment group: a group of subjects to which researchers apply a treatment.
▫ Control group: a group of subjects to which no treatment or a placebo is applied.
▫ Placebo: something that appears to the participants to be an active treatment, but does not actually contain the active treatment. The response that participants show to a placebo is called the placebo effect.
▫ Single-blind experiment: subjects do not know if they are in the control group or the treatment group, but the people interacting with the subjects in the experiment know in which group each subject has been placed.
▫ Double-blind experiment: neither the subjects nor the people interacting with the subjects know to which group each subject belongs.
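
The treatment group, control group, and blinding ideas above can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the function names `assign_groups` and `blind_labels`, the subject IDs, and the codes "A" and "B" are all hypothetical. Subjects are shuffled and split in half, and each subject then receives a coded label so that the people handing out the treatment need not know which code is the real treatment.

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into treatment and control groups."""
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(subjects)      # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


def blind_labels(groups, seed=None):
    """Replace group names with neutral codes (hypothetical 'A' and 'B').

    Returns (labels, key). 'labels' maps each subject to a code and can be
    shared with staff; 'key' says which code is the treatment and would be
    kept by someone outside the experiment until the data are collected.
    """
    rng = random.Random(seed)
    codes = ["A", "B"]
    rng.shuffle(codes)
    key = {"treatment": codes[0], "control": codes[1]}
    labels = {s: key[g] for g, members in groups.items() for s in members}
    return labels, key


# Hypothetical subject IDs S01 ... S20
subjects = [f"S{i:02d}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=1)
labels, key = blind_labels(groups, seed=2)
print(labels)
```

In a single-blind setup only the subjects would be kept unaware of `key`; in a double-blind setup the people interacting with the subjects would not see it either.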
Design of Experiment
▫ Randomization: Researchers randomize patients into treatment groups and control groups.
▫ Controlling: Researchers assign treatments to cases, and they do their best to control any other differences between the groups (control for outside effects).
▫ Replication: The more cases researchers observe, the more accurately they can estimate the effect of the explanatory variable on the response (to see meaningful patterns).
▫ Blocking: Researchers sometimes know or suspect that variables other than the treatment influence the response. Under these circumstances, they may first group individuals into blocks based on this variable and then randomize cases within each block to the treatment groups.

Was the sample randomly selected?
▫ Yes: Possible to generalize from the sample to the population
▫ No: Cannot generalize from the sample to the population
Was the explanatory variable randomly assigned?
▫ Yes: Possible to make conclusions about causality
▫ No: Cannot make conclusions about causality

Exercise
A recent article claims that “Green Spaces Make Kids Smarter.” The study described in the article involved 2623 schoolchildren in Barcelona. The researchers measured the amount of greenery around the children’s schools, and then measured the children’s working memories and attention spans. The children who had more vegetation around their schools did better on the memory and attention tests.
▫ What are the population and sample in this study? Population: kids. Sample: the 2623 schoolchildren in Barcelona.
▫ What is the explanatory variable? Amount of greenery
▫ What is the response variable? The children’s test scores
▫ Does the headline imply causation? Yes
▫ Is the study an experiment or an observational study? Observational study
▫ Is it appropriate to conclude causation in this case? No
▫ Suggest a possible confounding variable. Socioeconomic status
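
The blocking principle from the Design of Experiment slide (group cases by a variable suspected to influence the response, then randomize within each block) can be sketched as follows. This is a minimal illustration under stated assumptions, not something taken from the slides: the function `block_randomize`, the patient IDs, and the age-group blocking variable are all hypothetical.

```python
import random
from collections import defaultdict


def block_randomize(cases, block_of, treatments, seed=None):
    """Randomize cases to treatments separately within each block.

    cases      : list of hashable case records
    block_of   : function returning the block a case belongs to
    treatments : list of treatment names to rotate through
    """
    rng = random.Random(seed)

    # Step 1: group the cases into blocks by the blocking variable.
    blocks = defaultdict(list)
    for case in cases:
        blocks[block_of(case)].append(case)

    # Step 2: shuffle inside each block, then deal treatments out in turn
    # so every block is split as evenly as possible.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, case in enumerate(members):
            assignment[case] = treatments[i % len(treatments)]
    return assignment


# Hypothetical patients: (id, age group); age group is the blocking variable.
patients = [
    ("P1", "under 40"), ("P2", "under 40"), ("P3", "under 40"), ("P4", "under 40"),
    ("P5", "40 and over"), ("P6", "40 and over"), ("P7", "40 and over"), ("P8", "40 and over"),
]
assignment = block_randomize(
    patients, block_of=lambda p: p[1], treatments=["treatment", "control"], seed=7
)
for patient, group in sorted(assignment.items()):
    print(patient, group)
```

Because the shuffle happens inside each block, both age groups end up with the same number of treatment and control cases, so age cannot be unevenly spread across the two groups by chance.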