Statistics Lecture #1: Introduction

What Do Engineers Do?
➢ An engineer is someone who solves problems of interest to society by the efficient application of scientific principles. Engineers accomplish this by either:
• Refining an existing product or process
• Designing a new product or process that meets customers' needs
➢ The engineering or scientific method is the approach to formulating and solving these problems.

Engineering method
1. Develop a clear description
2. Identify the important factors
3. Propose or refine a model
4. Conduct experiments
5. Manipulate the model
6. Confirm the solution
7. Conclusions and recommendations

Statistics Supports the Creative Process
➢ The field of statistics deals with the collection, presentation, analysis, and use of data to:
• Make decisions
• Solve problems
• Design products and processes
➢ It is the science of learning from data.

Data
➢ The measurements obtained in an experiment or research study are called the data (sample).
➢ The goal of statistics is to help organize and interpret the data.

Types of Data
• Categorical (defined categories or groups). Examples:
▪ Marital status
▪ Are you registered to vote?
▪ Eye color
• Numerical, discrete (counted items). Examples:
▪ Number of children
▪ Defects per hour
• Numerical, continuous (measured characteristics). Examples:
▪ Weight
▪ Voltage

Population
➢ The entire group of individuals of interest is called the population.
➢ For example:
• The population of third-grade children.
• Grade point averages of all the students in your university.
• Incomes of all families living in Ho Chi Minh City.

Sample
➢ Populations are usually so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
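As a rough illustration of using a sample to answer a question about a population, the following Python sketch draws a sample from a made-up population and compares the sample mean with the population mean. The population values (synthetic grade point averages), the population size, the sample size, and all variable names are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: synthetic GPAs for 10,000 students (not real data).
population = [round(random.uniform(2.0, 4.0), 2) for _ in range(10_000)]

# random.sample draws without replacement; every subset of size n is equally likely.
n = 100
sample = random.sample(population, n)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what the researcher can compute

print(f"population mean: {population_mean:.3f}")
print(f"sample mean (n={n}): {sample_mean:.3f}")
```

The two means will generally be close but not equal. That gap is what the inferential methods later in the course try to quantify.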
Random Sampling
➢ Simple random sampling is a procedure in which:
• Each member of the population is chosen strictly by chance,
• Each member of the population is equally likely to be chosen,
• Every possible sample of n objects is equally likely to be chosen.
➢ The resulting sample is called a random sample.

Relationship between Population and Sample: The Role of Statistics
[Figure: relationship between population and sample]

Learning Outcomes
➢ After completing this course, students will be able to:
• Use basic ideas and methods of descriptive statistics.
• Work with the concepts of probability theory and their mathematical implementation in discrete and continuous stochastic models.
• Apply some important estimation and test methods and interpret the results (inferential statistics).

Materials
➢ Textbook:
• Montgomery, Runger: Applied Statistics and Probability for Engineers, Wiley.
➢ References:
• Online Statistics: http://onlinestatbook.com/
• Virtual Laboratories in Probability and Statistics: http://www.math.uah.edu/stat
• http://onlinestatbook.com/2/calculators/t_dist.html
• https://web.ma.utexas.edu/users/davis/375/popecol/tables/f005.html
• Anderson, Sweeney, Williams: Statistics for Business and Economics
• Mario F. Triola: Elementary Statistics

Contents
1. Introduction
2. Descriptive Statistics
3. Central Tendency, Variability and z-score
4. Probability
5. Discrete Random Variables
6. Continuous Random Variables
7. Sampling
8. Estimation
9. Hypothesis Testing
10. Hypothesis Testing II
11. Analysis of Variance
12. Simple Regression

Variability
➢ Statistical techniques are useful for describing and understanding variability.
• By variability, we mean that successive observations of a system or phenomenon do not produce exactly the same result.
• We all encounter variability in our everyday lives, and statistical thinking can give us a useful way to incorporate this variability into our decision-making processes.
• Statistics gives us a framework for describing this variability and for learning about potential sources of variability (represented by factors).

Variability (Example)
➢ An engineer is designing a nylon connector to be used in an automotive engine application. The engineer is considering establishing the design specification on wall thickness at 3/32 inch, but is somewhat uncertain about the effect of this decision on the connector pull-off force. If the pull-off force is too low, the connector may fail when it is installed in an engine. Eight prototype units are produced and their pull-off forces measured (in pounds): 12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1.
• The dot diagram is a very useful plot for displaying a small body of data, say up to about 20 observations.
• This plot allows us to easily see two features of the data: the location, or the middle, and the scatter, or variability.
• The engineer considers an alternate design; eight prototypes are built and their pull-off forces measured.
• The dot diagram can be used to compare the two sets of data. Is it possible that the thicker prototypes affect the pull-off force?
➢ There is variability in the pull-off force measurements.
➢ We consider the pull-off force to be a random variable X = μ + ϵ, where μ is a constant and ϵ is a random disturbance.

Statistical inference
Figure 1-4: Statistical inference is one type of reasoning.

Collecting Engineering Data
➢ In the engineering environment, the data are almost always a sample that has been selected from the population. Three basic methods of collecting data are:
• A retrospective study using historical data
▪ Data collected in the past for other purposes.
• An observational study
▪ Data collected in the present by a passive observer.
• A designed experiment
▪ Data collected in response to deliberate changes in process inputs.
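The location and scatter that the dot diagram shows for the pull-off force example can also be computed directly. The following is a minimal Python sketch using only the eight measurements given in the example; the variable names are illustrative, and treating each deviation from the sample mean as an estimate of the disturbance ϵ is a simplifying assumption, since the true constant μ is unknown.

```python
import statistics

# Pull-off forces (pounds) of the eight prototypes from the variability example.
pull_off_force = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

location = statistics.mean(pull_off_force)   # middle of the data, about 13.0
scatter = statistics.stdev(pull_off_force)   # sample standard deviation, about 0.48

# Under X = mu + epsilon, each deviation from the sample mean is a rough
# estimate of one random disturbance (assumption: the mean stands in for mu).
disturbances = [round(x - location, 2) for x in pull_off_force]

print(f"mean = {location:.2f} lb, std dev = {scatter:.2f} lb")
print("estimated disturbances:", disturbances)
```

Running the same calculation on a second data set, such as the alternate design, would give a numerical counterpart to comparing two dot diagrams.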
Designed experiment
➢ Hypothesis test:
• A hypothesis is a statement about some aspect of the system in which we are interested.
• E.g., the mean strength exceeds 12.75 pounds.
➢ One-sample hypothesis test:
• Example: Ford avg mpg = 30 vs. avg mpg < 30.
➢ Two-sample hypothesis test:
• Example: Ford avg mpg - Chevy avg mpg = 0 vs. ≠ 0.

Factor experiment
➢ Use a small number of levels for each factor.
➢ A very reasonable experiment design strategy uses every possible combination of the factor levels to form a basic experiment.
➢ Consider a petroleum distillation column:
• Output is acetone concentration.
• Inputs (factors) are:
1. Reboil temperature
2. Condensate temperature
3. Reflux rate
• Output changes as the inputs are changed by the experimenter.
• Each factor is set at 2 reasonable levels (-1 = low, +1 = high).
• 2^3 = 8 runs are made, one at every combination of factor levels, to observe acetone output.
• The resulting data are used to create a mathematical model of the process representing cause and effect.

Table 1-1: The Designed Experiment (Factorial Design) for the Distillation Column
[Table not preserved in this text]

Three-factorial design
[Figure not preserved in this text]
An important advantage of factorial experiments is that they allow one to detect an interaction between factors.

Factor Experiment Considerations
• Factor experiments can get too large. For example, 8 factors at 2 levels each require 2^8 = 256 experimental runs of the distillation column.
• Certain combinations of factor levels can be deleted from the experiment without degrading the resulting model.
• The result is called a fractional factorial experiment: a variation of the basic factorial arrangement in which only a subset of the factor combinations is actually tested.

Observing Processes Over Time
➢ Often data are collected over time. In this case, it is usually very helpful to plot the data versus time in a time series plot.
➢ Phenomena that might affect the system or process often become more visible in a time-oriented plot, and the stability of the process can be better judged.

Use of Control Charts
[Figure not preserved in this text]

Understanding Mechanistic & Empirical Models
➢ A mechanistic model is built from our underlying knowledge of the basic physical mechanism that relates several variables.
• Example: Ohm's Law, current = voltage/resistance, or I = E/R.
• Allowing for variability in the observed current, the model becomes I = E/R + ϵ, where ϵ is a random disturbance.
• The form of the function is known.
➢ An empirical model is built from our engineering and scientific knowledge of the phenomenon, but is not directly developed from our theoretical or first-principles understanding of the underlying mechanism.
• The form of the function is not known a priori.

Example of empirical model
• In a semiconductor manufacturing plant, the finished semiconductor is wire-bonded to a frame. In an observational study, the variables recorded were:
▪ Pull strength to break the bond (y)
▪ Wire length (x1)
▪ Die height (x2)
• The data recorded are shown on the next slide.
[Data table not preserved in this text]
• In general, this type of empirical model is called a regression model. The estimated regression relationship is given by: [equation not preserved in this text]

Visualizing data
[Figure not preserved in this text]

Visualizing the Resultant Model Using Regression Analysis
[Figure not preserved in this text]

Models Can Also Reflect Uncertainty
➢ The process of reasoning from a sample of objects to conclusions about a population of objects is called statistical inference.
➢ How can we quantify the risks of decisions based on samples? Furthermore, how should samples be selected to provide good decisions (ones with acceptable risks)?
➢ Probability models help quantify the risks involved in statistical inference, that is, the risks involved in decisions made every day.
➢ Probability provides the framework for the study and application of statistics.
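One way to see how a probability model can quantify the risk of a sample-based decision is a small simulation. The Python sketch below assumes the model X = μ + ϵ with normally distributed disturbances and estimates how often a sample of eight observations has a mean below 12.75 pounds even though the true mean is above it. The values of MU and SIGMA, the normality assumption, and all names are hypothetical choices for illustration; they are not estimates taken from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

MU = 13.0          # assumed true mean pull-off force in pounds (hypothetical)
SIGMA = 0.5        # assumed std dev of the random disturbance (hypothetical)
N = 8              # observations per sample
THRESHOLD = 12.75  # mean strength the design is supposed to exceed
TRIALS = 20_000    # number of simulated samples

def sample_mean():
    """Mean of one simulated sample from X = MU + epsilon, epsilon ~ Normal(0, SIGMA)."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))

# Count samples whose mean falls below the threshold although MU exceeds it.
misleading = sum(sample_mean() < THRESHOLD for _ in range(TRIALS))
risk = misleading / TRIALS

print(f"estimated risk of a misleading sample mean: {risk:.3f}")
```

Under these assumptions, increasing N makes the estimated risk shrink, which is one reason the way samples are selected, and how large they are, matters for the quality of the decision.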