University of Sulaimani
College of Engineering
Civil Engineering Department

Statistics (2nd stage)

Theory: 2 hrs
Tutorial: -
Practical: -
Units: 2
Term: One course
Midterm exams: 20 marks each
Quizzes, homework, etc.: 20 marks

Prepared by: Asst. Prof. Dr. Hirsh M. Majid (د. هێرش محمد مجيد)
Office: 3rd floor, Civil Engineering
Mail: hirsh.majid@univsul.edu.iq
Site: https://sites.google.com/a/univsul.edu.iq/hirsh-muhammad/

1/25/2020 Prepared by: Dr. Hirsh M. Majid

Syllabus

Chapter 1: Introduction
Chapter 2: Organizing data
Chapter 3: Descriptive measures
Chapter 4: Probability
Chapter 5: Discrete random variables
Chapter 6: Continuous random variables and their probability distributions
Chapter 7: Sampling distributions
Chapter 8: Estimation and sample size determination
Chapter 9: Tests of hypothesis
Chapter 10: Chi-square procedures and Normal distribution

References

• Larry J. Stephens, Schaum's Outlines: Beginning Statistics
• David Freedman, Robert Pisani, and Roger Purves, Statistics, 4th ed.
• Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.
• Murray R. Spiegel and Larry J. Stephens, Schaum's Outlines: Statistics, 4th ed.
• Murray R. Spiegel, John Schiller, and R. Alu Srinivasan, Schaum's Outlines: Probability and Statistics, 4th ed.
• Etc.

Statistics
Chapter one: Introduction

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.
OR
Statistics is the set of mathematical equations that we use to analyze things.
OR
Statistics is the science of learning from data.

- Statistics allows you to understand a subject much more deeply.
- It keeps us informed about what is happening in the world around us.
- It is used in many applications across a wide variety of disciplines: engineering, science, economics, medicine, social life, etc.

The field of statistics is divided into two major divisions:
1. Descriptive statistics
2. Inferential statistics

Descriptive statistics: The use of graphs, charts, and tables, and the calculation of various statistical measures, to organize and summarize information is called descriptive statistics. A number of items belong to this portion of statistics, such as:
- The average, or measure of the center of a data set: the mean, median, mode, or midrange
- The spread of a data set, which can be measured with the range or standard deviation
- Overall descriptions of data, such as the five-number summary
- Measurements such as skewness and kurtosis
- The exploration of relationships and correlation between paired data
- The presentation of statistical results in graphical form

Inferential statistics: Inferential statistics starts with a sample and then generalizes to a population. This information about the population is not stated as a single number. Instead, scientists express population parameters as a range of plausible values, along with a degree of confidence.

Variable, Observation, and Data set

A characteristic of interest concerning the individual elements of a population or a sample is called a variable. A variable is often represented by a letter such as x, y, or z.
The value of a variable for one particular element from the sample or population is called an observation.
A data set consists of the observations of a variable for the elements of a sample.

Example: Six hundred registered voters are polled, and each one is asked whether they approve or disapprove of the president's economic policies. The variable is the registered voter's opinion of the president's economic policies. The data set consists of 600 observations.
Variable: the registered voter's opinion
Observation: the opinion (approve or disapprove) given by one of the 600 voters
Data set: the 600 observations

Example: A survey of 2500 households headed by a single parent is conducted, and one characteristic of interest is the yearly household income.
Variable: yearly household income
Observation: the yearly income of one of the 2500 households
Data set: the 2500 observed household incomes

Quantitative variable: discrete and continuous variables

A quantitative variable is determined when the description of the characteristic of interest results in a numerical value.
A discrete variable is a quantitative variable whose values are countable. It usually results from counting.
A continuous variable is a quantitative variable that can assume any numerical value over an interval or over several intervals. It usually results from making a measurement of some type.

Qualitative variable

A qualitative variable is determined when the description of the characteristic of interest results in a non-numerical value. A qualitative variable may be classified into two or more categories.

Qualitative variable      Possible categories for the variable
Marital status            Single, married, divorced, separated
Gender                    Male, female
Crime classification      Misdemeanor, felony
Pain level                None, low, moderate, severe
Personality type          Type A, Type B

The possible categories for qualitative variables are often coded for the purpose of performing computerized statistical analysis. Marital status might be coded as 1, 2, 3, or 4, where 1 represents single, 2 represents married, 3 represents divorced, and 4 represents separated. These codes are only labels; arithmetic on them has no meaning.

Nominal, Ordinal, Interval, and Ratio levels of measurement

There are four levels of measurement, or scales of measurement, into which data can be classified.
The nominal scale applies to data that are used for category identification. The nominal level of measurement is characterized by data that consist of names, labels, or categories only. Nominal scale data cannot be arranged in an ordering scheme. The arithmetic operations of addition, subtraction, multiplication, and division are not performed on nominal data.
Examples of qualitative variables at the nominal level:
1. Blood type
2. Color of road signs in Sulaimani city
3. Religion

The ordinal scale applies to data that can be arranged in some order, but for which differences between data values either cannot be determined or are meaningless. The ordinal level of measurement is characterized by data that apply to categories that can be ranked. Ordinal scale data can be arranged in an ordering scheme.
Examples of qualitative variables at the ordinal level:
- Product rating: poor, good, excellent
- Socioeconomic class: lower, middle, upper
- Pain level: none, low, moderate, severe

The interval scale applies to data that can be arranged in some order and for which differences in data values are meaningful. The interval level of measurement results from counting or measuring. Interval scale data can be arranged in an ordering scheme, and differences can be calculated and interpreted.
For example: Temperatures represent interval level data. The high temperature in February equaled 25°F and the high temperature in March equaled 50°F. It was warmer in March than in February; that is, temperatures can be arranged in order. It was 25°F warmer in March than in February; that is, differences may be calculated and interpreted. However, 50°F is not twice as warm as 25°F, because 0°F is an arbitrary zero point, so ratios are not meaningful at the interval level.

The ratio scale applies to data that can be ranked and for which all arithmetic operations, including division, can be performed. Division by zero is, of course, excluded. The ratio level of measurement results from counting or measuring. Ratio scale data can be arranged in an ordering scheme, and both differences and ratios can be calculated and interpreted.
For example: The number of grams of fat consumed per day by adults is ratio scale data. Mashxal consumes 10 grams of fat per day and Zanyar consumes 20 grams per day. Zanyar consumes twice as much fat per day as Mashxal, since 20/10 = 2.
- Good examples of ratio variables include height, weight, and duration.

Summation notation

Example: Suppose the numbers of 112 emergency calls received on four days were 411, 375, 400, and 478. If we let x represent the number of calls received per day, then the values of the variable for the four days are represented as follows:
x1 = 411, x2 = 375, x3 = 400, and x4 = 478
The sum of calls for the four days is x1 + x2 + x3 + x4, which equals 411 + 375 + 400 + 478 = 1664.
The symbol ∑x, read as "the summation of x", is used to represent x1 + x2 + x3 + x4.

Example: The following five values were observed for the variable x: x1 = 4, x2 = 5, x3 = 0, x4 = 6, and x5 = 10. The following computations illustrate the usage of the summation notation.
[Worked computations missing in the source]

Example: The following values were observed for the variables x and y: x1 = 1, x2 = 2, x3 = 0, x4 = 4; y1 = 2, y2 = 1, y3 = 4, y4 = 5. The following computations show how the summation notation is used for two variables.
[Worked computations missing in the source]
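The summation notation in the examples above can be checked with a short Python sketch. The data values come from the text; the particular quantities computed (∑x², (∑x)², ∑xy, and (∑x)(∑y)) and all variable names are chosen here for illustration and are an assumption, not necessarily the computations shown on the original slides.

```python
# A minimal sketch of summation notation using the example data from the text.
# Which quantities to compute is an illustrative assumption.

# 112 emergency calls received on four days
calls = [411, 375, 400, 478]
total_calls = sum(calls)                    # sum of x = x1 + x2 + x3 + x4

# Single-variable example: x1..x5
x = [4, 5, 0, 6, 10]
sum_x = sum(x)                              # sum of x
sum_of_squares = sum(v ** 2 for v in x)     # sum of x^2: square each value, then add
square_of_sum = sum(x) ** 2                 # (sum of x)^2: add first, then square

# Two-variable example: paired values (x_i, y_i)
xs = [1, 2, 0, 4]
ys = [2, 1, 4, 5]
sum_xy = sum(a * b for a, b in zip(xs, ys))  # sum of xy: multiply pairs, then add
product_of_sums = sum(xs) * sum(ys)          # (sum of x)(sum of y)

print(total_calls)                               # 1664
print(sum_x, sum_of_squares, square_of_sum)      # 25 177 625
print(sum_xy, product_of_sums)                   # 24 84
```

Note that ∑x² and (∑x)² differ (177 versus 625), as do ∑xy and (∑x)(∑y) (24 versus 84): the order in which squaring or multiplying and summing are applied matters.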