University of Khartoum
Faculty of Mathematical Science
Department of Information Technology
Applied Statistics (Stat 301)
Azza Osman Mohamed

Course components
o Statistical estimates
o Tests of hypotheses
o Correlation
o Simple linear regression analysis
o Analysis of variance
o Non-parametric tests
o Statistical package SPSS

Course aim: The aim of this course is to develop further understanding of statistical methods.

Outcome: By the end of this course you will be able to:
o Understand inferential statistics.
o Describe common measures of correlation and association, and perform simple regression analysis.
o Understand the workings of the analysis of variance table and its application to one-way and two-way ANOVA situations.
o Understand the workings of non-parametric methods.
o Perform statistical analysis using SPSS.
o Present and interpret the results.

Course evaluation:
o Assignments
o Labs
o Mid-term exam
o Final exam

Session 1

Learning Objectives
At the end of sessions 1 and 2 you will be able to:
o State the estimation process
o Introduce the properties of point estimates
o Explain confidence interval estimates
o Compute confidence interval estimates for the population mean ($\sigma$ known and unknown)
o Compute confidence interval estimates for the population proportion

Introduction to Estimation: Point Estimation

Statistical Methods
[Diagram: Statistical Methods divide into Descriptive Statistics and Inferential Statistics; Inferential Statistics divides into Estimation and Hypothesis Testing.]

Statistical Inference
Statistical inference is the process by which we acquire information and draw conclusions about populations from samples.
[Diagram: Data -> Statistics -> Information; Sample (statistic) -> Inference -> Population (parameter).]
In order to do inference, we require the skills and knowledge of descriptive statistics, probability distributions, and sampling distributions.

Inference Process
[Diagram: a sample is drawn from the population; the sample statistics ($\bar{X}$, $p_s$) are used to produce estimates and tests about the population.]

Thinking Challenge
Suppose you're interested in the average amount of money that students in this class (the population) have on them. How would you find out?

Estimation Methods
[Diagram: Estimation divides into Point Estimation and Interval Estimation.]

Estimation
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
An estimator is a method for producing a best guess about a population value.
An estimate is a specific value provided by an estimator.
Example: We said that the sample mean is a good estimate of the population mean.
o The sample mean is an estimator.
o A particular value of the sample mean is an estimate.

Point Estimator
Definition: A point estimator draws inferences about a population by estimating the value of an unknown parameter using a single value or point.
A point estimate gives no information about how close the value is to the unknown population parameter.
Example: the sample mean ($\bar{X}$) is employed to estimate the population mean ($\mu$).

Population Parameters Are Estimated with Point Estimators

Quantity      Population parameter    Sample statistic
Mean          $\mu$                   $\bar{X}$
Proportion    $p$                     $p_s$
Variance      $\sigma^2$              $s^2$
Difference    $\mu_1 - \mu_2$         $\bar{X}_1 - \bar{X}_2$

Point Estimator
Question: Is there a unique estimator for a population parameter? For example, is there only one estimator for the population mean?
The answer is that there may be many possible estimators. Those estimators must be ranked in terms of some desirable properties that they should exhibit.

Properties of Point Estimators
The choice of point estimator is based on the following criteria:
o Unbiasedness
o Efficiency
o Consistency

Unbiasedness
Definition: A point estimator $\hat{\theta}$ is said to be an unbiased estimator of the population parameter $\theta$ if its expected value (the mean of its sampling distribution) is equal to the population parameter it is trying to estimate:
$$E(\hat{\theta}) = \theta$$
We can also define the bias of an estimator as follows:
$$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$$

Efficiency
To select the "best unbiased" estimator, we use the criterion of efficiency.
Definition: An unbiased estimator is efficient if no other unbiased estimator of the particular population parameter has a lower sampling distribution variance.
If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of the population parameter $\theta$, then $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if
$$V(\hat{\theta}_1) < V(\hat{\theta}_2)$$
The unbiased estimator of a population parameter with the lowest variance out of all unbiased estimators is called the most efficient or minimum variance unbiased estimator (MVUE).

Consistency
Definition: We say that an estimator is consistent if the probability of obtaining estimates close to the population parameter increases as the sample size increases.
One measure of the expected closeness of an estimator to the population parameter is its mean squared error.
The problem of selecting the most appropriate estimator for a population parameter is quite complicated.

References
Inferences Based on a Single Sample: Estimation with Confidence Intervals. John J. McGill/Lyn Noble, revisions by Peter Jurkat.
Chapter 10, Introduction to Estimation. Brooks/Cole, a division of Thomson Learning, Inc.
Basic Business Statistics: Concepts & Applications. Chapter 8, Confidence Interval Estimation.
Chapter 1, Point Estimation Algorithms. Department of Computer Science, University of Tennessee, USA.

Session 2

Introduction to Estimation: Interval Estimation

Confidence Interval Estimation Process
[Diagram: the population mean $\mu$ is unknown; a random sample gives a sample mean $\bar{X} = 50$; the conclusion is "I am 95% confident that $\mu$ is between 40 and 60."]

Interval Estimator
An interval estimator draws inferences about a population by estimating the value of an unknown parameter using an interval.
[Diagram: a confidence interval runs from the lower confidence limit to the upper confidence limit, with the sample statistic (point estimate) between them.]
A confidence interval provides us with a range of values that we believe, with a given level of confidence, contains the true value. That is, we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.
Unlike a point estimate, an interval estimate gives information about closeness to the unknown population parameter.

Point & Interval Estimation
For example, suppose we want to estimate the mean summer income of a class of IT students. For n = 25 students, $\bar{X}$ is calculated to be 400 dollars/week. This is a point estimate.
An alternative statement is: the mean income is between 380 and 420 dollars/week. This is an interval estimate.

Confidence Interval (CI)
The interval $(\hat{\theta}_L, \hat{\theta}_U)$ is called the confidence interval for the parameter $\theta$.
The probability that the "true" parameter is in the interval $(\hat{\theta}_L, \hat{\theta}_U)$ is equal to $1 - \alpha$:
$$P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$$
$1 - \alpha$ is called the confidence level (or confidence coefficient); it is the probability that the interval $(\hat{\theta}_L, \hat{\theta}_U)$ contains the parameter $\theta$.
The limits of the interval are called the lower and upper confidence limits.
The actual realization of this interval, $(\hat{\theta}_L, \hat{\theta}_U)$, is called a $100(1-\alpha)\%$ confidence interval.
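
The probability statement $P(\hat{\theta}_L \le \theta \le \hat{\theta}_U) = 1 - \alpha$ can be illustrated by simulation. The following is a minimal sketch, not part of the original notes: it assumes a normal population with known $\sigma$ and uses the interval $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ covered in this session. The function name `coverage` and the values mu = 50, sigma = 10, n = 25 are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(mu, sigma, n, confidence, trials, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)              # fixed seed so runs are repeatable
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean.
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mu = 50, sigma = 10, samples of size 25.
print(coverage(mu=50, sigma=10, n=25, confidence=0.95, trials=10_000))
```

The printed fraction should be close to 0.95: about 95 out of every 100 intervals built this way contain the true mean, while any single realized interval either contains it or does not.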
We are $100(1-\alpha)\%$ confident that the unknown parameter lies within the interval.
For example, we are 95% confident that the 95% confidence interval will include the population parameter; 5% is the probability that the parameter is not within the interval.
Typical confidence levels are 99%, 95%, and 90%.

Interval and Level of Confidence
[Figure: sampling distribution of the mean, with central area $1 - \alpha$ and area $\alpha/2$ in each tail.]
Intervals extend from $\bar{X} - Z_{\alpha/2}\,\sigma_{\bar{X}}$ to $\bar{X} + Z_{\alpha/2}\,\sigma_{\bar{X}}$.
$100(1-\alpha)\%$ of intervals constructed contain $\mu$; $100\alpha\%$ do not.

Confidence Intervals: Central Intervals of the Normal Distribution
$$\bar{X} = \mu \pm Z\,\sigma_{\bar{X}}$$

Confidence level    Central interval
90%                 $\mu \pm 1.65\,\sigma_{\bar{X}}$
95%                 $\mu \pm 1.96\,\sigma_{\bar{X}}$
99%                 $\mu \pm 2.58\,\sigma_{\bar{X}}$

Factors Affecting Interval Width
Intervals extend from $\bar{X} - Z\,\sigma_{\bar{X}}$ to $\bar{X} + Z\,\sigma_{\bar{X}}$, where $\sigma_{\bar{X}} = \sigma_X / \sqrt{n}$.
1. Data dispersion, measured by $\sigma_X$
2. Sample size $n$
3. Level of confidence $(1 - \alpha)$, which affects $Z$

Confidence Interval Estimates
[Diagram: Confidence Intervals divide into Mean ($\sigma_X$ known, $\sigma_X$ unknown), Proportion, and Variance.]

Estimating $\mu$ when $\sigma$ is known
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
o $z_{\alpha/2}$: known, i.e. taken from the standard normal distribution
o $\sigma$: known, i.e. it is assumed we know the population standard deviation
o $\bar{x}$: known, i.e. the sample mean
o $\mu$: unknown, i.e. we want to estimate the population mean
o $n$: known, i.e. the number of items sampled

Confidence Interval Estimator for $\mu$
Usually represented with a "plus/minus" ($\pm$) sign:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
Upper confidence limit (UCL): $\bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$
Lower confidence limit (LCL): $\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$

Four commonly used confidence levels
[Table not recovered. The critical values used in these notes are 1.645 (rounded to 1.65 above) for 90%, 1.96 for 95%, and 2.58 for 99%.]

Example
A computer company samples demand during lead time over 25 time periods:

235 421 394 261 386
374 361 439 374 316
309 514 348 302 296
499 462 344 466 332
253 369 330 535 334

It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.
Thus, the parameter to be estimated is the population mean $\mu$.
And so our confidence interval estimator will be:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Example
In order to use our confidence interval estimator, we need the following pieces of data:
o $\bar{x} = 370.16$ (calculated from the data)
o $z_{\alpha/2} = 1.96$, $\sigma = 75$, $n = 25$ (given)
Therefore:
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40$$
The lower and upper confidence limits are 340.76 and 399.56.

Thinking Challenge
The mean of a random sample of n = 25 is $\bar{X} = 50$. Set up a 95% confidence interval estimate for $\mu_X$ if $\sigma_X^2 = 100$.
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
$$50 - 1.96\,\frac{10}{\sqrt{25}} \le \mu \le 50 + 1.96\,\frac{10}{\sqrt{25}}$$
$$46.08 \le \mu \le 53.92$$
What is the interval for a sample size of 100?

Confidence Interval for Mean of a Normal Distribution with Unknown Variance
If the sample size is large ($n \ge 30$):
o The population variance is not known.
o The sample standard deviation will be a sufficiently good estimator of the population standard deviation, so that
$$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
Thus, the confidence interval for the population mean is:
$$\bar{X} - Z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Confidence Interval for Mean of a Normal Distribution with Unknown Variance
If the sample size is small and the population variance is unknown, we cannot use the standard normal distribution.
If we replace the unknown $\sigma$ with the sample standard
deviation $s$, the following quantity
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
follows Student's t distribution with $(n - 1)$ degrees of freedom.
The t-distribution has mean 0 and $(n - 1)$ degrees of freedom.
As the degrees of freedom increase, the t-distribution approaches the standard normal distribution.

Student's t Distribution
Estimates the distribution of the sample mean, $\bar{X}$, when the distribution to be sampled is normal.
[Figure: the standard normal curve compared with t (df = 13) and t (df = 5); all are bell-shaped and symmetric about 0, and the t curves have 'fatter' tails.]

Confidence Interval for Mean of a Normal Distribution with Unknown Variance
A $100(1-\alpha)\%$ confidence interval for the population mean $\mu$ when we draw small samples from a normal distribution with an unknown variance $\sigma^2$ is given by
$$\bar{X} \pm t_{n-1,\,\alpha/2}\,\frac{s}{\sqrt{n}}$$

Student's t Table (upper-tail area $\alpha/2$)

v     t.10     t.05     t.025
1     3.078    6.314    12.706
2     1.886    2.920    4.303
3     1.638    2.353    3.182

Assume n = 3, so df = n - 1 = 2, and $\alpha = .10$, so $\alpha/2 = .05$. The t value is 2.920.

Estimation Example: Mean ($\sigma$ Unknown)
A random sample of n = 25 has $\bar{X} = 50$ and $s = 8$. Set up a 95% confidence interval estimate for $\mu$.
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
$$50 - 2.064\,\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.064\,\frac{8}{\sqrt{25}}$$
$$46.70 \le \mu \le 53.30 \quad \text{with 95% confidence}$$

Thinking Challenge
For a sample where the sample size = 9, the sample mean = 28 and the sample s.d. = 3, what is the closest 95% confidence interval of the mean? Select:
A for [27, 29]
B for [26.5, 29.5]
C for [26, 30]
D for [25.25, 30.75]
E for [24.5, 31.5]

Confidence Interval for the Population Proportion
If we want to estimate the population proportion $p$, which is unknown, and n is large, then:
$$\hat{p} = \frac{x}{n} \quad \text{and} \quad Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
where $x$ is the number of successes and $Z$ is approximately standard normal.
Confidence interval estimate, with $\hat{q} = 1 - \hat{p}$:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

Example
A random sample of 400 graduates showed 32 went to graduate school. Set up a 95% confidence interval estimate for $p$.
$$\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$.08 - 1.96\sqrt{\frac{(.08)(.92)}{400}} \le p \le .08 + 1.96\sqrt{\frac{(.08)(.92)}{400}}$$
$$.053 \le p \le .107 \quad \text{with 95% confidence}$$

Thinking Challenge
You're a production manager for a newspaper.
You want to find the % defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?

Solution
With $\hat{p} = 35/200 = .175$ and $\hat{q} = 1 - \hat{p} = .825$:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$.175 - 1.645\sqrt{\frac{(.175)(.825)}{200}} \le p \le .175 + 1.645\sqrt{\frac{(.175)(.825)}{200}}$$
$$.1308 \le p \le .2192 \quad \text{with 90% confidence}$$

References
Inferences Based on a Single Sample: Estimation with Confidence Intervals. John J. McGill/Lyn Noble, revisions by Peter Jurkat.
Chapter 10, Introduction to Estimation. Brooks/Cole, a division of Thomson Learning, Inc.
Basic Business Statistics: Concepts & Applications. Chapter 8, Confidence Interval Estimation.
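
The worked examples of Session 2 can be checked with a short script. This is a minimal sketch using only the Python standard library, not the course's SPSS workflow. The function names `z_critical`, `mean_interval` and `proportion_interval` are hypothetical helpers introduced here for illustration. The standard library has no Student's t quantile function, so the t value 2.064 is passed in by hand as used in the notes.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_critical(confidence):
    """Two-sided standard normal critical value z_{alpha/2}."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def mean_interval(xbar, sd, n, critical):
    """Limits xbar -/+ critical * sd / sqrt(n); critical is a z or t value."""
    margin = critical * sd / sqrt(n)
    return xbar - margin, xbar + margin


def proportion_interval(successes, n, confidence):
    """Large-sample interval p_hat -/+ z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Demand during lead time over 25 periods (data from the notes), sigma = 75.
demand = [235, 421, 394, 261, 386, 374, 361, 439, 374, 316,
          309, 514, 348, 302, 296, 499, 462, 344, 466, 332,
          253, 369, 330, 535, 334]

lcl, ucl = mean_interval(mean(demand), 75, len(demand), z_critical(0.95))
print(f"Demand, sigma known:   {lcl:.2f} to {ucl:.2f}")

# n = 25, xbar = 50, s = 8; t value 2.064 taken from a t table (24 df).
lcl, ucl = mean_interval(50, 8, 25, 2.064)
print(f"Mean, sigma unknown:   {lcl:.2f} to {ucl:.2f}")

lcl, ucl = proportion_interval(32, 400, 0.95)
print(f"Graduates, 95%:        {lcl:.3f} to {ucl:.3f}")

lcl, ucl = proportion_interval(35, 200, 0.90)
print(f"Newspapers, 90%:       {lcl:.4f} to {ucl:.4f}")
```

Because `z_critical` uses the exact normal quantile rather than the rounded table values 1.96 and 1.645, the limits can differ from a hand calculation in the later decimal places, but they agree with the limits in the notes to the precision quoted there.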