Peter Michael Saya II Dr. Chauhan Population vs a random sample . Population mean value (μ) sample X mean value A sample mean is expected to be close to Population mean. If one of the values is unknown, use another one to predict the unknown Probability: use a known population mean value to predict a sample mean Since a sample mean varies from sample to sample we need to study it’s distribution: Statement of a Useful Theorem Consider a population of data with a mean value (μ) of 100 and a standard deviation (σ) of 10. If many random samples, each of size n=9, are drawn, and from each sample a mean, (xbar) is calculated, we can conclude that 1. The average value of all such xbars= 2. The standatr Dafa 3. The graph of x bar:NORMAL UNDER TWO CONDITIONS: 4. The result is useful in the sense that if we know how x bar behaves, we could use this in probability as well as in stat The distribution of sample mean • Many samples each of size n: Three conclusions: • 1) The mean value of all such sample means=µ • 2) The standard deviation of all such sample means= • • 3) The graph of all such sample means is NORMAL if • • • • • • n The population is normal or the sample size,n, is large How large is too large depends on the shape of the the population data but we are just told to use a rule of 30 or above. • This result is many applications, and I have explored on two of them and will talk about one. • Basically, if a value is normally distributed we can calculate the probability of its being say bigger than a specified value by using a normal table. My research involves ….. In my project, three different types of non normal populations were generated Uniform Exponential T For each population, I investigated if we really need a sample of 30 to assume that the graph of X is normal Population Distributions used: In my project, three different types of populations were used Each population was composed of random data generated by Minitab. The three population types were: Uniform Exponential t As stated earlier… There are a couple requirements in order to use specific formulas and get valid results: The sample size must be large and The population must be normally distributed The uniform, exponential, and t distributions are NOT normally distributed so therefore in order to use specific formulas, the sample size must be at least 30! OR DOES IT? Probability Theory Minitab generated 160,000 values for each of the 3 types of distributions In each population, samples of different sizes were created The mean of each sample (xbar) was then calculated by Minitab Selecting a random xbar value and using the above equation, a z-score was calculated and the associated probability of that z-score was determined using the normal table Probability Theory That theoretical probability was then compared to the actual probability Actual probability was found by numerically sorting the xbars in Minitab and counting the values above a predetermined number That frequency was divided by 4,000 (the no. of samples drawn) This is the actual probability Probability Theory Results: Uniform Distribution σ=2.88133 μ=5.00 n actual P( X>5.5) calc P(x>5.5) % difference 5 0.34800 0.3520 1.15% 10 0.29225 0.2946 0.80% 15 0.24950 0.2514 0.76% 20 0.22300 0.2206 -1.08% 25 0.19575 0.1949 -0.43% 30 0.17175 0.1736 1.08% 35 0.15525 0.1539 -0.87% 40 0.13175 0.1379 4.67% Sample size does not matter for uniform distributions! Probability Theory: Exponential Distribution Minitab can create different types of exponential distributions The shape of the exponential distribution depends on a value that Minitab calls “scale” The scale is the mean of the entire distribution I used scales of 5, 10, and 15 to see if changing the scale affects whether or not more or less than 30 is needed as a sample size for valid results Probability Theory Results: Exponential Distribution scale= 5 σ=5.01505 μ=5.00 n actual P(x>5.5) calc P(x>5.5) % difference 5 0.35175 0.41290 17.38% 10 0.34075 0.37830 11.02% 15 0.32625 0.35200 7.89% 20 0.30900 0.33000 6.80% 25 0.30000 0.30850 2.83% 30 0.27850 0.29460 5.78% 35 0.27250 0.27760 1.87% 40 0.26500 0.26430 -0.26% Sample size affects the probability. Sample size of 25 gave permissible results (permissible results are % difference < ~5%) Probability Theory Results: Exponential Distribution scale= 10 σ=9.741 μ=10.0 n actual P(x>11) calc P(x>11) % difference 5 0.35475 0.40900 15.29% 10 0.33950 0.37070 9.19% 15 0.32450 0.34460 6.19% 20 0.30700 0.32280 5.15% 25 0.29775 0.30150 1.26% 30 0.27950 0.28430 1.72% 35 0.26600 0.27090 1.84% 40 0.25450 0.25460 0.04% Sample size affected the probability. A size of 25 or higher gave permissible results. Probability Theory Results: Exponential Distribution scale= 15 σ=14.6591 μ=15.00 n actual P(x>16) calc P(x>16) % difference 5 0.39525 0.44040 11.42% 10 0.38325 0.41680 8.75% 15 0.38175 0.39740 4.10% 20 0.36475 0.38210 4.76% 25 0.34775 0.36690 5.51% 30 0.34550 0.35570 2.95% 35 0.32800 0.34460 5.06% 40 0.32225 0.33360 3.52% Sample size affected the probability. A sample size of 30 or greater resulted in permissible results. Probability Theory: t-distribution Minitab can create different types of t-distributions The shape of the t-distribution depends on a value that Minitab calls degrees of freedom The higher the degrees of freedom, the more normal the distribution looks The lower the degrees of freedom, the further the values deviate from μ, which equals zero for t-graphs I used degrees of freedom 3 and 10 to see if changing the degrees of freedom affects whether or not more or less than 25 is needed as a sample size for valid results Probability Theory Results: t-distribution D.o.F. = 3 σ=1.60332 μ=0 n actual P(x>0.1) calc P(x>0.1) % difference 5 0.42975 0.44830 4.32% 10 0.42025 0.42470 1.06% 15 0.402 0.40900 1.74% 20 0.39250 0.39360 0.28% 25 0.37850 0.38210 0.95% 30 0.36875 0.37070 0.53% 35 0.36250 0.35940 -0.86% 40 0.34675 0.35200 1.51% Sample size does not matter for t distributions! Probability Theory Results: t-distribution D.o.F.=10 σ=1.1066 μ=0 n actual P(x>0.1) calc P(x>0.1) % difference 5 0.41225 0.4207 2.05% 10 0.384 0.3859 0.49% 15 0.35 0.3632 3.77% 20 0.33650 0.3446 2.41% 25 0.31800 0.32640 2.64% 30 0.30525 0.31210 2.24% 35 0.28625 0.29810 4.14% 40 0.27475 0.28430 3.48% Sample size does not matter for t distributions! Two parts to the research: Probability theory 1. How does changing the sample size affect the probability that the mean of a sample lies above a certain value? Calculate a theoretical probability using z-scores and compare it to the actual probability from the data in Minitab Two parts to the research: Confidence Intervals of the Population Mean (μ) 1. How does changing the sample size affect the frequency at which the mean of the population is included in a 95% interval? 95% confidence interval means that 95% of the calculated intervals should include the population mean Using Minitab, a frequency count of the intervals that included μ could be determined Then dividing that frequency by the total number of intervals and multiplying by 100 gave a percentage This percentage was then compared to 95%