BG1201 - Chap1

[BG1201] STATISTICS I
Chapter 1
What is statistics?
Statistics is the science of collecting, organizing, presenting, analyzing and
interpreting data to assist in making more effective decisions.
Types of statistics
• Descriptive statistics: methods of organizing, summarizing, and presenting
data in an informative way.
• Inferential statistics (inductive statistics): the methods used to find out
something about a population based on a sample; that is, a decision, estimate,
prediction, or generalization about a population based on a sample.
A population
A population is a collection of all possible individuals, objects, or measurements of
interest.
A sample
A sample is a portion, or part, of the population of interest.
Types of variables (data)
1. Qualitative variable
The characteristic being studied is nonnumeric.
2. Quantitative variable
The variable can be reported numerically. Examples: the balance in a bank account,
the number of children.
A quantitative variable can be classified as
- Discrete variable: can only assume certain values, and there are usually “gaps”
between values.
- Continuous variable: can assume any value within a specific range.
Variable
  - Qualitative
  - Quantitative
      - Discrete variable
      - Continuous variable
Levels of measurement
1. Nominal
Data can be classified into categories but cannot be arranged in an ordering
scheme. Each value can be assigned a numeric code, but the numbers are simply
labels. You can count nominal data but not order or measure them. For
example, eye color, gender.
2. Ordinal
Data can be arranged in some order or have a rating scale attached.
You can count and order the data, but not measure them. The differences between data values
cannot be determined or are meaningless. For example, ranking of students (freshman,
sophomore, junior, senior).
3. Interval
The next higher level. It includes all characteristics of the ordinal level, and in
addition the difference between values is constant. Note that 0 is
just a point on the scale; it does not represent the absence of the condition. For example,
temperature, GPA, score, …
4. Ratio
The highest level. It has all characteristics of the interval level, and in addition
the 0 point is meaningful and the ratio between two numbers is meaningful. For example,
salary, age, …
Chapter 2 Frequency Distribution and Graphic Presentation
Frequency Distribution
A grouping of data into categories showing the number of observations in each
mutually exclusive category.
The steps for organizing data into a frequency distribution
Step 1 Decide on the number of classes, usually between 5 and 15.
Step 2 Compute the class width:
$$\text{Class width} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}}$$
Step 3 Create non-overlapping classes. The smallest value is the lower class limit
of the first class. Add the class width to find the lower class limit of the second class, and so on.
Step 4 Tally the data into the classes and count the number in each class.
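The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `frequency_distribution` and the data values are made up, the class width is rounded up to a whole number, and the largest value is counted in the last class.

```python
import math

def frequency_distribution(data, num_classes):
    """Return a list of (lower_limit, next_lower_limit, frequency) tuples."""
    lowest, highest = min(data), max(data)
    # Step 2: class width = (highest - lowest) / number of classes, rounded up
    width = max(1, math.ceil((highest - lowest) / num_classes))
    counts = [0] * num_classes
    # Steps 3 and 4: tally each value into its class.
    # min(...) keeps the largest value inside the last class.
    for x in data:
        index = min(int((x - lowest) // width), num_classes - 1)
        counts[index] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]

# Hypothetical data, Step 1: choose 5 classes
values = [12, 15, 21, 22, 27, 30, 34, 38, 41, 45, 49, 52]
for lower, upper, count in frequency_distribution(values, 5):
    print(f"{lower} up to {upper}: {count}")
```

Each class here runs from its lower limit up to (but not including) the next lower limit, which keeps the classes mutually exclusive.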
Suggestions on constructing a frequency distribution
1. The class widths used in the frequency distribution should be equal.
2. Too many or too few classes might not reveal the basic shape of the set of
data.
3. Avoid overlapping stated class limits.
4. Try to avoid open-ended classes. They cause problems in graphing and in
determining measures of central tendency and dispersion, described in Chapters 3 and 4.
Class limit
Lower limit = the lower end of the class.
Upper limit = the upper end of the class.
Midpoint
Also called the class mark; it is halfway between the lower and the upper class limits.
$$\text{Midpoint} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$
Actual class limit (class boundary) → actual lower limit, actual upper limit
$$\text{Actual class limit} = \frac{\text{Upper limit of lower class} + \text{Lower limit of higher class}}{2}$$
Cumulative Frequency
Less than = number of data items whose values are smaller than the upper
boundary of a class.
More than = number of data items whose values are larger than the lower boundary
of a class.
Relative frequency
Ratio of the frequency of a class to the total frequency.
Percentage frequency
Relative frequency × 100 = ____%
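As a small sketch of these definitions, the following Python lines compute the "less than" and "more than" cumulative frequencies and the relative and percentage frequencies. The class frequencies are made-up values, and the variable names are only illustrative.

```python
from itertools import accumulate

frequencies = [2, 3, 2, 2, 3]   # hypothetical class frequencies, lowest class first
total = sum(frequencies)

# "Less than": running total of items below each class's upper boundary
less_than = list(accumulate(frequencies))
# "More than": items above each class's lower boundary
more_than = [total - c + f for c, f in zip(less_than, frequencies)]

relative = [f / total for f in frequencies]
percentage = [round(100 * r, 1) for r in relative]

print(less_than)
print(more_than)
print(percentage)
```

The last "less than" value and the first "more than" value both equal the total frequency, which is a quick check on the tally.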
Graphic presentation of a frequency distribution
There are three commonly used graphic forms:
Histogram
A graph in which the classes are marked on the horizontal axis and the class
frequencies on the vertical axis. The class frequencies are represented by the heights of
the bars, and the bars are drawn adjacent to each other.
[Figure: histogram. Vertical axis: Frequency (0 to 14). Horizontal axis: Actual class limit (79.5 to 219.5, in steps of 20).]
Frequency Polygon
Consists of line segments connecting the points formed by the class midpoints
and the class frequencies.
[Figure: frequency polygon. Vertical axis: Frequency (0 to 14). Horizontal axis: 79.5 to 219.5, in steps of 20.]
Ogive (Cumulative frequency distribution)
Is used to determine how many data values are below or above a certain value.
[Figure: ogive with "less than" and "more than" curves. Vertical axis: Frequency (0 to 60). Horizontal axis: 79.5 to 219.5, in steps of 20.]
Chapter 3 Measures of location
Measures of Central tendency
A single value that summarizes a set of data. It locates the center of the values. You are familiar
with the concept of an average. For example, the average annual maintenance expense is
$269 for a new car and $565 for a car more than one year old.
We will begin by discussing the most widely used and widely reported measure
of central tendency, the arithmetic mean.
Population mean
Is the sum of all values in the population divided by the number of values in the
population. Any measurable characteristic of a population is called a parameter. The
mean of a population is a parameter.
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x}{N}$$
Where $\mu$ represents the population mean. It is the Greek lowercase letter “mu”.
N is the number of items in the population.
x represents any particular value.
∑ is the Greek capital letter “sigma” and indicates the operation of adding.
Sample mean
The mean of a sample and the mean of a population are computed in the same
way, but the notation used is different. The mean of a sample, or any other measure
based on sample data, is called a statistic.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$$
Where $\bar{x}$ stands for the sample mean. It is read “x bar”.
The lowercase n is the number of items in the sample.
The arithmetic mean has several important properties:
1. Every set of interval level and ratio level data has a mean.
2. All the values are included in computing the mean.
3. A set of data has only one mean
The mean does have several disadvantages, however. Recall that the mean uses
the value of every item in the sample or population in its computation. If one or two of
these values are either extremely large or extremely small, the mean might not be an
appropriate average to represent the data.
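A short Python sketch can illustrate how one extreme value pulls the mean. The numbers below are invented for illustration only.

```python
from statistics import mean

incomes = [30, 32, 35, 36, 37]     # hypothetical values, in thousands
with_extreme = incomes + [400]     # the same data plus one extremely large value

print(mean(incomes))        # mean of the original five values
print(mean(with_extreme))   # mean after adding the extreme value
```

With the extreme value included, the mean is larger than five of the six observations, so it no longer describes a typical value.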
The mean is also inappropriate if there is an open-ended class for data tallied into
a frequency distribution. If a frequency distribution has the open-ended class “$100,000
and more”, we do not know whether the incomes in that class are close to $100,000, $500,000, or $10 million. Since we lack information about these
incomes, the arithmetic mean income for this distribution cannot be determined.
Weighted Mean
Is a special case of the arithmetic mean. When the data are not equally important,
we can assign to each a weight that is proportional to its relative importance and
calculate the weighted mean.
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wx}{\sum w}$$
Example
Amy gets quiz scores of 65, 83, 80 and 90 points. She gets 92 points on
her final examination. Find the mean score if the quizzes each count for 15% and the
final counts for 40% of the final grade.
Calculator key sequence:
Shift CLR 1 =
65 Shift ; 15 DT
83 Shift ; 15 DT
80 Shift ; 15 DT
90 Shift ; 15 DT
92 Shift ; 40 DT
Shift S-var 1 =
Result: 84.5
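The same weighted mean can be checked with a few lines of Python. This is only a sketch; the helper name `weighted_mean` is made up for the illustration, and the scores and weights are the ones from the example.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

scores = [65, 83, 80, 90, 92]     # four quizzes, then the final examination
weights = [15, 15, 15, 15, 40]    # percentage weights from the example

print(weighted_mean(scores, weights))  # 84.5
```

The numerator is 15(65 + 83 + 80 + 90) + 40(92) = 8450 and the weights sum to 100, which gives 84.5.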
Combined mean
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots}{n_1 + n_2 + \cdots} = \frac{\sum n\bar{x}}{\sum n}$$
Median
It has been pointed out that for data containing one or two very large or very small
values, the arithmetic mean may not be a good measure of central tendency. For such
cases, a different measure of central tendency that can better describe the data is the
median.
Shape of the distribution
• Symmetric
• Asymmetric (skewed to the right or skewed to the left)
As the distribution becomes nonsymmetrical, or skewed, the relationship among
the three averages changes. In a positively skewed distribution, the arithmetic mean is
the largest of the three averages. Why? Because the mean is influenced more than the
median or mode by a few extremely high values. The mode is the smallest of the three averages.
Conversely, in a distribution that is negatively skewed, the mean is the lowest of
the three averages. The mean is influenced by a few extremely low observations. The
median is greater than the arithmetic mean, and the modal value is the largest.
**If the distribution is highly skewed, the mean should not be used to represent the
data**
Chapter 4 Why study dispersion?
The first reason: an average only locates the center of the data but does not tell us anything
about the spread of the data.
For example, if your nature guide told you that the river ahead averages 3 feet in
depth, would you cross it without additional information? Probably not. You would want to know
something about the variation in the depth. If the maximum depth of the river is 3.25 feet
and the minimum is 2.75 feet, you would probably agree to cross. Before making a decision
about crossing the river, you want information on both the typical depth and the variation in
the depth of the river.
The second reason is to compare the spread in two or more distributions.
A small value for a measure of dispersion indicates that the data are clustered
around the mean. Conversely, a large measure of dispersion indicates that the data are
scattered widely about their mean.
We will consider several measures of dispersion. The range is based on the
location of the largest and the smallest values in the data set. The mean deviation, the
variance, and the standard deviation are all based on deviations from the mean.
Range
The simplest measure of dispersion.
Range = Max − Min
Mean Deviation
A serious defect of the range is that it is based on only two values, the highest
and the lowest; it does not take into consideration all of the values. The mean deviation
does. It measures the mean amount by which the values in a population, or sample, vary
from their mean.
$$MD = \frac{\sum |x - \bar{x}|}{n}$$
Where | | indicates the absolute value (the signs of the deviations from the mean
are disregarded).
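A minimal Python sketch of the range and the mean deviation, using made-up data values:

```python
data = [2, 4, 6, 8, 10]   # hypothetical sample

data_range = max(data) - min(data)      # Max - Min
mean = sum(data) / len(data)            # x-bar
# Mean deviation: average of the absolute deviations from the mean
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(data_range, mean, mean_deviation)
```

Here the absolute deviations are 4, 2, 0, 2 and 4, so their mean is 12 / 5.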
The mean deviation has two advantages. First, it uses all the values in the
computation. Second, it is easy to understand: it is the average amount by which values
deviate from the mean. However, its major drawback is the use of absolute values.
Generally, absolute values are difficult to work with, so the mean deviation is not
used as frequently as other measures of dispersion, such as the standard deviation.
Variance and standard deviation are also based on the deviations from the mean.
Variance is the average of the squared deviations from the mean.
Standard deviation is the positive square root of the variance.
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Population variance
(pronounced sigma square)
Why would we use the standard deviation when we already have the variance?
Because the standard deviation is a more meaningful measure. The variance is a squared
quantity; it is an average of squared numbers. By taking its square root, we “unsquare”
the units and get a quantity denoted in the original units of the problem. If the observations
differ from the mean by one unit or more, the variance tends to be large because it is in
squared units. The mathematical properties of the variance simplify some computations,
but the standard deviation is more easily interpreted.
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Population standard deviation
(pronounced sigma)
$$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
Sample variance
$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Sample standard deviation
Why is this seemingly insignificant change made in the denominator? Although the
use of n is logical, it tends to underestimate the population variance, $\sigma^2$. The use of (n − 1) in the denominator provides the appropriate correction for this tendency. Because the
primary use of sample statistics like $S^2$ is to estimate population parameters like $\sigma^2$, (n − 1) is preferred to n when defining the sample variance.
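The difference between dividing by N and dividing by (n − 1) can be seen in a short Python sketch. The data values are made up; `pvariance` and `variance` are the population and sample variance functions from Python's standard `statistics` module, used here only as a cross-check.

```python
from math import isclose, sqrt
from statistics import pvariance, variance

data = [2, 4, 6, 8, 10]   # hypothetical values
n = len(data)
mean = sum(data) / n
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # sum of squared deviations

population_variance = sum_sq_dev / n          # divide by N
sample_variance = sum_sq_dev / (n - 1)        # divide by n - 1
population_sd = sqrt(population_variance)     # sigma
sample_sd = sqrt(sample_variance)             # S

# Cross-check against the standard library
assert isclose(population_variance, pvariance(data))
assert isclose(sample_variance, variance(data))
print(population_variance, sample_variance)
```

For the same data, the sample variance is always the larger of the two, which reflects the correction described above.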
Some properties of the mean and the variance
1. If a fixed value d is added to or subtracted from each of the observations in the
data, then
a. The mean of the new data = mean of the original ± d
b. The variance of the new data = variance of the original (unchanged)
2. If each observed value in the data is multiplied by a fixed constant c, then
a. The mean of the new data = c times the mean of the original
b. The variance of the new data = $c^2$ times the variance of the original
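These properties can be checked numerically. The sketch below uses made-up data and arbitrary constants d = 5 and c = 3, with the population variance from Python's `statistics` module; the same relations hold for the sample variance.

```python
from statistics import mean, pvariance

data = [2, 4, 6, 8, 10]   # hypothetical values
d, c = 5, 3               # arbitrary constants chosen for the illustration

shifted = [x + d for x in data]   # add d to every observation
scaled = [c * x for x in data]    # multiply every observation by c

# Property 1: the mean shifts by d, the variance is unchanged
assert mean(shifted) == mean(data) + d
assert pvariance(shifted) == pvariance(data)

# Property 2: the mean is multiplied by c, the variance by c squared
assert mean(scaled) == c * mean(data)
assert pvariance(scaled) == c**2 * pvariance(data)
```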
Relative Dispersion
The coefficient of variation (CV) is very useful when
1. The data are in different units.
2. The data are in the same units, but the means are far apart (such as the incomes
of the top executives and the incomes of the unskilled employees).
The CV is the ratio of the standard deviation to the mean, expressed as a percent:
$$CV = \frac{SD}{\text{mean}} \times 100\%$$
The coefficient of variation is often used as a measure of risk. For
instance, in investment, the CV measures the variation of the returns (standard
deviation) relative to the size of the mean return.
Skewness
Is the measurement of the lack of symmetry of the distribution.
Coefficient of skewness
$$Sk = \frac{\text{mean} - \text{median}}{\text{standard deviation}}$$
Chapter 5 Principles of counting
Counting is a mathematical technique that enables us to determine the number of
possible ways an event can occur.
The Multiplication Formula
If there are M ways of doing one thing and N ways of doing another thing, there
are M × N ways of doing both.
The Addition Formula
If there are M ways of doing one thing and N ways of doing another thing, there
are M + N ways of doing either one but not both.
Factorial (n! is called n factorial)
Is the continued product of the first n natural numbers.
n! = n(n-1)(n-2)(n-3)…(3)(2)(1)
Combination
The number of ways to choose r objects from a group of n objects without regard
to order. (Order is not important.)
$${}_nC_r = \frac{n!}{r!\,(n-r)!}$$
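The counting formulas can be tried out in Python. The numbers below (3 shirts, 4 pairs of trousers, choosing 2 objects from 5) are invented for illustration; `math.factorial` and `math.comb` are standard-library functions (`math.comb` needs Python 3.8 or later).

```python
import math

# Multiplication formula: M ways of one thing and N ways of another
shirts, trousers = 3, 4
outfits = shirts * trousers

# Factorial: n! = n(n-1)(n-2)...(2)(1)
n, r = 5, 2
n_factorial = math.factorial(n)

# Combination: nCr = n! / (r! (n - r)!)
combinations = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
assert combinations == math.comb(n, r)   # same value from the built-in function

print(outfits, n_factorial, combinations)
```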
Probability
The probability of an event is a measure of the chance that the event will occur.
(It describes the relative possibility that the event will occur.)
Probability can only assume a value between 0 and 1 inclusive, or between 0% and 100%.
Three key words are used in the study of probability: experiment, outcome, and event.
• Experiment: a process that leads to the occurrence of one (and only one) of
several possible observations.
• Outcome: a particular result of an experiment.
• Event: a collection of one or more outcomes of an experiment.
Approaches to Probability
Two approaches to probability will be discussed, namely, the objective and the
subjective viewpoints.
Objective probability
1. Classical probability
$$\text{Probability of an event} = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} = \frac{n(E)}{n(S)}$$
2. Empirical probability
$$\text{Probability of event happening} = \frac{\text{Number of times event occurred in past}}{\text{Total number of observations}}$$
Subjective probability
Evaluating the available opinions and other information and then estimating or assigning the
probability.
Some Rules of Probability
Rule of Addition
The general rule of addition is used to combine events that are not mutually exclusive.
P(A or B) = P(A) + P(B) - P(A and B)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
If two events are mutually exclusive, the special rule of addition is used to
combine them.
P(A or B) = P(A) + P(B)
P(A ∪ B) = P(A) + P(B)
Complement Rule (A′ or Aᶜ)
P(A) + P(A′) = 1
P(A′) = 1 − P(A)
Conditional Probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
= Probability of A given that B has occurred
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
= Probability of B given that A has occurred
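The rules above can be illustrated with one roll of a fair die, using classical probability n(E)/n(S). The events A ("even number") and B ("greater than 3") and the helper name `prob` are chosen only for this sketch.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
A = {2, 4, 6}            # event A: an even number
B = {4, 5, 6}            # event B: a number greater than 3

def prob(event):
    """Classical probability: favorable outcomes / possible outcomes."""
    return Fraction(len(event), len(S))

# General rule of addition (A and B are not mutually exclusive)
p_a_or_b = prob(A) + prob(B) - prob(A & B)
assert p_a_or_b == prob(A | B)

# Complement rule
p_not_a = 1 - prob(A)

# Conditional probability of A given B
p_a_given_b = prob(A & B) / prob(B)

print(p_a_or_b, p_not_a, p_a_given_b)
```

Using `Fraction` keeps the results exact, so they can be compared directly with hand calculations.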
Chapter 6 Discrete Probability Distribution
Probability Distribution
A listing of all the outcomes of an experiment and the probability associated with
each outcome.
Ex. Suppose we are interested in the number of heads showing face up on three
tosses of a coin. What is the probability distribution for the number of heads?
Sample space = S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}
x = Number of heads    P(x)
0                      1/8 = 0.125
1                      3/8 = 0.375
2                      3/8 = 0.375
3                      1/8 = 0.125
Total                  1
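The table can be reproduced by listing the sample space in Python and counting heads. This is a small sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three coin tosses
sample_space = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2 or 3 heads
counts = Counter(outcome.count("H") for outcome in sample_space)

# Probability distribution of x = number of heads
distribution = {x: Fraction(counts[x], len(sample_space)) for x in sorted(counts)}

for x, p in distribution.items():
    print(x, p, float(p))
```

The probabilities sum to 1, as they must for any probability distribution.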
Random Variables
A quantity resulting from an experiment that, by chance, can assume different
values. A random variable may be either discrete or continuous.
Discrete Random Variables
A variable that can assume only certain clearly separated values resulting from a
count of some item of interest.
Ex. number of students, number of rooms in a house.
Continuous Random Variable
A variable that can assume one of an infinitely large number of values, within
certain limitations.
Ex. height, weight, tire pressure, …
Binomial Probability Distribution
The binomial probability distribution is one of the most widely used discrete
probability distributions. It is applied to find the probability that an outcome will occur x
times in n performances of an experiment.
Characteristics of a binomial distribution
1. An outcome on each trial of an experiment is classified into one of two mutually
exclusive categories: a success or a failure.
2. The random variable is the result of counting the number of successes in a fixed
number of trials.
3. The probability of a success stays the same for each trial. So does the probability
of failure.
4. The trials are independent, meaning that the outcome of one trial does not affect
the outcome of any other trial.
To construct a particular binomial probability distribution, we must know the
number of trials and the probability of success on each trial. For example, if the Stat I
examination consists of 10 multiple choice questions, the number of trials is 10. If each
question has four choices with one correct answer, the probability of success on each trial is 1/4 or 0.25.
Using the formula of the binomial probability distribution
$$P(x) = {}_nC_x \, \pi^x (1 - \pi)^{n - x}$$
Where
n is the number of trials.
x is the number of successes.
$\pi$ is the probability of a success on each trial.
Mean of a binomial distribution
$$\mu = n\pi$$
Variance of a binomial distribution
$$\sigma^2 = n\pi(1 - \pi)$$
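The binomial formula can be sketched in Python for the examination example with n = 10 trials and a success probability of 0.25. The function name `binomial_pmf` is made up for the illustration; `math.comb` is the standard-library combination function.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(x) = nCx * pi^x * (1 - pi)^(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 10, 0.25   # 10 questions, probability 0.25 of a correct guess on each

pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]
mean = n * pi                  # mu = n * pi
variance = n * pi * (1 - pi)   # sigma^2 = n * pi * (1 - pi)

for x, p in enumerate(pmf):
    print(x, round(p, 4))
print(mean, variance)
```

Summing x · P(x) over all x gives the same value as nπ, which is a useful check on the table of probabilities.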
Chapter 7 The Normal Probability Distribution
We will continue our study of probability distributions in this chapter by examining
a very important continuous probability distribution, the normal probability distribution.
As noted in the preceding chapter, a continuous random variable is one that can
assume an infinite number of possible values within a specified range. A large number of
phenomena in the real world are normally distributed, either exactly or approximately.
The normal probability distribution and its accompanying normal curve have the
following characteristics: