Chapter 1.1_0

advertisement
Chapter 1
EQT 271 SEM II 2014/2015
DESCRIPTIVE AND INFERENTIAL STATISTICS
1. Statistics is the sciences of conducting studies to collect, organize, summarize,
analyze, and draw conclusions from data.
2. A variable is a characteristic of attributes that can assume different values.
3. A population consists of all subjects (human or otherwise) that are being studied.
4. A sample is a group of subjects selected from a population.
5. In a Census, the researchers collect data from all subjects in the population
6. Descriptive statistics consists of the collection, organization, summarization, and
presentation of data.
7. Inferential statistics consist of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables and
making predictions.
VARIABLES AND TYPES OF DATA
1. Qualitative variables are variables that have distinct categories according to some
characteristic or attribute. (e.g : Genders (male or female), Race (Malay, Chinese,
Indian))
2. Quantitative variables are variables that can be counted or measured.
3. Discrete variables assume values that can be counted. (e.g: Number of children in a
family, number of students in a classroom)
4. Continuous variables can assume an infinite number of values between any two
specific values. They are obtained by measuring. They often include fractions and
decimals.
Khatijahhusna Abd Rani
Page 1
Chapter 1
EQT 271 SEM II 2014/2015
5. Level of measurements:
a. Nominal: classifies data into mutually exclusive (nonoverlapping) categories
in which no order or ranking can be imposed on the data.
b. Ordinal: classifies data into categories that can be ranked; however, precise
differences between the ranks do not exist.
c. Interval: ranks data and precise differences between units of measure do
exist; however, there is no meaningful zero
d. Ratio: possesses all the characteristics of interval measurement, and there
exists a true zero. In addition, true ratios exist when the same variable is
measured on two different members of the population.
FREQUENCY DISTRIBUTIONS AND GRAPHS
1. A frequency distribution is the organization of raw data in table form, using classes
and frequencies.
2. A Grouped frequency distribution is used if variables take a large number of values
or the variable is continuous.
3. Class Width: subtracting lower class limit of one class from the lower class limit of
the next class. The class width also can be found by subtracting lower boundary
from the upper boundary for any given class.
Class limits Class boundaries
100-104
99.5-104.5
105-109
104.5-109.5
Class Width
= 105-100
=5
Class Width
OR
= 109.5-104.5
=5
4. The Histogram is a graph that displays the data by using contiguous vertical bars (unless the
frequency of a class is 0) of various heights to represent the frequencies of the classes.
Khatijahhusna Abd Rani
Page 2
Chapter 1
EQT 271 SEM II 2014/2015
Measures of central tendency
1. The Mean ( x ) is the sum of the values, divided by total number of values
2. The Median ( ~
x ) is the midpoint of the data array.
Steps:
I.
Arrange the data values in ascending order
II.
Determine the number of values in the data set
III.
a.
If n is odd, select the middle data as the median
b.
If n is even, find the mean of the two middle values.
3. The Mode ( x̂ ) is the value that occurs most often (higher frequency) in a data set
4. Best Measure of central tendency:
Type of Variable
Best measure of central tendency
Nominal
Mode
Ordinal
Median
Interval/Ratio (not skewed)
Mean
Interval/ Ratio (Skewed)
Median
Measures of dispersion
1. The Range is the highest value minus the lowest value. The Symbol R is used for the
range.
R = highest value-lowest value
2. The Variance is the average of the squares of the distance each value is from the
mean.
3. The Standard deviation is the square root of variance.
4. A statistical rule stating that for normal distribution (bell shaped), is almost all data
will fall within three standard deviations of the mean. The Empirical Rule shows that
68% will fall within the 1 standard deviation, 95% within 2 two standard deviations,
and 99.7% will fall within the 3 standard deviations of the mean.
Khatijahhusna Abd Rani
Page 3
Chapter 1
EQT 271 SEM II 2014/2015
Example: Suppose the age distribution of a sample of 1000 persons is bell shaped (normally
distributed) with a mean of 40 years and a standard deviation of 12 years.
s
s
Approximately 68% of the sample is
between ages 28 (40-12) years old and
52(40+12) years old.
28
x  40
52
Measures of position
1. A z score or Standard score for a value is obtained by subtracting the mean from the
value dividing the result by the standard deviation.
Z
value - mean
standard deviation
The major purpose of standard scores is to place scores for any individual on any
variable having any mean and standard deviation on the same standard scale so
that comparisons can be made.
When all data for a variable are transformed into z scores, the resulting distribution
will have a mean of 0 and standard deviation of 1.
z score for samples,
z
xx
s
z score for populations,
z
Khatijahhusna Abd Rani
x

Page 4
Chapter 1
EQT 271 SEM II 2014/2015
2. Quartiles divide the distribution into four groups, denoted by Q1 Q2 Q3
3. An Outlier is an extremely high or an extremely low data value when compared with
the rest of the data values.
The five number summary and Boxplots
1. A boxplot can be used to graphically represent the data set. These plots involve five
specific values:
i.
The lowest value of the data sets (i.e., minimum)
ii. Q1
iii. Q2
iv. Q3
v. The highest value of the data set (i.e., maximum)
STAY FOCUSED AND NEVER GIVE UP!! ^_^
Khatijahhusna Abd Rani
Page 5
Download