Elementary Statistics

Chapter 1
Dr. Ghamsary
M. Ghamsary, Ph.D.
Statistics:
Statistics is the science of
• collecting,
• organizing,
• summarizing,
• analyzing, and
• drawing conclusions from data.
Objective: The primary objective of statistics is inference.
The applications of statistics can be divided into two broad areas:
1. Descriptive Statistics
2. Inferential Statistics
Variable: a characteristic of an individual population unit.
Data are the values (measurements or observations) that the variables can assume.
For example: 12, 13, 69, 98, 78, 87, 36, 54, 68, 36, 63, 85, 79, 75, 32, 16, 57, 58, 34, 91, 74, 83, 92.
Each value in the data set is called a data value or a datum.
Variables whose values are determined by chance are called random variables.
1. Descriptive statistics consists of numerical and graphical techniques used to summarize and present
the information in a data set.
2. Inferential statistics consists of estimation, prediction, or generalizing from samples to
populations.
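As a rough illustration of the descriptive side, the sketch below summarizes the 23 example data values listed above using Python's standard `statistics` module. The names `data` and `summary` are chosen here for illustration and are not part of the notes.

```python
import statistics

# The example data set given earlier in these notes (23 values).
data = [12, 13, 69, 98, 78, 87, 36, 54, 68, 36, 63, 85,
        79, 75, 32, 16, 57, 58, 34, 91, 74, 83, 92]

# Descriptive statistics: numerical summaries of the data in hand.
summary = {
    "n": len(data),
    "min": min(data),
    "max": max(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Inferential statistics would go one step further and use a summary like this to estimate or test a claim about the population the data came from.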
Qualitative variables are variables that can be placed into distinct categories, according to
some characteristic or attribute.
For example,
• Gender (male or female)
• Race (White, Black, Hispanic, etc.)
• Religion
Quantitative variables are numerical in nature and can be ordered or ranked.
For example,
• Age is numerical and the values can be ranked.
• Height
• Scores on a test in a statistics class
Discrete variables assume values that can be counted: a finite or countably infinite number of possible values.
For example:
• Number of telephone calls made at our school's switchboard each day: {0, 1, 2, 3, 4, …}
• Number of accidents on Freeway 5
• Number of babies delivered at LLU hospital
Continuous variables can assume any of the infinitely many values between two specific values,
with no gaps.
For example:
• Height of boys born at UCLA hospital on July 4th
• Amount of rain that falls in California in the year 2000
By contrast, the following are counts, so they are discrete rather than continuous:
• Number of car accidents on Freeway 10 from 5 to 7 PM daily
• Number of babies delivered at LLU hospital daily
Levels of Measurement
When we observe and record a variable, it has characteristics that influence the type of statistical
analysis that we can perform on it. These characteristics are referred to as the level of measurement of
the variable. The first step in any statistical analysis is to determine the level of measurement; it tells us
what statistical tests can and cannot be performed.
• There are four levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
1. The nominal level of measurement refers to data that consist of names, labels, or categories only,
so the data cannot be arranged in any meaningful order. The nominal level of measurement
occurs when the observations do not have a meaningful numeric value.
For example:
• Sex (Male, Female)
• Race (White, Black, Hispanic, Asian, Persian, etc.)
• Colors of cars on the street
• Area Code
• Zip code
The values of nominal variables cannot be meaningfully:
• compared to see if one is larger than another
• added or subtracted
• multiplied or divided
• averaged to compute a mean (what most people call the average)
2. The ordinal level of measurement classifies data into categories that can be ranked,
but differences between the ranks cannot be determined. Ordinal variables are used to represent
observations that can be categorized and rank ordered.
For example:
• Letter grades: A (superior), B (good), C (average), D (poor), F (fail)
• Size of cars on the street: Small, Medium, and Large
• Placement in games: 1st, 2nd, 3rd, …
• Class rank
• Order of finishing a horse race
• How much you prefer various vegetables
The values of ordinal variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
The values of ordinal variables cannot be meaningfully:
• added or subtracted
• multiplied or divided
• averaged to compute a mean
3. The interval level of measurement is like the ordinal level, with the additional property that
differences between data values are meaningful, but there is no meaningful zero. Interval variables
represent observations that can be categorized, rank ordered, and have a unit of measure.
• A unit of measure implies that the difference between any two successive values is
identical.
With an interval-scaled variable, the value 0 does not represent the complete absence of the variable.
The values of interval variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
• added or subtracted
The values of interval variables cannot be meaningfully:
• multiplied or divided (e.g., 60°F is not twice as hot as 30°F)
For example:
• Temperature in degrees Fahrenheit, which has no natural zero
• Calendar years
• IQ scores
• Shoe size
4. The ratio level of measurement is just like the interval level, except that there exists a
natural zero. As a result, both differences and true ratios are meaningful for the variable. Ratio
variables represent observations that can be categorized, rank ordered, have a unit of measure, and have
a true zero.
• A true zero implies that a value of zero represents the complete absence of the variable.
The values of ratio variables can be:
• compared to see if they are equal or not
• compared to see if one is larger or smaller than another
• added or subtracted
• multiplied or divided
For example:
• Weight
• Height
• Age
• Length
• Distance
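The four levels build on one another, so the operations that are meaningful at each level can be collected in a small lookup. The sketch below is only a summary aid: the names `LEVELS`, `NEW_OPERATIONS`, and `allowed_operations` are hypothetical, and it assumes (as is standard, though not stated explicitly above) that nominal values can at least be compared for equality.

```python
# Levels of measurement, ordered from weakest to strongest.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The operation that first becomes meaningful at each level.
# Assumption: equality comparison is meaningful for nominal data.
NEW_OPERATIONS = {
    "nominal": {"equal / not equal"},
    "ordinal": {"larger / smaller"},
    "interval": {"add / subtract"},
    "ratio": {"multiply / divide"},
}

def allowed_operations(level):
    """Return every operation that is meaningful at the given level."""
    index = LEVELS.index(level)
    allowed = set()
    # A level inherits the operations of all weaker levels.
    for name in LEVELS[: index + 1]:
        allowed |= NEW_OPERATIONS[name]
    return allowed

for level in LEVELS:
    print(f"{level:>8}: {sorted(allowed_operations(level))}")
```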
Most students have trouble differentiating between interval and ratio levels of measurement.
Here is a simple test: if one number is twice the other, is the quantity being measured also twice the
other quantity?
• For example, if you have two weights, 120 lbs. and 240 lbs., it should be clear that 240
lbs. is twice as heavy as 120 lbs. So weight is an example of a ratio level of
measurement.
• However, if you have two temperatures, 30°F and 60°F, 60°F is not
twice as hot as 30°F, so temperature in degrees Fahrenheit is an example of an interval level of measurement.
Another test is that at the ratio level of measurement, zero means the absence of the quantity.
For weights, 0 lb. means that there is NO weight (so weight is ratio). At the
interval level of measurement, as with temperature, 0 degrees Fahrenheit does not mean the absence of
heat, which is what temperature measures.
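One way to check the "twice as much" test numerically is to change units: a ratio between two ratio-level values survives a change of units, while a ratio between two interval-level values does not. The sketch below uses the weights and temperatures from the example; the helper names are illustrative, and Kelvin is used only as a convenient temperature scale that does have a true zero.

```python
LB_TO_KG = 0.45359237  # kilograms per pound

def pounds_to_kg(lb):
    """Convert a weight from pounds to kilograms."""
    return lb * LB_TO_KG

def fahrenheit_to_kelvin(deg_f):
    """Convert a temperature from degrees Fahrenheit to kelvins."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

# Ratio level: 240 lb vs. 120 lb is "twice as heavy" in any unit.
weight_ratio_lb = 240 / 120
weight_ratio_kg = pounds_to_kg(240) / pounds_to_kg(120)

# Interval level: 60°F vs. 30°F looks like a ratio of 2,
# but the ratio changes once the zero point moves.
temp_ratio_f = 60 / 30
temp_ratio_k = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(f"weight ratio: {weight_ratio_lb:.3f} (lb), {weight_ratio_kg:.3f} (kg)")
print(f"temperature ratio: {temp_ratio_f:.3f} (°F), {temp_ratio_k:.3f} (K)")
```

The weight ratio stays at 2 after conversion, while the temperature ratio drops to a little above 1, which is why 60°F cannot be called "twice as hot" as 30°F.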
Population: all units (subjects, objects, etc.) that are being studied.
Sample: a subset of the units of a population.
Parameter: a descriptive measure of the population, usually represented by Greek letters.
Statistic: a descriptive measure of a sample, usually represented by Roman letters.
Measure                      Sample (Statistic)   Population (Parameter)
Mean                         x̄                    µ
Variance                     s²                   σ²
Standard Deviation           s                    σ
Correlation Coefficient      r                    ρ
Proportion                   p̂                    p
Slope of Simple Regression   β̂1                   β1
Size                         n                    N
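To make the parameter/statistic distinction concrete, the sketch below computes both sets of measures with Python's standard `statistics` module. The population here is made-up data generated only for illustration; the variable names mirror the notation above.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 500 ages (made-up values).
population = [rng.randint(18, 65) for _ in range(500)]
# A simple random sample of n = 30 units from that population.
sample = rng.sample(population, 30)

# Parameters describe the population (Greek letters).
N = len(population)
mu = statistics.mean(population)
sigma2 = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Statistics describe the sample (Roman letters).
n = len(sample)
x_bar = statistics.mean(sample)
s2 = statistics.variance(sample)           # divides by n - 1
s = statistics.stdev(sample)

print(f"population: N={N}, mu={mu:.2f}, sigma^2={sigma2:.2f}, sigma={sigma:.2f}")
print(f"sample:     n={n}, x_bar={x_bar:.2f}, s^2={s2:.2f}, s={s:.2f}")
```

In practice the population values are usually unknown, so the sample statistics serve as estimates of the corresponding parameters.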
Summary of Data Classifications
Example 1: From a sample of students in your statistics class, you collect the following: each student's
name, gender, SAT score, age, IQ, birth date (BD), and grade in a freshman-level math class.
Classify each of the following variables as qualitative or quantitative.
1. The variable student's name is ______.
2. The variable student's gender is ______.
3. The variable student's SAT score is ______.
4. The variable student's age is ______.
5. The variable student's IQ is ______.
6. The variable student's BD is ______.
Example 2: From a sample of students in your statistics class, you collect the following: each student's
name, gender, SAT score, age, IQ, birth date (BD), and grade in a freshman-level math class. State which
scale of measurement (nominal, ordinal, interval, or ratio) applies to each of the following
variables.
1. The variable student's name is measured on the ______ scale.
2. The variable student's gender is measured on the ______ scale.
3. The variable student's SAT score is measured on the ______ scale.
4. The variable student's age is measured on the ______ scale.
5. The variable student's IQ is measured on the ______ scale.
6. The variable student's BD is measured on the ______ scale.
Example 3: A researcher claims that the average age of women who graduate from
Loma Linda Medical School is about 27 years. To test his hypothesis, he randomly selected
200 female doctors who have graduated from LLU medical school.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.
Example 4: A researcher in LA County claims that men and women have different attitudes
toward abortion. He randomly selected 500 men and 500 women and asked each of them whether they are anti-abortion.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.
Example 5: Read the following article excerpt and answer the questions below.
A study in California (which also funds abortions for the poor) found that by 1990, among young
white women, there was no difference in the rate of breast cancer between rich and poor.
1. Describe the population.
2. Identify the variable of interest.
3. Is the variable quantitative or qualitative?
4. Is the variable discrete or continuous?
5. Identify the type of the variable.
6. Describe the sample.
7. Describe the inference.
Methods of Sampling: There are many methods of sampling, but we will describe five
common and basic methods:
a. Convenience Sampling
b. Simple Random Sampling
c. Systematic Sampling
d. Stratified Sampling
e. Cluster Sampling
Convenience sampling attempts to obtain a sample of convenient elements.
Often, respondents are selected because they happen to be in the right place at the right time.
For example:
• use of students, and members of social organizations
• mall intercept interviews without qualifying the respondents
• department stores using charge account lists
• “people on the street” interviews
Simple Random Sampling (SRS)
• Each element in the population has a known and equal probability of selection.
• Each possible sample of a given size (n) has a known and equal probability of being the
sample actually selected.
• This implies that every element is selected independently of every other element
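A minimal sketch of simple random sampling, assuming the population can be listed as a Python sequence. It relies on `random.Random.sample`, which draws without replacement so that every subset of size n is equally likely; the function name and the numbered population are illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every size-n subset is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: units numbered 1 to 1000.
population = list(range(1, 1001))
srs = simple_random_sample(population, 100, seed=42)

print(sorted(srs)[:10])  # first few selected unit numbers
```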
Systematic Sampling
• The sample is chosen by selecting a random starting point and then picking every i-th element in
succession from the sampling frame. For example, if there are 1000 elements in the population and a
sample of 100 is desired, the sampling interval is i = 1000/100 = 10.
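The 1000-element example can be sketched as follows. This assumes the frame size is an exact multiple of the sample size, as it is here; `systematic_sample` is a hypothetical helper, not a library function.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start, then every i-th element of the sampling frame."""
    rng = random.Random(seed)
    interval = len(frame) // n       # sampling interval i (10 in the example)
    start = rng.randrange(interval)  # random starting point within the first interval
    return frame[start::interval][:n]

# Sampling frame of 1000 elements, numbered 1 to 1000.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=7)

print(sample[:5])
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined by the interval.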
Stratified Sampling
• A two-step process in which the population is partitioned into subpopulations, or strata.
• The strata should be mutually exclusive and collectively exhaustive in that every population
element should be assigned to one and only one stratum and no population elements should be
omitted.
• Next, elements are selected from each stratum by a random procedure, usually SRS.
• A major objective of stratified sampling is to increase precision without increasing cost
• The elements within a stratum should be as homogeneous as possible, but the elements in different
strata should be as heterogeneous as possible.
• The stratification variables should also be closely related to the characteristic of interest.
• Finally, the stratification variables should decrease the cost of the stratification process by being easy to
measure and apply.
• In proportionate stratified sampling, the size of the sample drawn from each stratum is
proportionate to the relative size of that stratum in the total population.
• In disproportionate stratified sampling, the size of the sample from each stratum is proportionate
to the relative size of that stratum and to the standard deviation of the distribution of the
characteristic of interest among all the elements in that stratum.
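Proportionate stratified sampling can be sketched as below, using SRS within each stratum. The strata (class years) and their sizes are made up for illustration, and the simple rounding used for allocation is an assumption that happens to work for these sizes; in general the rounded allocations may not add up exactly to n.

```python
import random

def proportionate_stratified_sample(strata, n, seed=None):
    """Sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Allocation proportional to stratum size (rounded to a whole number).
        size = round(n * len(units) / total)
        sample[name] = rng.sample(units, size)  # SRS within the stratum
    return sample

# Hypothetical strata: mutually exclusive and collectively exhaustive.
strata = {
    "freshman": [f"F{i}" for i in range(500)],
    "sophomore": [f"S{i}" for i in range(300)],
    "junior": [f"J{i}" for i in range(200)],
}
picked = proportionate_stratified_sample(strata, 50, seed=3)

for name, units in picked.items():
    print(name, len(units))
```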
Cluster Sampling
• The target population is first divided into mutually exclusive and collectively exhaustive
subpopulations, or clusters.
• Then a random sample of clusters is selected, based on a probability sampling technique such as
SRS.
• For each selected cluster, either all the elements are included in the sample (one-stage) or a sample
of elements is drawn probabilistically (two-stage).
• Elements within a cluster should be as heterogeneous as possible, but clusters themselves should
be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the
population.
• In probability proportionate to size sampling, the clusters are sampled with probability
proportional to size. In the second stage, the probability of selecting a sampling unit in a selected
cluster varies inversely with the size of the cluster.
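One-stage cluster sampling can be sketched as follows: select whole clusters by SRS, then include every element of each selected cluster. The clusters here (30 made-up classes of varying size) and the function name are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, m, seed=None):
    """Select m clusters by SRS and keep all elements of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster names
    sample = [unit for name in chosen for unit in clusters[name]]
    return chosen, sample

# Hypothetical population: 30 classes, each a cluster of students.
clusters = {
    f"class_{c}": [f"class_{c}_student_{i}" for i in range(20 + c)]
    for c in range(1, 31)
}
chosen, sample = one_stage_cluster_sample(clusters, 10, seed=11)

print(chosen)
print(len(sample), "students in the sample")
```

A two-stage design would replace the "keep all elements" step with a further random draw inside each selected cluster.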
Review of Chapter 01
• Determine whether the given values are from a discrete or continuous data set.
1. In a sample of 100 Pepsi cans, the average can size was 11.98 oz.
2. In a survey of 1,011 adults, 450 of them were found to have smoked at least once in their lives.
3. In a survey of 3,289 adults, 45% of them were found to have a garden at home.
4. The average American drinks 2 cups of coffee per day.
• Determine whether the given variables are qualitative or quantitative.
5. Area codes of the phone numbers of students in this class.
6. Social Security numbers of students in this class.
7. Nationalities of the professors teaching at this school.
8. Heights of students in this class.
• Determine which of the four levels of measurement is most appropriate: Nominal, Ordinal, Interval,
or Ratio.
9. Area codes of the phone numbers of students in this class.
10. Social Security numbers of students in this class.
11. Nationalities of the professors teaching at this school.
12. Heights of students in this class.
13. Ratings of good, average, or poor for today's lecture.
14. Current temperature of this classroom.
15. Jersey numbers of the Lakers basketball players.
16. Students' years of birth.
17. Driver's license numbers.
• Identify which of these types of sampling is used: Random (SRS), Systematic, Stratified, Cluster,
or Convenience.
18. A Los Angeles Times reporter gets a reaction to a breaking story by polling people as they pass
the front of the Times building.
19. Dr. Ghamsary has randomly selected 5 students in his class.
20. The Orange County Commissioner of Jurors obtains a list of 55,014 car owners and constructs a
pool of jurors by selecting every 50th name on the list.
21. In a Harris poll of 1,011 adults, the interview subjects were selected by using a computer to
randomly generate telephone numbers that were then called.
22. A Ford Motor Company researcher has partitioned all registered cars into categories of compact,
mid-size, and family-size. He is surveying 75 car owners from each category.
23. Motivated by a student who died from binge drinking, Chico State conducts a study of student
drinking by randomly selecting 10 different classes and interviewing all of the students in each
of those classes.
24. A statistics student obtains height/weight data by interviewing the members of his fraternity.
25. A UCLA researcher surveys all cardiac patients in each of 30 randomly selected hospitals.