INTRODUCTION TO STATISTICS

SQQS1013 Elementary Statistics
INTRODUCTION TO
Statistics
1.1 WHAT IS STATISTICS?
• The word statistics is derived from the classical Latin root status, which means
state.
• Statistics has become the universal language of the sciences.
• As potential users of statistics, we need to master both the “science” and the
“art” of using statistical methodology correctly:
▪ Carefully defining the situation
▪ Gathering data
▪ Accurately summarizing the data
▪ Deriving and communicating meaningful conclusions
Specific definition:
Statistics is a collection of procedures and principles for
gathering data and analyzing information to help people
make decisions when faced with uncertainty.
• Nowadays statistics is used in almost all fields of human effort, such as:
Example applications of Statistics
• Education
• Agriculture
• Business
• Health
Chapter 1: Introduction to Statistic
• Sports
A statistician may keep records of the number of hits a baseball player gets in a
season.
• Financial
A financial advisor uses statistical information to make reliable predictions
about investments.
• Public Health
An administrator would be concerned with the number of residents who contract
a new strain of flu virus during a certain year.
• Others
Any Idea?…..
1.2 TWO ASPECTS IN STATISTICS
Statistics has two aspects:
1. Theoretical / Mathematical Statistics
o Deals with the development, derivation and proof of statistical theorems,
formulas, rules and laws.
2. Applied Statistics
o Involves the application of those theorems, formulas, rules and laws to solve
real world problems.
o Applied Statistics can be divided into two main areas, depending on how data
are used. The two main areas are:
Descriptive statistics
• What most people think of when they hear the word statistics.
• Includes the collection, presentation, and description of sample data.
• Uses graphs, charts and tables to show data.
Inferential statistics
• Refers to the technique of interpreting the values resulting from the
descriptive techniques and making decisions and drawing conclusions about the
population.
ASPECTS OF STATISTICS
1. Theoretical/Mathematical Statistics
o Deals with the development, derivation and proof of statistical theorems,
formulas, rules and laws.
2. Applied Statistics
o Involves the application of those theorems, formulas, rules and laws to solve
real world problems.
o Descriptive Statistics: consists of methods for collecting, organizing,
displaying and summarizing data.
o Inferential Statistics: consists of methods that use results obtained from a
sample to make decisions or conclusions about a population.
Example 1
Determine which of the following statements is descriptive in nature and which
is inferential.
a. Of all U.S. kindergarten teachers, 32% say that “knowing the alphabet” is
an essential skill.
b. Of the 800 U.S. kindergarten teachers polled, 32% say that “knowing the
alphabet” is an essential skill.
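The contrast in Example 1 can be sketched numerically. The figures 800 and 32% come from the example; the 95% level, the normal-approximation interval and all variable names are assumptions added here for illustration and are not part of the course notes.

```python
import math

n = 800          # teachers polled (from Example 1)
p_hat = 0.32     # proportion of the sample saying "essential" (from Example 1)

# Descriptive: summarize only the teachers who were actually polled.
count_yes = round(n * p_hat)

# Inferential: use the sample to say something about ALL teachers.
# Assumption: a simple normal-approximation interval with z = 1.96 (about 95%).
z = 1.96
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
low = p_hat - z * standard_error
high = p_hat + z * standard_error

print(f"Descriptive: {count_yes} of the {n} teachers polled")
print(f"Inferential: population proportion roughly between {low:.3f} and {high:.3f}")
```

The first quantity only describes the sample in hand. The second makes a claim about the whole population, so it comes with a range of uncertainty rather than a single exact figure.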
Why do we have to study statistics?
▪ To read and understand various statistical studies in related fields.
▪ To communicate and explain the results of studies in related fields using
our own words.
▪ To become better consumers and citizens.
1.3 BASIC TERMS IN STATISTICS
• Population vs. Sample
Population
• A collection of all individuals about which information is desired.
• ‘Individuals’ are usually people but could also be schools, cities, pet dogs,
agricultural fields, etc.
Sample
• A subset of the population.
• There are two kinds of population:
▪ Finite population
When the membership of a population can be (or could be) physically listed.
e.g. the books in a library.
▪ Infinite population
When the membership is unlimited.
e.g. the population of all people who might use aspirin.
• Parameter vs. Statistic
Parameter
• A numerical value summarizing all the data of an entire population.
• Often a Greek letter is used to symbolize the name of a parameter.
Average/Mean - µ
Standard deviation - σ
e.g. The “average” age at time of admission for all students who have ever
attended our college.
Statistic
• A numerical value summarizing the sample data.
• A letter of the English alphabet is used to symbolize the name of a statistic.
Average/Mean - x̄
Standard deviation - s
e.g. The “average” height, found by using the set of 25 heights.
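A minimal sketch of the parameter/statistic distinction, using Python's standard `statistics` module. The population below (admission ages of 500 students) is randomly generated, made-up data, and the variable names are chosen here only to echo the symbols µ, σ, x̄ and s.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers on every run

# Hypothetical population: admission ages of 500 students (made-up values).
population = [random.randint(17, 30) for _ in range(500)]

# Parameters summarize the entire population.
mu = statistics.mean(population)        # population mean
sigma = statistics.pstdev(population)   # population standard deviation

# A sample is a subset of the population.
sample = random.sample(population, 25)

# Statistics summarize only the sample data.
x_bar = statistics.mean(sample)         # sample mean
s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)

print(f"Parameters: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Statistics: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

In practice the parameters are usually unknown because the whole population cannot be measured; the statistics are what can actually be computed.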
• Variable
A characteristic of interest about each individual element of a population
or sample.
e.g.: A student’s age at entrance into college, the color of a student’s hair.
• Data value
The value of the variable associated with one element of a population or
sample. This value may be a number, a word, or a symbol.
e.g.: Farah entered college at age “23”, her hair is “brown”.
• Data
The set of values collected for the variable from each of the elements
that belong to the sample.
e.g.: The set of 25 heights collected from 25 students.
• Census: a survey that includes every element in the population.
• Sample survey: a survey that includes only the elements in the selected
sample.
Example 2
A statistics student is interested in finding out something about the average ringgit
value of cars owned by the faculty members of our university. Each of the seven
terms just described can be identified in this situation.
i) Population: the collection of all cars owned by all faculty members at our
university.
ii) Sample: any subset of that population. For example, the cars owned by
members of the statistics department.
iii) Variable: the “ringgit value” (RM) of each individual car.
iv) Data value: one data value is the ringgit value of a particular car. Ali’s
car, for example, is valued at RM 45,000.
v) Data: the set of values that correspond to the sample obtained
(45,000; 55,000; 34,000; …).
vi) Parameter: (about which we are seeking information) the “average” value of all
cars in the population.
vii) Statistic: (which will be found) the “average” value of the cars in the sample.
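Example 2 can be mirrored in a few lines of Python. Only the three statistics-department values appear in the notes (with the third read here as 34,000, which is an assumption); the other departments and their figures are invented purely for illustration.

```python
import statistics

# Hypothetical population: ringgit value of every faculty-owned car, by department.
# Only the "Statistics" values come from Example 2; the rest are made up.
population = {
    "Statistics": [45_000, 55_000, 34_000],
    "Economics": [62_000, 28_000],
    "Accounting": [39_000, 71_000, 48_000],
}

# A census would record the variable (ringgit value) for every car.
all_values = [value for cars in population.values() for value in cars]

sample = population["Statistics"]   # sample: cars of the statistics department
data_value = sample[0]              # one data value: Ali's car, RM 45,000

parameter = statistics.mean(all_values)  # "average" over the whole population
statistic = statistics.mean(sample)      # "average" over the sample only

print(f"Parameter (all {len(all_values)} cars): RM {parameter:,.2f}")
print(f"Statistic ({len(sample)} sampled cars): RM {statistic:,.2f}")
```

The two averages differ, which is the usual situation: the statistic is an estimate of the parameter, not a copy of it.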
1.3.1 Types of Variables
• Quantitative (numerical) Variables
▪ A variable that quantifies an element of a population.
e.g. the “total cost” of textbooks purchased by each student for this
semester’s classes.
▪ Arithmetic operations such as addition and averaging are meaningful
for data that result from a quantitative variable.
▪ Can be subdivided into two classifications: discrete variables and
continuous variables.
Discrete Variables
▪ A quantitative variable that can assume a countable number of values.
▪ Can assume any values corresponding to isolated points along a line
interval. That is, there is a gap between any two values.
e.g. Number of courses for which you are currently registered.
Continuous Variables
▪ A quantitative variable that can assume an uncountable number of values.
▪ Can assume any value along a line interval, including every possible value
between any two values.
e.g. Weight of books and supplies you are carrying as you attend class today.
• Qualitative (attribute, categorical) variables
▪ A variable that describes or categorizes an element of a population.
e.g.: A sample of four hair-salon customers was surveyed for their
“hair color”, “hometown” and “level of satisfaction”.
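The classification above can be illustrated with a rough heuristic that guesses a variable's type from how its values are stored. The function name and the example data are hypothetical, and the rule is only a sketch: the real classification depends on what the values mean, not on their Python type.

```python
def guess_variable_type(values):
    """Rough heuristic: guess the variable type from the Python type of its values."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative (discrete)"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative (continuous)"
    return "unknown"

# Made-up data echoing the examples in the notes.
courses_registered = [4, 5, 3, 6]               # counts: isolated values
book_weight_kg = [2.5, 3.1, 1.8, 4.0]           # measurements: any value in a range
hair_color = ["brown", "black", "blonde", "brown"]

for name, values in [
    ("courses registered", courses_registered),
    ("book weight (kg)", book_weight_kg),
    ("hair color", hair_color),
]:
    print(f"{name}: {guess_variable_type(values)}")
```

The heuristic fails for numeric codes such as postcodes or matric numbers, which are stored as integers but only label an element, so they remain qualitative. Averaging them would be meaningless.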
EXERCISE 1
1. Of the adult U.S. population, 36% has an allergy. A sample of 1200 randomly
selected adults resulted in 33.2% reporting an allergy.
a. Describe the population.
b. What is the sample?
c. Describe the variable.
d. Identify the statistic and give its value.
e. Identify the parameter and give its value.
2. The faculty members at Universiti Utara Malaysia were surveyed on the question
“How satisfied were you with this semester’s schedule?” Their responses were to be
categorized as “very satisfied,” “somewhat satisfied,” “neither satisfied nor
dissatisfied,” “somewhat dissatisfied,” or “very dissatisfied.”
a. Name the variable of interest.
b. Identify the type of variable.
3. A study was conducted by Aventis Pharmaceuticals Inc. to measure the adverse
side effects of Allegra, a drug used for treatment of seasonal allergies. A sample of
679 allergy sufferers in the United States was given 60 mg of the drug twice a day.
The patients were to report whether they experienced relief from their allergies as
well as any adverse side effects (viral infection, nausea, drowsiness, etc.).
a. What is the population being studied?
b. What is the sample?
c. What are the characteristics of interest about each element in the population?
d. Are the data being collected qualitative or quantitative?
4. Identify each of the following as an example of (1) attribute (qualitative) or (2)
numerical (quantitative) variables.
a. The breaking strength of a given type of string.
b. The hair color of children auditioning for the musical Annie.
c. The number of stop signs in towns of fewer than 500 people.
d. Whether or not a faucet is defective.
e. The number of questions answered correctly on a standardized test.
1.3.2 Types of Data
• Data set is the set of values collected for the variable from each of the
elements that belong to the sample.
e.g. the set of 25 heights collected from 25 students.
• Data can be collected from a survey or an experiment.
Types of Data
Primary data
• Necessary data obtained through a survey conducted by the researcher.
• Data are collected by the researcher and obtained from respondents.
Secondary data
• Data obtained from published material by governmental, industrial or
individual sources.
▪ Published records from governmental, industrial or individual sources.
▪ Historical data.
▪ Various resources.
▪ Experiment is not required.
Advantages:
▪ Lower cost.
▪ Saves time and energy.
Disadvantages:
▪ Obsolete information.
▪ Data accuracy is not confirmed.
Primary Data Collection Techniques
1. Face to face interview
▪ Two-way communication where the researcher(s) ask questions directly to the
respondent(s).
Advantages:
▪ Precise answers.
▪ Appropriate for research that requires a large amount of data.
▪ Increases the number of answered questions.
Disadvantages:
▪ Expensive.
▪ The interviewer might influence the respondent’s responses.
▪ Respondents may refuse to answer sensitive or personal questions.
2. Telephone interview
Advantages:
▪ Quick.
▪ Less costly.
▪ Wider respondent coverage.
Disadvantages:
▪ Limited interview duration.
▪ Demonstrations cannot be performed.
▪ Calls may go unanswered.
3. Postal questionnaire
▪ A set of questions to obtain information related to the study being conducted.
▪ Questionnaires are posted to every respondent.
Advantages:
▪ Wider respondent coverage.
▪ Respondents have enough time to answer questions.
▪ Interviewer influence can be avoided.
▪ Lower cost.
Disadvantages:
▪ One-way interaction.
▪ Low response rate.
Any Idea?.......
Another technique to collect primary data is observation.
List the advantages and disadvantages of this technique.
1.3.2.1 Scale of Measurement
• Data can also be classified by how they are categorized, counted or
measured.
• This type of classification uses measurement scales with 4 common
types of scales: nominal, ordinal, interval and ratio.
Nominal Level of Measurement
▪ A qualitative variable that characterizes (or describes/names) an element of
a population.
▪ Arithmetic operations are not meaningful for the data.
▪ Order cannot be assigned to the categories.
▪ Example:
- Survey responses: yes, no, undecided
- Gender: male, female
Ordinal Level of Measurement
▪ A qualitative variable that incorporates an ordered position, or ranking.
▪ Differences between data values either cannot be determined or are
meaningless.
▪ Example:
- Level of satisfaction: “very satisfied”, “satisfied”, “somewhat satisfied”, etc.
- Course grades: A, B, C, D, or F
Interval Level of Measurement
▪ Involves a quantitative variable.
▪ A scale where distances between data are meaningful.
▪ Differences make sense, but ratios do not (e.g., 30° - 20° = 20° - 10°, but
20° is not twice as hot as 10°!).
▪ No natural zero.
▪ Example:
- Temperatures in °C are interval data: 25°C is warmer than 20°C, and a 5°C
difference has some physical meaning. Note that 0°C is arbitrary, so it
does not make sense to say that 20°C is twice as hot as 10°C.
Ratio Level of Measurement
▪ A scale in which both intervals between values and ratios of values are
meaningful.
▪ A real zero point.
▪ Example:
- Temperature measured in kelvins is a ratio scale because it has a
meaningful zero point (absolute zero).
- Physical measurements of height, weight and length are typically ratio
variables. It is meaningful to say that 10 m is twice as long as 5 m,
because there is a natural zero.
- By contrast, calendar years are not ratio data: the year 0 is arbitrary, so it
is not sensible to say that the year 2000 is twice as old as the year 1000.
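The difference between interval and ratio scales can be checked with a short calculation. This is only an illustrative sketch: the helper name `celsius_to_kelvin` and the chosen temperatures and lengths are assumptions made for the example.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return c + 273.15

t_low_c, t_high_c = 10.0, 20.0

# Differences are meaningful on an interval scale: they survive the change of scale.
diff_c = t_high_c - t_low_c
diff_k = celsius_to_kelvin(t_high_c) - celsius_to_kelvin(t_low_c)

# Ratios are not: the Celsius "ratio" depends on where the arbitrary zero sits.
ratio_c = t_high_c / t_low_c
ratio_k = celsius_to_kelvin(t_high_c) / celsius_to_kelvin(t_low_c)

# Length is a ratio variable: the ratio is the same in metres and centimetres.
ratio_m = 10.0 / 5.0
ratio_cm = (10.0 * 100) / (5.0 * 100)

print(f"Difference: {diff_c:.1f} degrees C vs {diff_k:.1f} K")
print(f"Temperature ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
print(f"Length ratio: {ratio_m:.1f} (m) vs {ratio_cm:.1f} (cm)")
```

The Celsius ratio of 2 shrinks to about 1.035 once the same temperatures are expressed from a true zero, which is why "twice as hot" is not a sensible statement for 20°C and 10°C.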
Levels of Measurement
• Nominal - categories only
• Ordinal - categories with some order
• Interval - differences but no natural starting point
• Ratio - differences and a natural starting point
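The summary above describes a cumulative hierarchy: each level keeps what the previous level allows and adds one more kind of comparison. A minimal sketch of that idea follows; the class and function names are hypothetical and chosen only for this illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    """The four measurement scales, ordered from least to most structure."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

def meaningful_operations(level):
    """Return the comparisons that make sense at the given level of measurement."""
    ops = ["categorize"]              # every level can name categories
    if level >= Level.ORDINAL:
        ops.append("order")           # ranking becomes meaningful
    if level >= Level.INTERVAL:
        ops.append("differences")     # distances between values
    if level >= Level.RATIO:
        ops.append("ratios")          # requires a natural zero
    return ops

for level in Level:
    print(f"{level.name.title():<9}", ", ".join(meaningful_operations(level)))
```

Using an ordered enumeration makes the hierarchy explicit: a ratio-level variable supports everything an interval-level one does, but not the other way round.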
EXERCISE 2
1) Classify each as nominal-level, ordinal-level, interval-level or ratio-level.
a. Ratings of newscasts in Malaysia (poor, fair, good, excellent).
b. Temperature of automatic popcorn poppers.
c. Marital status of respondents to a survey on savings accounts.
d. Age of students enrolled in a martial arts course.
e. Salaries of cashiers of C-Mart stores.
2) Data obtained from a nominal scale
a. must be alphabetic.
b. can be either numeric or non-numeric.
c. must be numeric.
d. must rank order the data.
3) The set of measurements collected for a particular element is (are) called
a. variables.
b. observations.
c. samples.
d. none of the above answers is correct.
4) The scale of measurement that is simply a label for the purpose of
identifying the attribute of an element is the
a. ratio scale.
b. nominal scale.
c. ordinal scale.
d. interval scale.
5) Some hotels ask their guests to rate the hotel’s services as excellent, very
good, good, and poor. This is an example of the
a. ordinal scale.
b. ratio scale.
c. nominal scale.
d. interval scale.
6) The ratio scale of measurement has the properties of
a. only the ordinal scale.
b. only the nominal scale.
c. the rank scale.
d. the interval scale.
7) Arithmetic operations are inappropriate for
a. the ratio scale.
b. the interval scale.
c. both the ratio and interval scales.
d. the nominal scale.
8) A characteristic of interest for the elements is called a(n)
a. sample.
b. data set.
c. variable.
d. none of the above answers is correct.
9) In a questionnaire, respondents are asked to mark their gender as male or
female. Gender is an example of a
a. qualitative variable.
b. quantitative variable.
c. qualitative or quantitative variable, depending on how the
respondents answered the question.
d. none of the above answers is correct.
10) The summaries of data, which may be tabular, graphical, or numerical, are
referred to as
a. inferential statistics.
b. descriptive statistics.
c. statistical inference.
d. report generation.
11) Statistical inference
a. refers to the process of drawing inferences about the sample based
on the characteristics of the population.
b. is the same as descriptive statistics.
c. is the process of drawing inferences about the population based on
the information taken from the sample.
d. is the same as a census.
EXERCISE 3
1. For each of these statements, tell whether descriptive or inferential statistics have
been used.
a) The average life expectancy in New Zealand is 78.49 years.
b) A diet high in fruits and vegetables will lower blood pressure.
c) The total amount of estimated losses from the tsunami flood was RM4.2
billion.
d) Researchers stated that the shape of a person’s ears is related to the
person’s aggression
e) In 2013, the number of high school graduates will be 3.2 million students.
2. Classify each variable as discrete or continuous.
a) Ages of people working in a large factory
b) Number of cups of coffee served at a restaurant
c) The amount of a drug injected into a rat.
d) The time it takes a student to walk to school
e) The number of liters of milk sold each day at a grocery store
3. Classify each as nominal-level, ordinal-level, interval-level or ratio-level.
a) Rating of movies as U, SX and LP.
b) Number of candy bars sold on a fund drive
c) Classification of automobile as subcompact, compact, standard and
luxury.
d) Temperatures of hair dryers.
e) Weights of suitcases on a commercial airline.
4. At Sintok Community College 150 students are randomly selected and asked the
distance of their house to campus. From this group, a mean of 5.2 km is
computed.
a. What is the parameter?
b. What is the statistic?
c. What is the population?
d. What is the sample?
25
Matric. No: _______________________
Group:______
TUTORIAL CHAPTER 1
In the following multiple-choice questions, please circle the correct answer.
1. You asked five of your classmates about their height. On the basis of this
information, you stated that the average height of all students in your university
or college is 65 inches. This is an example of:
a. descriptive statistics
b. statistical inference
c. parameter
d. population
2. A company has developed a new computer sound card, but the average lifetime
is unknown. In order to estimate this average, 200 sound cards are randomly
selected from a large production line and tested and the average lifetime is found
to be 5 years. The 200 sound cards represent the:
a. parameter
b. statistic
c. sample
d. population
3. A summary measure that is computed from a sample to describe a characteristic
of the population is called a
a. parameter
b. statistic
c. population
d. sample
4. A summary measure that is computed from a population is called a
a. parameter
b. statistic
c. population
d. sample
5. Data collected from a portion or a subset of all elements of interest in a statistical
study are called ______ data.
a. sample
b. parameter
c. population
d. statistic
6. Which of the following is not the goal of descriptive statistics?
a. Summarizing data
b. Displaying aspects of the collected data
c. Reporting numerical findings
d. Estimating characteristics of the population
7. Which of the following statements is not true?
a. One form of descriptive statistics uses graphical techniques
b. One form of descriptive statistics uses numerical techniques
c. In the language of statistics, population refers to a group of people
d. Statistical inference is used to draw conclusions or inferences about
characteristics of populations based on sample data
8. Descriptive statistics deals with methods of:
a. organizing data
b. summarizing data
c. presenting data in a convenient and informative way
d. All of the above
9. A politician who is running for the office of governor of a state with 4 million
registered voters commissions a survey. In the survey, 54% of the 5,000
registered voters interviewed say they plan to vote for her. The population of
interest is the:
a. 4 million registered voters in the state
b. 5,000 registered voters interviewed
c. 2,700 voters interviewed who plan to vote for her
d. 2,300 voters interviewed who plan not to vote for her
10. A company has developed a new battery, but the average lifetime is unknown. In
order to estimate this average, a sample of 500 batteries is tested and the average
lifetime of this sample is found to be 225 hours. The 225 hours is the value of a:
a. parameter
b. statistic
c. sample
d. population
11. The process of using sample statistics to draw conclusions about true population
parameters is called
a. inferential statistics
b. the scientific method
c. sampling method
d. descriptive statistics
12. Which of the following is most likely a population as opposed to a sample?
a. Respondents to a magazine survey
b. The first 10 students completing a final exam
c. Every fifth student to arrive at the book store on your campus
d. Registered voters in the State of Michigan
13. Researchers suspect that the average number of credits earned per semester by
college students is rising. A researcher at Michigan State University (MSU)
wished to estimate the number of credits earned by students during the fall
semester of 2003 at MSU. To do so, he randomly selects 500 student transcripts
and records the number of credits each student earned in the fall term 2003. He
found that the average number of semester credits completed was 14.85 credits
per student. The population of interest to the researcher is
a. all MSU students
b. all college students in Michigan
c. all MSU students enrolled in the fall semester of 2003
d. all college students in Michigan enrolled in the fall semester of 2003
14. The collection and summarization of the graduate degrees and research areas of
interest of the faculty at the University of Michigan is an example of
a. inferential statistics
b. descriptive statistics
c. a parameter
d. a statistic
15. Those methods involving the collection, presentation, and characterization of a
set of data in order to properly describe the various features of that set of data are
called
a. inferential statistics
b. the scientific method
c. sampling method
d. descriptive statistics
16. The estimation of the population average student expenditure on education
based on the sample average expenditure of 1,000 students is an example of
a. inferential statistics
b. descriptive statistics
c. a parameter
d. a statistic
17. A study is under way in a national forest to determine the adult height of pine
trees. Specifically, the study is attempting to determine what factors aid a tree in
reaching heights greater than 50 feet tall. It is estimated that the forest contains
32,000 pine trees. The study involves collecting heights from 500 randomly
selected adult pine trees and analyzing the results. The sample in the study is
a. the 500 randomly selected adult pine trees
b. the 32,000 adult pine trees in the forest
c. all the adult pine trees taller than 50 feet
d. all pine trees, of any age, in the forest
18. The classification of student major (accounting, economics, management,
marketing, other) is an example of
a. a categorical random variable
b. a discrete random variable
c. a continuous random variable
d. a parameter
19. Most colleges admit students based on their achievements in a number of
different areas. The grade obtained in a senior level English course (A, B, C, D, or
F) is an example of a ________________, or ________________ variable.
20. For each of the following examples, identify the data type as nominal, ordinal, or
ratio.
a. The letter grades received by students in a computer science class
________________
b. The number of students in a statistics course
________________
c. The starting salaries of new Ph.D. graduates from a statistics program
________________
d. The size of fries (small, medium, large) ordered by a sample of Burger King
customers. _____________________
e. The college you are enrolled in (Arts and Science, Business, Education, etc.)
_________________