Uploaded by Dave N

BUS 215 Topic 1

BUSINESS STATISTICS
(BUS 215)
TOPIC 1
DATA AND DATA COLLECTION
LECTURER: MR ELISA EBIYAMU
BUS 215 @ MCA
Meaning of Data
• Data are observations that have been collected.
• Data are sometimes used to find statistics.
• A data set is an isolated list of lifeless numbers awaiting statistical
manipulation.
• It refers to all the data collected in a particular study.
• Data are collected from a population or from a subset of the population
called a sample.
• When data are processed, organized, structured and presented in a given
context so as to make them useful, they are called information.
Population
• A population is the complete collection of all elements to be studied.
• The collection is complete in the sense that it includes all the subjects
to be studied.
• A census is a study that involves collection of data from every element
in a population.
Sample
• A sample is a subcollection of elements drawn from a population.
• It is a subset of a population.
Example
• A business produces three brands of a product sold to a population of
500 people in the neighborhood. The management hired a consultant to
investigate consumer preference for each brand and determine the
brand that is preferred by most customers. 65 customers were selected at
random for interviews as part of the research.
• In this example the 65 customers make up the sample for the research.
• A survey is the collection of data from elements in a sample.
Population and Sample
Meaning of Statistics
• Two common definitions of the word statistics are as follows:
• Statistics refer to facts or data, either numerical or nonnumerical, organized and summarized so as to provide useful
and accessible information about a particular subject.
• This definition is used with a plural verb.
• Statistics is a collection of methods for planning experiments,
obtaining data and then organising, summarizing, presenting,
analysing, interpreting and drawing conclusions based on
data.
• This is used with a singular verb.
Statistic and Parameter
• A statistic is a numerical measurement describing some
characteristic of a sample.
• A parameter is a numerical measurement describing some
characteristic of a population.
Example
• The average score of students in Business statistics at
MCA is 60 percent. A sample of 45 students was drawn
and its mean was 58 percent with a standard deviation of 3.
• In this example 60 is a parameter while 58 is a statistic.
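The distinction can be sketched in Python. The scores and the seed below are made up for illustration and are not the MCA figures: the mean over every element of the population is a parameter, while the mean over a drawn sample is a statistic.

```python
import random
import statistics

# Hypothetical population of exam scores (illustrative values only).
population_scores = [52, 55, 58, 60, 61, 63, 64, 66, 59, 62]

# Parameter: computed from every element in the population.
population_mean = statistics.mean(population_scores)

# Statistic: computed from a sample drawn from that population.
rng = random.Random(1)  # fixed seed so the sketch is repeatable
sample_scores = rng.sample(population_scores, k=4)
sample_mean = statistics.mean(sample_scores)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):", sample_mean)
```

The parameter is fixed for a given population, whereas the statistic changes from one sample to the next.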
CLASSIFICATION OF DATA
Classification of Data
• Data classification can be difficult; even statisticians occasionally
disagree over data type.
• In most cases, however, data classification is fairly clear and will help
you choose the correct statistical method for analyzing the data.
• The statistical method appropriate for summarizing data depends upon
the type of data.
• Data may be classified in several ways.
1. By measurement, we have qualitative and quantitative data.
2. By time of collection, we have cross-sectional and time series data.
3. By source, we have primary and secondary data.
4. By preciseness, we have discrete and continuous data.
5. By number of variables, we have univariate, bivariate and
multivariate data.
Qualitative and Quantitative Data
• Data can be classified by measurement into two groups
1. Categorical or qualitative data
2. Quantitative data
• Data that can be grouped by specific categories are referred to as
categorical data.
• Categorical data use either the nominal or ordinal scale of measurement.
• Data that use numeric values to indicate how much or how many are
referred to as quantitative data.
• Quantitative data are obtained using either the interval or ratio scale of
measurement.
• A categorical or qualitative variable is a variable with categorical data,
and a quantitative variable is a variable with quantitative data.
• The statistical analysis appropriate for a particular variable depends
upon whether the variable is categorical or quantitative.
• If the variable is categorical, the statistical analysis is limited.
• We can summarize categorical data by counting the number of
observations in each category or by computing the proportion of the
observations in each category.
• However, even when the categorical data are identified by a numerical
code, arithmetic operations such as addition, subtraction, multiplication,
and division do not provide meaningful results.
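A minimal Python sketch of summarizing categorical data by counts and proportions. The brand labels and the ten responses are hypothetical.

```python
from collections import Counter

# Hypothetical brand preferences recorded for 10 customers.
preferences = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]

counts = Counter(preferences)   # number of observations in each category
total = sum(counts.values())
proportions = {brand: n / total for brand, n in counts.items()}

for brand in sorted(counts):
    print(brand, counts[brand], proportions[brand])
```

Recoding the brands as 1, 2 and 3 would not change this: the counts and proportions stay the same, and an "average brand" would still have no meaning.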
• Arithmetic operations provide meaningful results for quantitative variables.
• For example, quantitative data may be added and then divided by the
number of observations to compute the average value.
• This average is usually meaningful and easily interpreted.
• In general, more alternatives for statistical analysis are possible when data
are quantitative.
Primary and Secondary Data
• Primary data is the name given to data that are used for the
specific purpose for which they were collected.
• They will not contain unknown quantities in respect of method of
collection, accuracy of measurement or number of respondents.
• Secondary data is the name given to data that are being used for
some purpose other than that for which they were originally
collected.
Advantages of Secondary Data
Secondary data is used when:
• The time, manpower or resources necessary for the study
are not available.
• It already exists and provides most, if not all, of the
information required.
Advantages of Secondary Data
• It saves time
• Saves manpower
• Saves resources
Disadvantages of Secondary Data
• Questionable data quality
• Data collected may now be out-of-date
• Geographical coverage of the data may not coincide with study
location
• Strata of the population covered may not be appropriate for purposes
of current study
• Some terms used may have different meanings.
Cross-Sectional and Time Series Data
• For purposes of statistical analysis, distinguishing between cross-sectional data and time series data is important.
• Cross-sectional data are data collected at the same or approximately
the same point in time.
• Each respondent provides data on one or more variables.
• Time series data are data collected over several time periods or
regular intervals.
• Each variable is observed at several points in time, for example daily,
weekly, monthly or annually.
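The two layouts can be sketched in Python. The shop names and sales figures are hypothetical.

```python
# Cross-sectional: several elements observed at (about) the same time.
# Hypothetical monthly sales for three shops, all recorded in the same month.
cross_section = {"Shop A": 120, "Shop B": 95, "Shop C": 143}

# Time series: one variable observed over several periods.
# Hypothetical monthly sales for one shop across four months.
time_series = [("Jan", 120), ("Feb", 131), ("Mar", 127), ("Apr", 140)]

# Period-to-period change is only defined for the time series,
# because its observations are ordered in time.
changes = [later[1] - earlier[1]
           for earlier, later in zip(time_series, time_series[1:])]
print(changes)
```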
Cross sectional data
Time series data
Discrete and Continuous Data
• Quantitative variables can be classified as either discrete or continuous.
• A discrete variable is a variable whose possible values can be listed, even
though the list may continue indefinitely.
• It refers to data that can be measured precisely.
• One way of obtaining discrete data is by counting, for example:
i. Number of products that a firm produces
ii. Number of employees working at a firm
• Discrete data can also be obtained from non-counting situations, for example:
i. Shoe sizes of a sample of students
ii. Weekly wages of a set of workers
• A continuous variable is a variable whose possible values form some
interval of numbers.
• It is a variable whose values cannot be measured precisely but can
only be approximated.
• Typically, a continuous variable involves a measurement of something,
such as:
• The height of a person,
• The weight of a newborn baby,
• The length of time a car battery lasts.
Scales of Measurement
• Data collection requires one of the following scales of measurement:
i. nominal,
ii. ordinal,
iii. interval,
iv. ratio.
• The scale of measurement determines the amount of information
contained in the data and indicates the most appropriate data
summarization and statistical analyses.
• Nominal scale: the scale of measurement for a variable when the data are
labels or names used to identify an attribute of an element.
• Nominal data may be nonnumeric or numeric, but we cannot perform any
mathematical operation on nominal data.
• For example, to facilitate data collection and to prepare the data for entry
into a computer database, we might use a numeric code by letting 1 denote
BBME, 2 denote BMPR, and 3 denote BIAAS.
• In this case the numeric values 1, 2, and 3 identify the category of
programmes at MCA.
• The scale of measurement is nominal even though the data appear as
numeric values.
• Ordinal scale: the scale of measurement for a variable if the data exhibit the
properties of nominal data and the order or rank of the data is meaningful.
• Ordinal data may be nonnumeric or numeric.
• We can rank the data.
• For example, Eastside Automotive sends customers a questionnaire designed
to obtain data on the quality of its automotive repair service.
• Each customer provides a repair service rating of excellent, good, or poor.
• Because the data obtained are the labels excellent, good, or poor, the data
have the properties of nominal data. In addition, the data can be ranked, or
ordered, with respect to the service quality.
• Interval scale: the scale of measurement for a variable if the data demonstrate
the properties of ordinal data and the interval between values is expressed in
terms of a fixed unit of measure.
• Interval data are always numeric, but the zero point is arbitrary: a value of
zero does not mean that nothing exists.
• We can add or subtract values, but ratios (multiplying or dividing one value
by another) are not meaningful.
• Scholastic Aptitude Test (SAT) scores are an example of interval-scaled data.
For example, three students with SAT math scores of 620, 550, and 470 can
be ranked or ordered in terms of best performance to poorest performance.
• In addition, the differences between the scores are meaningful.
• Ratio scale: the scale of measurement for a variable if the data
demonstrate all the properties of interval data and the ratio of two
values is meaningful.
• Hence we can perform addition, subtraction, multiplication and
division.
• Ratio data are always numeric.
• Variables such as distance, height, weight, and time use the ratio scale
of measurement.
• This scale requires that a zero value be included to indicate that
nothing exists for the variable at the zero point.
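The four scales build on one another, and this can be sketched as a small Python lookup table. The operation names and the helper `is_meaningful` are illustrative labels, not standard terminology; the table only encodes what the slides state.

```python
# Operations each scale supports, as summarised in the slides.
# Each scale allows everything the previous one allows, plus more.
SCALE_OPERATIONS = {
    "nominal": {"classify"},
    "ordinal": {"classify", "rank"},
    "interval": {"classify", "rank", "add", "subtract"},
    "ratio": {"classify", "rank", "add", "subtract", "multiply", "divide"},
}

def is_meaningful(scale: str, operation: str) -> bool:
    """Return True if the operation gives a meaningful result on this scale."""
    return operation in SCALE_OPERATIONS[scale]

print(is_meaningful("ordinal", "rank"))     # ranking service ratings
print(is_meaningful("interval", "divide"))  # ratio of two SAT scores
print(is_meaningful("ratio", "divide"))     # ratio of two weights
```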
SOURCES OF DATA AND DATA COLLECTION TECHNIQUES
Data Sources
• Data can be obtained from
1. Existing or secondary data sources
2. Studies or surveys
Existing Sources
• In some cases, data needed for a particular application already exist.
• In Malawi, the National Statistical Office (NSO) is a government
department created to conduct surveys and censuses. As such, it holds a
lot of data that can be used in research.
• Such sources may publish the data on their websites.
• Secondary data sources fall broadly into two categories
1. Internal sources
2. External sources
Internal Existing Data Sources
External Existing Data Sources
Statistical Studies
• Sometimes the data needed for a particular application are not available
through existing sources.
• In such cases, the data can often be obtained by conducting a statistical study.
• Statistical studies can be classified as either experimental or observational.
• In an experimental study, a variable of interest is first identified and then one
or more other variables are identified and controlled so that data can be
obtained about how they influence the variable of interest.
• For example, a pharmaceutical firm might be interested in conducting an
experiment to learn about how a new drug affects blood pressure.
• Blood pressure is the variable of interest in the study.
• Non-experimental, or observational, statistical studies make no attempt to
control the variables of interest.
• A survey is perhaps the most common type of observational study.
• For instance, in a personal interview survey, research questions are first
identified.
• Then a questionnaire is designed and administered to a sample of
individuals.
DATA COLLECTION TECHNIQUES
• Data collection can be thought of as the means by which information is
obtained from the selected subjects of an investigation.
• There are several methods of data collection. Sometimes the sampling
technique will dictate which method is used, and in other cases there will
be a choice.
• Some of the common data collection techniques are:
i. Individual (personal) interviews
ii. Postal questionnaires
iii. Street interviews
iv. Telephone interviews
v. Direct observation
Individual (Personal) Interviews
• Individual interviews are usually used with random sampling.
• They have the advantage of completeness and accuracy.
• Questions can be thoroughly tested.
• The approach is uniform if only one interviewer is used.
• Follow-up questions can be put where a question has not been addressed
thoroughly.
However,
• This method is very expensive
• Interviewers need to be trained
• Interviews need arranging
Postal Questionnaire
• Postal questionnaires can be used with many sampling methods.
• This is a much cheaper method than individual interviews
• Questions should be easy to understand.
• Low response rate
• No need for prior arrangement
• Posted questionnaires may be filled by wrong persons
Street Interviews
• This method of data collection is normally used in conjunction
with quota sampling, where the interviewer is often just one of a
team.
• Some factors involved are;
i. Possible differences in interviewer approach to the
respondents and the way replies are recorded.
ii. Questions must be short and simple
iii. Non-response is not a problem normally, since refusals are
ignored and another subject selected.
iv. Convenient and cheap
Telephone Interview
• This method is sometimes used in conjunction with a systematic
sample.
• It would generally be used within a local area and is often connected
with selling a product or a service.
• It has an in-built bias if private homes are being telephoned (rather
than business), since only those people with telephones can be
contacted and interviewed.
• It can cause aggravation and the interviewer needs to be very skilled.
Direct Observation
• This method can be used to examine items sampled from a production
line, in traffic surveys or in work study.
• It is normally considered to be the most accurate form of data
collection, but it is very labour intensive and cannot be used in many
situations.
SAMPLING TECHNIQUES
Sampling
• You will recall that a sample is a subset of a study population.
• The researcher needs to obtain a sample that is large enough and
representative of the population. This process is called sampling.
• A Sampling Frame is a listing of the elements the sample will be selected
from.
• For some populations a sampling frame is not available, so an
investigation would be required before a sample is taken.
• A sample is representative if it has characteristics that are as close to the
population as possible.
Sampling Techniques
Sampling techniques can be put in three categories.
1. Random sampling
i. Simple random sampling
ii. Stratified random sampling
2. Quasi-random sampling
i. Systematic sampling
ii. Multi-stage sampling
3. Non-random sampling
i. Cluster sampling
ii. Quota sampling
Simple Random Sampling
• Simple random sampling is a sampling procedure for which each
possible sample of a given size is equally likely to be the one obtained.
• We can also say that it is a sampling technique in which each member
of the population has an equal chance of being selected.
• Simple random sampling is used when a small proportion of the
population is to be taken as a sample.
• A random sample can be drawn using random numbers
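A minimal Python sketch of drawing a simple random sample with pseudo-random numbers. The ID numbering, the seed and the sizes (500 and 65, echoing the earlier brand-preference example) are assumptions for illustration.

```python
import random

# Hypothetical population listing: ID numbers 1 to 500.
population_ids = list(range(1, 501))

rng = random.Random(2024)  # fixed seed so the draw can be repeated

# random.sample draws without replacement, so every possible
# sample of 65 IDs is equally likely to be the one obtained.
sample_ids = rng.sample(population_ids, k=65)

print(sorted(sample_ids)[:10])
```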
Advantages
• Selection of elements is unbiased
• It is a fair method
Disadvantages
• Needs population listing
• The chosen elements might be so geographically dispersed that the
cost of interviewing becomes too high
• There is a chance that certain attributes of the population may be over- or
under-represented.
Stratified Random Sampling
• Stratification of a population is a process which identifies certain
attributes (strata levels) that are considered significant to the
investigation at hand and partitions the population accordingly into
groups based on the strata levels.
• For example, to study factors that affect performance of students at
MCA a researcher would start by putting students into groups based on
mode of study (Fulltime, evening and weekend) then select elements
randomly from each group (Stratum).
• A population like that of MCA students is said to be heterogeneous.
• In stratified sampling the population is first divided into subpopulations,
called strata, and then sampling is done from each stratum.
• Ideally, the members of each stratum should be homogeneous relative to the
characteristic under consideration.
Procedure for Stratified Random Sampling
Step 1: Divide the population into subpopulations (strata).
Step 2: From each stratum, obtain a simple random sample.
Step 3: Use all the members obtained in Step 2 as the sample.
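The three steps can be sketched in Python. The sampling frame, the stratum sizes and the number drawn from each stratum are hypothetical, and `stratified_sample` is an illustrative helper rather than a library function.

```python
import random

# Hypothetical sampling frame: (student_id, mode_of_study) pairs.
modes = ["fulltime"] * 12 + ["evening"] * 6 + ["weekend"] * 6
frame = list(enumerate(modes, start=1))

def stratified_sample(frame, per_stratum, rng):
    # Step 1: divide the population into strata.
    strata = {}
    for student_id, mode in frame:
        strata.setdefault(mode, []).append(student_id)
    # Step 2: draw a simple random sample from each stratum.
    # Step 3: pool the draws into a single sample.
    sample = []
    for mode, members in strata.items():
        sample.extend(rng.sample(members, k=per_stratum[mode]))
    return sample

sample = stratified_sample(
    frame,
    per_stratum={"fulltime": 4, "evening": 2, "weekend": 2},
    rng=random.Random(7),
)
print(sorted(sample))
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the same size does not guarantee.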
Advantage
• The sample is free from bias
Disadvantages
• Need an extensive sampling frame
• Strata levels are selected subjectively
• Costly, time consuming and need more resources to organize and
implement the sample.
• In stratified sampling, the strata are often sampled in proportion to
their size, which is called proportional allocation.
• Suppose a researcher would like to draw a sample of 20 elements
from a population of 250 homeowners, of which 25 are upper income,
175 are middle income, and 50 are lower income.
• The sample size for the upper-income homeowners is, therefore,
(25 / 250) × 20 = 2.
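Under proportional allocation each stratum receives a share of the sample equal to its share of the population: stratum sample size = total sample size × stratum size / population size. A minimal Python sketch using the homeowner figures above (the function name is illustrative):

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to stratum size."""
    population_size = sum(stratum_sizes.values())
    return {
        name: sample_size * size / population_size
        for name, size in stratum_sizes.items()
    }

allocation = proportional_allocation(
    {"upper": 25, "middle": 175, "lower": 50}, sample_size=20
)
print(allocation)
```

With these figures the allocation is 2 upper-income, 14 middle-income and 4 lower-income homeowners. When the shares are not whole numbers they have to be rounded, so the rounded total may need a small adjustment.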
Systematic Sampling
• Systematic sampling is a sampling technique in which a starting point
is chosen randomly and then every nth element is selected from the
population.
• It is used when the population is listed (such as invoice values or a
company’s fleet of vehicles) or some of it is physically in evidence (such as
rows of houses, items coming out of a production line).
• This technique is particularly useful for populations whose elements are
similar (homogeneous populations).
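A minimal Python sketch of systematic sampling. The invoice list, the step of 10 and the seed are assumptions for illustration.

```python
import random

def systematic_sample(items, step, rng):
    """Pick a random start among the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return items[start::step]

invoices = list(range(1, 101))  # hypothetical list of 100 invoice numbers
sample = systematic_sample(invoices, step=10, rng=random.Random(3))
print(sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined.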
Advantages
• Easy to use
• Can be used even where a sampling frame is not available
Disadvantages
• Bias can occur if recurring sets in the population are possible.
Multi-Stage Sampling
Multi-Stage Sampling involves the following:
• Split the area up into a number of regions.
• Randomly select a small number of these regions.
• Confine sub-samples to these selected regions, with the size of
each sub-sample proportional to the size of the area or population of
the area.
• The procedure can be repeated for sub-regions within regions.
Once the final regions (or sub-regions) have been selected, the
final sampling technique could be random or systematic.
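A two-stage version can be sketched in Python. The region names, household IDs and sizes are hypothetical, and simple random sampling is assumed for the final stage.

```python
import random

# Hypothetical regions and the household IDs found in each.
regions = {
    "North": list(range(0, 40)),
    "Central": list(range(40, 120)),
    "South": list(range(120, 180)),
    "East": list(range(180, 200)),
}

def multi_stage_sample(regions, n_regions, total_sample, rng):
    # Stage 1: randomly select a small number of regions.
    chosen = rng.sample(sorted(regions), k=n_regions)
    chosen_total = sum(len(regions[name]) for name in chosen)
    sample = {}
    for name in chosen:
        # Stage 2: sub-sample size proportional to the region's population
        # (rounded, so the overall total can differ slightly from total_sample).
        k = round(total_sample * len(regions[name]) / chosen_total)
        sample[name] = rng.sample(regions[name], k=k)
    return sample

sample = multi_stage_sample(regions, n_regions=2, total_sample=10,
                            rng=random.Random(5))
for name, households in sample.items():
    print(name, sorted(households))
```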
Advantages
• Need less time
• Need less manpower
• Cheaper
Disadvantages
• Possible bias if a very small number of regions is selected
• The method is not random
Cluster Sampling
• Cluster sampling is a non-random sampling method which can be
employed where no sampling frame exists and for a population that is
distributed over a geographical area.
• The technique involves:
• Selecting one or more geographical areas
• Sampling all the members of the targeted population that can be
identified
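A minimal Python sketch of cluster sampling as described here. The area names and household labels are hypothetical, and the areas are chosen by the investigator rather than drawn at random.

```python
# Hypothetical members of the target population found in each area.
members_by_area = {
    "Area A": ["h1", "h2", "h3"],
    "Area B": ["h4", "h5"],
    "Area C": ["h6", "h7", "h8", "h9"],
}

def cluster_sample(members_by_area, selected_areas):
    """Take every identifiable member of each selected area."""
    return [member
            for area in selected_areas
            for member in members_by_area[area]]

# The investigator picks the areas; everyone found there is sampled.
sample = cluster_sample(members_by_area, ["Area A", "Area C"])
print(sample)
```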
Advantages
• It is a good alternative to multi-stage sampling where no sampling
frame exists
• It is generally cheaper than other methods since little organisation is
needed
Disadvantages
• Selection bias could be significant because it is not random.
Quota Sampling
• Quota sampling uses a team of interviewers, each with a set
number (quota) of subjects to interview.
• Normally the population is stratified in some way and each interviewer's
quota will reflect this.
• The method places a huge responsibility on the interviewers since
selection of subjects is left to them entirely.
• As such, the interviewers must be well trained and have a responsible
professional attitude.
• Advantages
• Stratification of the population is usual
• No non-response
• Low cost and convenient
• Disadvantages
• Sampling is non-random and therefore it is subject to bias.
BIAS AND VALIDITY
What is bias?
• Bias can be defined as the tendency of a pattern of errors to influence
data in an unrepresentative way.
• More generally, a statistic is called an unbiased estimator of a
parameter if the mean of all its possible values equals the parameter;
otherwise, the statistic is called a biased estimator of the parameter.
• Ideally, we want our statistic to be unbiased and have small standard
error.
• For, then, chances are good that our point estimate (the value of the
statistic) will be close to the parameter.
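The definition of an unbiased estimator can be checked directly on a small example by listing every possible sample. The five-value population and the sample size of 2 are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population
population_mean = mean(population)
population_max = max(population)

# Every possible sample of size 2 drawn without replacement.
samples = list(combinations(population, 2))

# Sample mean: the mean of all its possible values equals the parameter.
mean_of_sample_means = mean(mean(s) for s in samples)

# Sample maximum: the mean of all its possible values falls short of
# the population maximum, so it is a biased estimator of that parameter.
mean_of_sample_maxima = mean(max(s) for s in samples)

print(population_mean, mean_of_sample_means)
print(population_max, mean_of_sample_maxima)
```

Here the sample mean averages to 6, which is the population mean, while the sample maximum averages to 8 against a population maximum of 10.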
Types of Bias
1. Selection bias
• This occurs when the sample is not truly representative of the population.
• For example, selecting residents only from Area 10 in Lilongwe for
the purpose of estimating the average income of all Malawians can lead
to selection bias.
2. Structure and wording bias
• It results from badly worded questions.
3. Interview bias
• This occurs when the interviewer projects a biased opinion or attitude that
may not gain full cooperation of the subject.
4. Recording bias
• This could result from badly worded responses or clerical errors made by
untrained workforce.
Validity
• Validity of data refers to how well the data measure what they are supposed
to measure.
• Validity should not be confused with reliability.
• Reliability of data refers to the consistency with which the results occur.
• Before analyzing a data set, statisticians usually make a variety of checks to
ensure the validity of data.
• In a large study it is not uncommon for errors to be made in recording data
values or in entering the values into a computer.
• Identifying outliers is one tool used to check the validity of the data.
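One common way to flag outliers is the 1.5 × IQR rule, sketched below in Python. The slides do not name a method, so the rule itself, the scores and the helper name are assumptions for illustration.

```python
import statistics

def iqr_outliers(values):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical scores; 600 looks like a data-entry error for 60.
scores = [58, 60, 61, 59, 62, 60, 57, 600]
outliers = iqr_outliers(scores)
print(outliers)
```

A flagged value is not automatically wrong. It is a prompt to go back to the source record and check whether the value was recorded or entered incorrectly.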
THANK YOU