Uploaded by Alisha Mohamed

Introduction to Statistics and Data

Notes by Gabriel Okello
INTRODUCTION TO STATISTICS AND DATA
Description
This lesson introduces the basic concepts of statistics and data. It also surveys the
statistical software packages available for data analysis, with particular attention to
SPSS.
Learning Outcomes
At the end of this lesson students will be able to:
1. Understand different statistical concepts
2. Become aware of a wide range of applications of statistics in business
3. Identify the different types and sources of data
4. Compare the four different levels of data: nominal, ordinal, interval, and ratio
5. Distinguish between different statistical software packages
Lesson Outline
1. Basic statistical concepts
2. Application of statistics
3. Types, sources and classification of data
4. Data measurements
5. Introduction to statistical software
Basic Statistical Concepts [Refer to Gupta & Kapoor (2000) Chapter 1]
Statistics is the art and science of collecting, analyzing, presenting and interpreting data in
order to draw conclusions.
Statistics has two branches: descriptive statistics and inferential statistics.
Descriptive statistics: the part of statistics concerned with describing and summarizing data.
It also covers the presentation of a body of data in the form of tables, charts, graphs, and other
forms of graphic display, together with numerical summary measures.
Inferential statistics: the part of statistics concerned with drawing conclusions about the
properties of a whole population from a sample.
Statistical inference: the process of using data from a sample to make estimates and test
hypotheses about the characteristics of the population.
Inferential statistics has two branches: parametric statistics and nonparametric statistics.
Parametric statistics: requires certain assumptions about the distribution of the data. It
generally requires interval or ratio data.
Nonparametric statistics: makes few or no assumptions about the distribution of the data. It can
be applied to nominal and ordinal data.
A population is the total collection of elements of interest in a particular study.
A sample is a sub-group of the population.
A parameter is a numerical measure describing a characteristic of the population.
A statistic is a numerical measure describing a characteristic of a sample.
A statistic is usually used to estimate the parameter. The parameter is the true value we want to
find out.
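
The relationship between a parameter and a statistic can be illustrated with a short Python sketch. The population below is simulated and entirely hypothetical (monthly sales for 10,000 shops); the sample size of 100 is likewise an arbitrary choice made for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly sales (in units) for 10,000 shops.
population = [random.gauss(500, 80) for _ in range(10_000)]

# Parameter: a measure computed from every element of the population.
population_mean = statistics.fmean(population)

# Sample: a sub-group of 100 elements drawn at random without replacement.
sample = random.sample(population, k=100)

# Statistic: a measure computed from the sample, used to estimate the parameter.
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In practice the population mean is usually unknown, which is why the sample mean is used as its estimate; here both can be computed only because the population is simulated.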
Application of statistics [Refer to Gupta & Kapoor (2000) Chapter 1]
Statistics is used today in practically every profession:
– Economists use it to test the efficiency of alternative production techniques;
– Businesspeople may use it to find the product design or package that maximizes sales;
– Sociologists use it to analyze the results of a drug rehabilitation program;
– Industrial psychologists use it to examine workers' responses to the plant environment;
– Political scientists use it to forecast voting patterns;
– Physicians use it to test the effectiveness of a new drug, and chemists to produce cheaper
fertilizers;
– Financial analysts use it to guide their investment recommendations.
Some essential functions of statistics
– Presents facts and figures in a definite form
– Simplifies the complexity of data
– Facilitates comparison between different sets of observations
– Supports the formulation and testing of hypotheses
– Helps in formulating plans and policies in different fields
– Allows valid inferences to be derived
– Supports forecasting of events and trends
Data
Data are facts and figures collected, analyzed and summarized for presentation and interpretation.
Data can also be defined as the values of a variable.
– Elements are the entities on which data are collected.
– A variable is a characteristic of interest for the element. A variable can also be defined as
a characteristic that varies from one person or thing to another.
Data can be classified
– By scale of measurement
– As either qualitative or quantitative
– As either cross-sectional or time-series
– As either primary or secondary
Scales of measurement/ Levels of data measurement
There are four measurement scales: nominal, ordinal, interval and ratio (NOIR).
Ratio Scale
Ratio-level data measurement is the highest level of data measurement. Ratio data have the
same properties as interval data, but ratio data also have an absolute zero, so the ratio of two
numbers is meaningful. Examples of ratio data are height, weight, time, volume, and Kelvin
temperature. With ratio data, a researcher can state that 180 pounds of weight is twice as much as
90 pounds or, in other words, form the ratio 180:90. Much of the data gathered by machines in
industry is ratio data.
Interval scale
In an interval scale, the numerals assigned to each measure are ranked in order and the intervals
between numerals are equal. Interval-level data measurement is the next to the highest level of
data measurement: the distances between consecutive numbers have meaning and the data are always
numerical. The distances represented by the differences between consecutive numbers are equal;
that is, interval data have equal intervals. An example of interval measurement is Fahrenheit
temperature. With Fahrenheit temperature numbers, the temperatures can be ranked, and the
differences between consecutive readings, such as 20°, 21°, and 22°, are the same. The zero point
of an interval scale is arbitrary, however, so ratios of interval values are not meaningful.
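
The difference between the interval and ratio scales can be made concrete with a small Python sketch. The two temperature readings (40 and 80 degrees Fahrenheit) and the helper name `fahrenheit_to_kelvin` are assumptions chosen for illustration; the conversion formula itself is the standard one.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert an interval-scale reading (Fahrenheit) to a ratio-scale one (Kelvin)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15

low_f, high_f = 40.0, 80.0  # hypothetical readings

# On the interval scale the difference is meaningful...
difference_f = high_f - low_f

# ...but the ratio is not: 80 F is not "twice as hot" as 40 F,
# because 0 F is an arbitrary zero point.
naive_ratio = high_f / low_f

# Kelvin has an absolute zero, so this ratio is meaningful.
low_k = fahrenheit_to_kelvin(low_f)
high_k = fahrenheit_to_kelvin(high_f)
kelvin_ratio = high_k / low_k

print(f"difference in Fahrenheit: {difference_f:.0f} degrees")
print(f"naive Fahrenheit ratio:   {naive_ratio:.2f}")
print(f"Kelvin ratio:             {kelvin_ratio:.2f}")
```

The naive ratio suggests a doubling, while the Kelvin ratio shows the higher reading is only about 8% greater on a scale with a true zero.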
Ordinal Scale
Ordinal-level data measurement is higher than the nominal level. In addition to the nominal-level
capabilities, ordinal-level measurement can be used to rank or order objects. Examples of variables
that can be measured on the ordinal scale include social class, military rank, and the rank of
teaching staff.
Nominal Scale
A nominal scale is considered to be the lowest level of measurement of a variable. This type of
scale merely groups subjects, units or cases from the sample into categories. Subjects or cases in
each category share some common set of characteristics. Numbers representing nominal-level data
(the word "level" is often omitted) can be used only to classify or categorize. Variables which can
only be measured on the nominal scale include sex, race, marital status, employment status,
language, roofing materials, religion, shape of a building, and colour.
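
The contrast between the nominal and ordinal scales can be sketched in Python as follows. The marital-status observations and the four military ranks are hypothetical, and the integer codes attached to the ranks are assumptions that carry order only; the gaps between them have no meaning.

```python
from collections import Counter
from enum import IntEnum

# Nominal data: categories can only be classified and counted.
marital_status = ["single", "married", "married", "divorced", "single", "married"]
status_counts = Counter(marital_status)
mode_status = status_counts.most_common(1)[0][0]  # most frequent category

# Ordinal data: categories can also be ranked.
class MilitaryRank(IntEnum):
    PRIVATE = 1
    SERGEANT = 2
    CAPTAIN = 3
    COLONEL = 4

ranks = [MilitaryRank.CAPTAIN, MilitaryRank.PRIVATE,
         MilitaryRank.COLONEL, MilitaryRank.SERGEANT]
ordered_ranks = sorted(ranks)  # sorting is meaningful for ordinal data

print("most common marital status:", mode_status)
print("ranks from lowest to highest:", [rank.name for rank in ordered_ranks])
```

Sorting the marital-status list would only give alphabetical order, which says nothing about the categories themselves; that is the practical meaning of "nominal".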
Qualitative and Quantitative data
Data can be either qualitative or quantitative.
Quantitative data are data that are numerical in nature.
Qualitative data are data that are non-numerical in nature.
Cross-sectional and time series data
For analysis, data can also be classified as either cross-sectional or time-series.
Cross-sectional data are data collected at the same, or approximately the same, point in time.
Time-series data are data collected sequentially over time.
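
The two layouts can be sketched in Python as follows. The firm names, years, and revenue figures are all hypothetical values invented for illustration.

```python
# Cross-sectional data: several elements observed at one point in time
# (hypothetical 2023 revenue, in millions, for three firms).
cross_section = {"Firm A": 120, "Firm B": 95, "Firm C": 143}

# Time-series data: one element observed sequentially over time
# (hypothetical yearly revenue, in millions, for Firm A).
time_series = [(2019, 101), (2020, 88), (2021, 97), (2022, 110), (2023, 120)]

# Order matters in a time series, so year-on-year change is a natural summary.
changes = [
    (later_year, later_value - earlier_value)
    for (_, earlier_value), (later_year, later_value)
    in zip(time_series, time_series[1:])
]

print("largest firm in 2023:", max(cross_section, key=cross_section.get))
print("year-on-year changes:", changes)
```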
Data sources
Data can be obtained from existing sources (secondary) or for the first time (primary) using various
methods.
Data collection methods
– Questionnaires
– Experiments
– Interviews
– Observation checklists
Organizing Data
Some situations generate an overwhelming amount of data, which must be organized before it can be
presented.
Data can be presented using
– Tables (frequency distributions)
– Graphs and charts
– Numerical measures
The type of table and graph depends on the type of data: qualitative (non-numerical) or
quantitative (numerical).
Summarizing Qualitative data
– Tabular methods: frequency, relative frequency and percentage distributions
– Graphical methods: bar charts and pie charts
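
The tabular methods for qualitative data can be sketched in Python as follows. The ten survey responses are hypothetical; relative frequency is the category count divided by the number of observations, and the percentage is the relative frequency multiplied by 100.

```python
from collections import Counter

# Hypothetical qualitative data: ten customer satisfaction ratings.
responses = ["good", "poor", "good", "excellent", "good",
             "poor", "excellent", "good", "good", "poor"]

n = len(responses)
frequency = Counter(responses)  # frequency distribution

# Relative frequency = count / n; percentage = relative frequency * 100.
relative_frequency = {category: count / n for category, count in frequency.items()}

print(f"{'Category':<10} {'Freq':>4} {'Rel.':>6} {'%':>5}")
for category, count in frequency.most_common():
    rel = relative_frequency[category]
    print(f"{category:<10} {count:>4} {rel:>6.2f} {rel * 100:>5.0f}")
```

The relative frequencies always sum to 1 (and the percentages to 100), which is a quick check that no observation was missed.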
Summarizing Quantitative data
– Tabular methods: frequency, relative frequency and percentage distributions; cumulative
frequency, relative cumulative frequency and percentage cumulative frequency
distributions
– Graphical methods: line graphs, dot plots, stem-and-leaf plots, histograms, boxplots,
scatterplots and ogives
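
The cumulative distributions listed above can be sketched in Python as follows. The twelve test scores and the three class intervals are hypothetical; each class is assumed to include its lower limit and exclude its upper limit.

```python
from itertools import accumulate

# Hypothetical quantitative data: twelve test scores.
scores = [52, 67, 71, 48, 85, 90, 63, 77, 58, 69, 74, 81]

# Assumed class intervals: lower limit included, upper limit excluded.
classes = [(40, 60), (60, 80), (80, 100)]

n = len(scores)
frequencies = [sum(lower <= s < upper for s in scores) for lower, upper in classes]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Relative cumulative frequency: cumulative frequency / n.
relative_cumulative = [c / n for c in cumulative]

print(f"{'Class':<10} {'Freq':>4} {'Cum.':>5} {'Rel. cum.':>10}")
for (lower, upper), f, c, rc in zip(classes, frequencies, cumulative, relative_cumulative):
    print(f"{lower}-{upper:<7} {f:>4} {c:>5} {rc:>10.2f}")
```

Plotting the cumulative frequencies against the upper class limits gives the ogive mentioned among the graphical methods.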
Understanding symbols used in the formulas
[Refer to Gupta & Kapoor (2000) Chapter 1 for formulas used in the Chapters]
Introduction to Statistical Software
Statistical software is a computer program designed to conduct statistical analyses.
Several statistical software packages are available. Common ones include:
– SPSS
– Stata
– SAS
– EViews
– MATLAB
– Minitab
– R
– Excel
– NVivo
The choice of statistical software depends on:
– The type of data
– The type of analyses to be done
– Cost
– Institutional preferences
– Learning curve
Further Reading
– Read Gupta & Kapoor (2000) Chapter 1
– Read Wonnacott & Wonnacott (1990) Chapter 1
– Watch video clips on Introduction to Statistics, Data and Statistical Software:
– Basic Concepts and Terminology:
https://www.youtube.com/watch?v=zlfwdsEDC4Q
– Choosing appropriate statistical software:
https://www.youtube.com/watch?v=musm3zmPThI
Week 1 Activity
1. With examples, distinguish between a variable and data.
2. Describe the two major branches of statistics with examples.
3. Using examples, describe the following levels of data measurement:
a. Ordinal
b. Ratio
c. Interval
d. Nominal
4. What are the limitations of statistics?