Uploaded by Niah Villanueva

MGSC301

Lesson 1
Objective 1: List examples in business and economics in which various
statistical methods are applied.
Statistics - The science concerned with developing and studying methods
for collecting, analyzing, interpreting, and presenting empirical data and
numerical facts.
• Statistics refers to numerical facts such as averages, medians,
percentages, and maximums that help us understand a variety of
business and economic situations.
• Statistics is the art and science of collecting, analyzing, presenting,
and interpreting data.
• The data collected, analyzed, and presented are often shown to decision
makers or policymakers.
Applications in Business and Economics
In today's global business and economic environment, anyone can access vast
amounts of statistical information.
• Accounting - Accounting firms use statistical sampling procedures
when conducting audits for their clients. Auditors review a sample of
the client's accounts and use it to draw a conclusion about whether the
accounts receivable amount shown on the client's balance sheet is
acceptable.
• Finance - Financial analysts use a variety of statistical information to
guide their investment recommendations. Analysts review financial
data such as price/earnings ratios and dividend yields. Data collected
on the stock market then give the analyst an idea of whether a stock
is a good investment.
• Marketing - Electronic scanners at retail checkout counters collect data
for a variety of marketing research applications. Manufacturers
spend thousands of dollars per product category to obtain this
scanner data, which gives them a better understanding of consumer
activity and of what brings consumers into the market.
• Production - A variety of statistical quality control charts are used
to monitor the output of a production process.
• Economics - Economists frequently provide forecasts about the future
of the economy or some aspect of it. They use a variety of statistical
information to create such forecasts.
• Information Systems - Information systems administrators are
responsible for the day-to-day operation of an organization's
computer networks. Statistical information helps administrators
assess the performance of computer networks, including local area
networks (LANs), wide area networks (WANs), network segments,
intranets, and other data communication systems.
Applications of statistics such as those described in this section are an
integral part of this material. The Statistics in Practice applications show
the importance of statistics in a wide variety of business and economic
situations.
Objective 2: Identify the elements, variables, and observations of the
data set for a given set of data.
Data - The facts and figures collected, analyzed, and summarized for
presentation and interpretation.
Data set - All the data collected in a particular study.
Elements - The entities on which data are collected.
Variable - A characteristic of interest for the elements.
Observation - The set of measurements obtained for a particular element.
Measurements collected on each variable for every element in a study
provide the data.
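As a minimal sketch of these terms, the Python snippet below stores a small, made-up data set as one dictionary per element. The company names and values are hypothetical and not from the course. Each key is a variable, and each row of measurements is one observation.

```python
# Hypothetical data set: each dictionary holds the measurements for one element.
data_set = [
    {"company": "Alpha Co", "industry": "Retail", "employees": 120, "revenue_musd": 14.2},
    {"company": "Beta Ltd", "industry": "Finance", "employees": 45, "revenue_musd": 9.8},
    {"company": "Gamma Inc", "industry": "Retail", "employees": 300, "revenue_musd": 51.0},
]

# Elements: the entities on which data are collected (here, the companies).
elements = [row["company"] for row in data_set]

# Variables: the characteristics of interest recorded for each element.
variables = [name for name in data_set[0] if name != "company"]

# Observation: the set of measurements obtained for one element.
first_observation = {name: data_set[0][name] for name in variables}

print("Elements:", elements)
print("Variables:", variables)
print("Observation for", elements[0], "->", first_observation)
print("Total data values:", len(elements) * len(variables))
```

With 3 elements and 3 variables, the data set contains 9 data values.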
Objective 3: Identify whether a variable is quantitative or categorical.
Scales of Measurement:
• Nominal
• Ordinal
• Interval
• Ratio
Nominal Scale - The data for a variable consist of labels or names used to
identify an attribute of the element.
Ordinal Scale - The data exhibit the properties of nominal data and the order
or rank of the data is meaningful.
• For example, credit rating letters provide labels similar to nominal data,
but in addition the data can be ranked or ordered based on the credit rating.
Interval Scale - The data have all the properties of ordinal data and the
interval between the values is expressed in terms of a fixed unit of
measure.
• Interval data are always numerical.
Ratio Scale - The data have all the properties of interval data and the ratio of
two values is meaningful. Variables such as distance, height, weight, and time use
the ratio scale of measurement.
• This scale requires that a zero value be included to indicate that nothing
exists for the variable at the zero point.
Categorical and Quantitative Data
Categorical Data - Data that can be grouped by specific categories.
• This type of data uses either the nominal or ordinal scale of
measurement.
Quantitative Data - Data that use numeric values to indicate how much or
how many.
• This type of data is obtained using either the interval or ratio scale of
measurement.
Categorical Variable - A variable with categorical data.
Quantitative Variable - A variable with quantitative data.
If a variable is categorical, the statistical analysis is limited. The data can be
summarized by counting the number of observations in each category or by
computing the proportion of the observations in each category. Even when
the categories are recorded with numeric codes, arithmetic operations such as
addition, subtraction, multiplication, or division do not give meaningful results.
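A short sketch of this limitation, using hypothetical numeric codes (1 = single, 2 = married, 3 = other). The variable and its codes are assumptions made for illustration. Counts and proportions are meaningful summaries, while an average of the codes is not.

```python
from collections import Counter

# Hypothetical coded responses for a categorical variable (marital status).
labels = {1: "single", 2: "married", 3: "other"}
codes = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2]

# Valid summaries: count and proportion of observations in each category.
counts = Counter(labels[c] for c in codes)
proportions = {category: count / len(codes) for category, count in counts.items()}

print(counts)
print(proportions)

# Arithmetic on the codes runs, but the result has no interpretation:
# there is no marital status that corresponds to the value it produces.
meaningless_mean = sum(codes) / len(codes)
print(meaningless_mean)
```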
Cross-Sectional and Time Series Data
Cross-Sectional Data - Data collected at the same or approximately the same
point in time.
Time Series Data - Data collected over several time periods.
Graphs of time series data are frequently found in business and economic
publications. These graphs help analysts understand what happened in the
past, identify trends over time, and project future values for the time series.
Data Sources
In some cases, the data needed for a particular application already exist.
Companies often maintain a variety of databases about their employees,
customers, and business operations.
• Companies collect and maintain detailed data about their customers
and employees.
• Organizations that specialize in collecting and maintaining data
make available substantial amounts of business and economic data.
• Companies can access data from these organizations by
purchasing or leasing access to the information.
Objective 4: Describe descriptive statistics and statistical inference.
Descriptive Statistics - Summaries of data, which may be tabular,
graphical, or numerical.
• Statistical information in the media, company reports, and other
publications consists of data that are summarized and presented in a
form that is easy for the reader to understand.
• Numerical descriptive statistics are used to summarize data.
• The most common numerical measure is the average, or mean.
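As a small illustration, the mean is the sum of the data values divided by the number of values. The monthly sales figures below are hypothetical.

```python
from statistics import mean

# Hypothetical monthly sales (in thousands of dollars).
monthly_sales = [42, 38, 51, 47, 40, 46]

# The mean is the sum of the values divided by how many values there are.
average_by_hand = sum(monthly_sales) / len(monthly_sales)

# The standard library gives the same result.
average_by_library = mean(monthly_sales)

print(average_by_hand)
print(average_by_library)
```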
Statistical Inference - The process of drawing conclusions about an
underlying population based on a sample or a subset of the data.
Population - The set of all elements of interest in a particular
study.
Sample - A subset of the population.
Census - The process of conducting a survey to collect data for the entire
population.
Sample Survey - The process of conducting a survey to collect data for a
sample.
• As one of its major contributions, statistics uses data from a sample to
make estimates and test hypotheses about the characteristics of a
population through a process referred to as statistical inference.
• Whenever statisticians use a sample to estimate a population characteristic
of interest, they usually provide a statement of the quality, or precision,
associated with the estimate.
Objective 5: Identify the population for a given study and contrast
population and sample.
A population is the entire group that you want to draw conclusions about. A
sample is the specific group that you will collect data from. The size of the
sample is always less than the total size of the population. In research, a
population does not always refer to people.
Objective 6: Describe the role of random samples in statistical inference.
The use of randomization in sampling allows for the analysis of results
using the methods of statistical inference.
• Random sampling helps ensure that results obtained from your sample
should approximate what would have been obtained if the entire
population had been measured.
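A minimal simulation of this idea, assuming a made-up population of 10,000 order values and a simple random sample of 200. The sizes, the value range, and the seed are arbitrary choices for illustration. The sample mean should land close to the population mean, though the two will rarely match exactly.

```python
import random
from statistics import mean

rng = random.Random(301)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 order values between $5 and $500.
population = [rng.uniform(5, 500) for _ in range(10_000)]

# Simple random sample: every element has the same chance of being selected.
sample = rng.sample(population, 200)

population_mean = mean(population)
sample_mean = mean(sample)

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Difference:      {abs(population_mean - sample_mean):.2f}")
```

Rerunning with a different seed gives a different sample and a slightly different sample mean, which is why estimates from samples are reported with a statement of their precision.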
Objective 7: Describe the importance of big data and data mining.
Big Data - The large, diverse sets of information that grow at ever-increasing rates.
• It encompasses the volume of information, the velocity or speed at which
it is created and collected, and the variety or scope of the data points
being covered (known as the "three V's of big data").
• Big data analytics helps organizations harness their data and use it to
identify new opportunities.
• This leads to smarter business moves, more efficient operations,
higher profits, and happier customers.
Data Mining - The process of finding anomalies, patterns, and correlations
within large data sets to predict outcomes.
• The application of statistics in the form of exploratory data analysis and
predictive models to reveal patterns and trends in very large data sets.
• Statistics forms the core of data mining, which covers the entire
process of data analysis.
• Statistics helps in identifying patterns, which in turn helps distinguish
significant findings from random noise.
• Data mining is key to sentiment analysis, price optimization,
database marketing, credit risk management, training and support,
fraud detection, healthcare and medical diagnoses, risk assessment,
and recommendation systems.
Objective 8: Identify the ethical guidelines for statistical practice.
Ethical behaviour is something we should strive for in all that we do. Ethical
issues arise in statistics because of the important role statistics plays in
the collection, analysis, presentation, and interpretation of data.
• When performing your own statistical work, you are encouraged to be fair,
thorough, objective, and neutral as you collect data, conduct analyses,
make oral presentations, and present written reports containing the
information developed.
• As a consumer of statistics, you should also be aware of the possibility of
unethical statistical behaviour by others.
• When you see statistics in the media, be sure to check the source and
where, how, and when the data were collected.
• American Statistical Association - The nation's leading professional
organization for statistics and statisticians, which developed the report
"Ethical Guidelines for Statistical Practice" to help statistical
practitioners make and communicate ethical decisions and to assist
students in learning how to perform statistical work responsibly.
• The report contains 52 guidelines organized into eight topic areas:
Professional Integrity and Accountability; Integrity of Data and
Methods; Responsibilities to Science/Public/Funder/Client;
Responsibilities to Research Subjects; Responsibilities to Research
Team Colleagues; Responsibilities to Other Statisticians or Statistics
Practitioners; Responsibilities Regarding Allegations of Misconduct;
and Responsibilities of Employers Including Organizations,
Individuals, Attorneys, or Other Clients Employing Statistical
Practitioners.
Lesson 2: Descriptive Statistics: Tabular and Graphical Displays
Objective 1: Determine the frequency distribution and the relative frequency distribution
for a categorical variable.
Categorical Data - Use labels or names to identify categories of like items.
Quantitative Data - Numerical values that indicate how much or how many.
Tabular or Graphical Displays - Used to summarize data for a single variable.
Data Visualization - A term often used to describe the use of graphical displays to
summarize and present information about a data set.
Summarizing Data for a Categorical Variable
Frequency Distribution - A tabular summary of data showing the number
(frequency) of observations in each of several non-overlapping categories or classes.
Frequency Distribution of Soft Drink Purchases
Soft Drink     Frequency
Coca-Cola      19
Diet Coke      8
Dr. Pepper     5
Pepsi          13
Sprite         5
Total          50
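A minimal sketch of how such a table can be tallied in Python. The raw list of 50 purchases is not given in these notes, so the list below is rebuilt from the frequencies in the table purely for illustration.

```python
from collections import Counter

# Raw purchases reconstructed from the table's frequencies (the original
# order of the 50 purchases is unknown, so this ordering is an assumption).
purchases = (
    ["Coca-Cola"] * 19 + ["Diet Coke"] * 8 + ["Dr. Pepper"] * 5
    + ["Pepsi"] * 13 + ["Sprite"] * 5
)

# Frequency distribution: number of observations in each category.
frequency = Counter(purchases)

for drink, count in sorted(frequency.items()):
    print(f"{drink:<12}{count:>3}")
print(f"{'Total':<12}{sum(frequency.values()):>3}")
```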
Relative Frequency and Percent Frequency Distribution
A frequency distribution shows the number (frequency) of observations in each of several
non-overlapping classes.
The relative frequency of a class equals the fraction or proportion of observations belonging
to that class.
For a data set with n observations:
$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n}$$
The percent frequency of a class is the relative frequency multiplied by 100.
Relative Frequency Distribution - A tabular summary of data showing the relative
frequency for each class.
Percent Frequency Distribution - A tabular summary of data showing the percent
frequency for each class.
Relative Frequency and Percent Frequency Distributions of Soft Drink Purchases
Soft Drink     Relative Frequency     Percent Frequency
Coca-Cola      .38                    38
Diet Coke      .16                    16
Dr. Pepper     .10                    10
Pepsi          .26                    26
Sprite         .10                    10
Total          1.00                   100
To get the relative frequency of a class, divide its frequency by the
total number of observations.
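The same calculation can be sketched in Python, using the frequencies from the soft drink example in these notes. The variable names are illustrative choices.

```python
# Frequencies from the soft drink example.
frequency = {"Coca-Cola": 19, "Diet Coke": 8, "Dr. Pepper": 5, "Pepsi": 13, "Sprite": 5}

n = sum(frequency.values())  # number of observations

# Relative frequency = frequency of the class / n
relative = {drink: count / n for drink, count in frequency.items()}

# Percent frequency = relative frequency * 100
percent = {drink: rel * 100 for drink, rel in relative.items()}

for drink in frequency:
    print(f"{drink:<12}{relative[drink]:>6.2f}{percent[drink]:>6.0f}")
```

The relative frequencies sum to 1.00 and the percent frequencies sum to 100, which is a quick check on the arithmetic.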
Objective 2: Draw a bar chart and pie chart, and provide basic interpretations of them.
Bar Chart - A graphical display for depicting categorical data summarized in a
frequency, relative frequency, or percent frequency distribution.
• On one axis of the chart (usually the horizontal axis), we specify the labels that are
used for the classes (categories).
• A frequency, relative frequency, or percent frequency scale can be used for the other
axis of the chart (usually the vertical axis).
Pie Chart - A graphical display for presenting relative frequency and percent frequency
distributions for categorical data.
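As a rough text-only sketch of a bar chart (no plotting library is assumed), each category gets a bar whose length equals its frequency. The frequencies are those of the soft drink example in these notes, and the layout is an illustrative choice. Here the categories run down the page instead of along the horizontal axis.

```python
frequency = {"Coca-Cola": 19, "Diet Coke": 8, "Dr. Pepper": 5, "Pepsi": 13, "Sprite": 5}

def text_bar_chart(freq):
    """Return one line per category with a bar of '#' as long as its frequency."""
    width = max(len(label) for label in freq)
    return [f"{label:<{width}} | {'#' * count} {count}" for label, count in freq.items()]

for line in text_bar_chart(frequency):
    print(line)
```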
Additional Notes
1. Often the number of classes in a frequency distribution is the same as the number of
categories found in the data.
2. Most statisticians recommend that classes with smaller frequencies be grouped into an
aggregate class called "other".
3. The sum of the frequencies in any frequency distribution always equals the number of
observations.
4. The sum of the relative frequencies in any relative frequency distribution always
equals 1.00, and the sum of the percentages in a percent frequency distribution
always equals 100.
Objective 3: Describe the three steps in defining class intervals for a frequency
distribution for a quantitative variable.
A frequency distribution is a tabular summary of data showing the number (frequency)
of observations in each of several non-overlapping categories or classes.
The three steps necessary to define the classes for a frequency distribution with quantitative
data are:
1. Determine the number of non-overlapping classes
2. Determine the width of each class
3. Determine the class limits
Number of Classes - Classes are formed by specifying ranges that will be used to group
the data.
-The goal is to use enough classes to show the variation in the data.
Width of the Classes - The second step in constructing a frequency distribution for
quantitative data is to choose a width for the classes.
-The same width for each class is recommended.
-A larger number of classes means a smaller class width, and vice versa.
-To determine an approximate class width, we begin by identifying the largest and
smallest data values. Then, with the desired number of classes specified, we can use the
following expression to determine the approximate class width.
$$\text{Approximate class width} = \frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$$
Class Limits - Class limits must be chosen so that each data item belongs to one and only
one class.
-The lower class limit identifies the smallest possible data value assigned to the class.
-The upper class limit identifies the largest possible data value assigned to the class.
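The three steps can be sketched in Python as follows. The 20 data values, the choice of five classes, and the starting lower limit of 10 are hypothetical. Rounding the approximate width up to a whole number is one common convention, not the only one.

```python
import math

# Hypothetical data: days needed to complete 20 year-end audits.
data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

# Step 1: choose the number of non-overlapping classes.
num_classes = 5

# Step 2: approximate class width = (largest - smallest) / number of classes,
# rounded up here to a convenient whole number.
approx_width = (max(data) - min(data)) / num_classes
width = math.ceil(approx_width)

# Step 3: class limits, chosen so each integer value falls in exactly one class.
start = 10  # a convenient lower limit at or below the smallest value (assumption)
classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(num_classes)]

# Tally how many values fall within each pair of class limits.
frequency = {limits: sum(limits[0] <= x <= limits[1] for x in data) for limits in classes}

print(f"Approximate width: {approx_width}, width used: {width}")
for (lower, upper), count in frequency.items():
    print(f"{lower}-{upper}: {count}")
```

With these values the approximate width is (33 - 12) / 5 = 4.2, which is rounded up to 5, giving the classes 10-14, 15-19, 20-24, 25-29, and 30-34.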
Objective 4: Define cumulative frequency and cumulative relative frequency distribution,
and provide basic interpretations of them.
We define the relative frequency and percent frequency distributions for quantitative data in
the same manner as for categorical data.
First, recall that the relative frequency is the proportion of the observations belonging to a class.
For a data set with n observations:
$$\text{Relative frequency of a class} = \frac{\text{Frequency of the class}}{n}$$
The percent frequency of a class is the relative frequency multiplied by 100.
Example: If the relative frequency is .40, the percent frequency is 40%.
Objective 5: Draw a histogram for a given frequency distribution or relative frequency
distribution, and provide a basic interpretation of it.
Dot Plot - One of the simplest graphical summaries of data is a dot plot.
-A horizontal axis shows the range for the data. Each data value is represented by a dot
placed above the axis.
-Dot plots show the details of the data and are useful for comparing the distributions of the
data for two or more variables.
Histogram - A common graphical display of quantitative data is a histogram. This
graphical display can be prepared for data previously summarized in either a frequency,
relative frequency, or percent frequency distribution.
-A histogram is constructed by placing the variable of interest on the horizontal axis and the
frequency, relative frequency, or percent frequency on the vertical axis.
-The frequency, relative frequency, or percent frequency of each class is shown by
drawing a rectangle whose base is determined by the class limits on the horizontal axis
and whose height is the corresponding frequency, relative frequency, or percent frequency.
Differing Levels of Skewness
Panel A: A histogram is said to be skewed to the left if its tail extends farther to the left.
Panel B: A histogram is said to be skewed to the right if its tail extends farther to the right.
Panel C: In a symmetric histogram, the left tail mirrors the shape of the right tail.
Panel D: Data from applications in business and economics often lead to histograms that
are skewed to the right. Examples: housing prices, salaries, purchase amounts.
Objective 6: Describe how a stem-and-leaf display is constructed, and list some of the
advantages and disadvantages of the display as compared to a histogram.
Cumulative Distributions
A variation of the frequency distribution that provides another tabular summary of
quantitative data is the cumulative frequency distribution.
The cumulative frequency distribution uses the number of classes, class widths, and
class limits developed for the frequency distribution. Rather than showing the frequency of
each class, the cumulative frequency distribution shows the number of data items with
values less than or equal to the upper class limit of each class.
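A brief sketch of the calculation, assuming a hypothetical frequency distribution with five classes. The cumulative frequency for a class is the running total of the frequencies up to and including that class, and dividing by the number of observations gives the cumulative relative frequency.

```python
from itertools import accumulate

# Hypothetical frequency distribution (class limits -> frequency).
frequency = {"10-14": 4, "15-19": 8, "20-24": 5, "25-29": 2, "30-34": 1}
n = sum(frequency.values())

# Cumulative frequency: running total of the class frequencies, i.e. the number
# of values less than or equal to each class's upper limit.
cumulative = list(accumulate(frequency.values()))

# Cumulative relative frequency: cumulative frequency divided by n.
cumulative_relative = [c / n for c in cumulative]

for label, cum, rel in zip(frequency, cumulative, cumulative_relative):
    upper = label.split("-")[1]
    print(f"<= {upper}: {cum:>2}  {rel:.2f}")
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.00.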
Stem-and-Leaf Display
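A stem-and-leaf display is a graphical display used to show simultaneously the rank
order and shape of a distribution of data.
-To construct one, split each data value into a stem (its leading digit or digits) and a
leaf (its last digit), then list the leaves in rank order next to their stem.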
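As a sketch of how such a display can be built, assuming whole-number data where the last digit is the leaf and the leading digits are the stem. The test scores below are hypothetical.

```python
from collections import defaultdict

# Hypothetical aptitude test scores.
scores = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73, 126, 128, 118, 127, 124,
          82, 104, 132, 134, 83]

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for value in sorted(values):          # sorting puts the leaves in rank order
        stems[value // 10].append(value % 10)
    return dict(stems)

display = stem_and_leaf(scores)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem:>2} | {leaves}")
```

Unlike a histogram, the display keeps every individual data value, and turning it on its side shows roughly the same shape a histogram would. This sketch only prints stems that contain at least one value.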
Objective 7: Explain the values of scatter diagrams
Scatter Diagrams and Trendlines
Scatter Diagram - A graphical display of the relationship between two quantitative
variables.
Trendline - A line that provides an approximation of the relationship.
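One common way to obtain a trendline is a least-squares fit. These notes do not say which method is intended, so treat that choice as an assumption. The data on commercials aired and sales are hypothetical.

```python
# Hypothetical data: weekend commercials aired (x) and sales in $100s (y).
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

def least_squares_trendline(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

slope, intercept = least_squares_trendline(x, y)
print(f"Trendline: y = {intercept:.2f} + {slope:.2f}x")
```

A positive slope, as in this example, indicates that larger x values tend to go with larger y values. A negative slope indicates the opposite, and a slope near zero suggests no apparent linear relationship.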
Side-by-Side and Stacked Bar Charts
A side-by-side bar chart is a graphical display for depicting multiple bar charts on the
same display.
Additional Notes
1. A time series is a sequence of observations on a variable measured at successive points
in time or over successive periods of time.
2. A scatter diagram in which the value of time is shown on the horizontal axis and the
time series values are shown on the vertical axis is referred to in time series analysis as a
time series plot.
3. A stacked bar chart can also be used to display frequencies rather than percent
frequencies.