Faculty Name: Dr. M. Massarrat Ali Khan
Course Name: Introduction to Statistics
Email: mokhan@iba.edu.pk
Week 1
Business Analytics:
• What is business analytics?
• Business analytics (BA) is a set of disciplines and technologies for
solving business problems using data analysis, statistical models and
other quantitative methods. It involves an iterative, methodical
exploration of an organization's data, with an emphasis on statistical
analysis, to drive decision-making.
• Business analytics is the scientific process of transforming data into
insight for making better decisions.
• Business analytics is used for data-driven or fact-based decision
making, which is often seen as more objective than other approaches
to decision making.
• Types of business analytics
• Different types of business analytics include the following:
• descriptive analytics, which tracks key performance indicators (KPIs)
to understand the present state of a business;
• predictive analytics, which analyzes trend data to assess the
likelihood of future outcomes; and
• prescriptive analytics, which uses past performance to generate
recommendations for handling similar situations in the future.
Descriptive Analytics
• Descriptive analytics encompasses the set of techniques that
describes what has happened in the past. Examples are data queries,
reports, descriptive statistics, data visualization including data
dashboards, some data-mining techniques, and basic what-if
spreadsheet models.
>A data query is a request for information with certain characteristics
from a database.
>Data dashboards are collections of tables, charts, maps, and summary
statistics that are updated as new data become available.
>Data mining is the use of analytical techniques for better
understanding patterns and relationships that exist in large data sets.
For example, by analyzing text on social network platforms like Twitter,
data-mining techniques (including cluster analysis and sentiment
analysis) are used by companies to better understand their customers.
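As a minimal sketch of a data query, the following uses Python's built-in `sqlite3` module with a small in-memory table. The table name `customers`, its columns, and its rows are hypothetical and exist only to show what "a request for information with certain characteristics" looks like in practice.

```python
import sqlite3

# Hypothetical in-memory customer table (names and values are illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT, purchases INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Ali", "Karachi", 12), ("Sara", "Lahore", 3), ("Omar", "Karachi", 7)],
)

# The query: only customers in Karachi with more than 5 purchases,
# listed from most to fewest purchases
rows = conn.execute(
    "SELECT name, purchases FROM customers "
    "WHERE city = ? AND purchases > ? ORDER BY purchases DESC",
    ("Karachi", 5),
).fetchall()
print(rows)  # [('Ali', 12), ('Omar', 7)]
```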
Predictive Analytics:
• Predictive analytics consists of techniques that use models
constructed from past data to predict the future or ascertain the
impact of one variable on another. For example, past data on product
sales may be used to construct a mathematical model to predict
future sales.
>Linear regression, time series analysis, some data-mining techniques,
and simulation, often referred to as risk analysis, all fall under the
banner of predictive analytics.
>Simulation involves the use of probability and statistics to construct a
computer model to study the impact of uncertainty on a decision.
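The sketch below illustrates simulation (a Monte Carlo approach) on a hypothetical order-quantity decision. The unit cost (6), selling price (10), and demand uniformly distributed between 50 and 150 units are assumptions chosen for illustration, as is the function name `simulate_profit`.

```python
import random

def simulate_profit(order_qty, trials=10_000, seed=42):
    """Estimate the average profit of one order quantity by simulation.

    Hypothetical assumptions: unit cost 6, selling price 10,
    demand uniform on 50..150 units, unsold units are worthless.
    """
    rng = random.Random(seed)                  # fixed seed for reproducible runs
    total = 0.0
    for _ in range(trials):
        demand = rng.randint(50, 150)          # one random demand scenario
        sold = min(order_qty, demand)          # cannot sell more than was ordered
        total += 10 * sold - 6 * order_qty     # profit in this scenario
    return total / trials

# Compare several candidate decisions under the same uncertainty
for q in (60, 90, 120, 150):
    print(q, round(simulate_profit(q), 1))
```

Under these assumptions, ordering about 90 units gives the highest average profit of the quantities tried; a different assumed demand distribution would change that answer, which is the kind of impact of uncertainty the simulation is meant to reveal.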
Prescriptive Analytics:
• Prescriptive analytics differs from descriptive and predictive analytics
in that prescriptive analytics indicates a course of action to take; that
is, the output of a prescriptive model is a decision.
>Predictive models provide a forecast or prediction, but do not provide
a decision. However, a forecast or prediction, when combined with a
rule, becomes a prescriptive model.
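A minimal sketch of how a forecast plus a rule yields a decision. A 3-period moving average stands in as the (assumed) predictive model, and a hypothetical reorder rule turns its forecast into an action. The sales figures, the window length, and both function names are illustrative assumptions.

```python
def forecast_next(sales, window=3):
    """Predictive step: forecast next period as the mean of the last `window` periods."""
    recent = sales[-window:]
    return sum(recent) / len(recent)

def reorder_decision(sales, stock_on_hand, window=3):
    """Prescriptive step: apply a rule to the forecast and return a decision.

    Hypothetical rule: if the forecast exceeds current stock, order the shortfall.
    """
    forecast = forecast_next(sales, window)
    if forecast > stock_on_hand:
        return f"Order {round(forecast - stock_on_hand)} units"
    return "Do not order"

monthly_sales = [120, 135, 128, 140, 150, 145]   # hypothetical data
print(forecast_next(monthly_sales))               # 145.0 (a prediction only)
print(reorder_decision(monthly_sales, 100))       # Order 45 units (a decision)
print(reorder_decision(monthly_sales, 200))       # Do not order
```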
What is Statistics?
• Meaning of the word Statistics:
There are three meanings of the word statistics:
1. Statistics (plural): Facts and figures themselves are called statistics,
e.g. import statistics, production statistics, results statistics.
2. Statistics (plural): A subject of social science that deals with the collection,
organization or presentation, and analysis of data, and with interpreting the results
about the population from sample data for decision making.
3. Statistic (singular): A numerical characteristic of a sample.
Statistics and Business Analytics:
• Statistics is an important tool of Business Analytics.
• The field of statistics is concerned with collecting, analyzing, interpreting,
and presenting data. The field of analytics is concerned with applying
statistical methods to practical business problems.
• Business analysts often use descriptive statistics to summarize data related
to the finances of companies.
• Statistics is the foundation of business analytics, since business
analytics is a combination of computer science and statistics.
Many statistical methods, such as sampling, hypothesis testing, correlation
and regression, form the foundation of business analytics methods.
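As a small illustration of descriptive statistics applied to company finances, the sketch below summarizes hypothetical quarterly revenues with Python's standard `statistics` module. The revenue figures are invented for illustration.

```python
import statistics

# Hypothetical quarterly revenues of one company (in millions)
revenues = [12.4, 15.1, 13.8, 16.2, 14.5, 15.9, 17.0, 14.1]

summary = {
    "count": len(revenues),
    "mean": statistics.mean(revenues),
    "median": statistics.median(revenues),
    "std_dev": statistics.stdev(revenues),  # sample standard deviation
    "min": min(revenues),
    "max": max(revenues),
}
for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```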
Big Data:
• Big data is any set of data that is too large or too complex to be
handled by standard data-processing techniques and typical desktop
software.
Big data refers to data that is so large, fast or complex that it's difficult or
impossible to process using traditional methods. The act of accessing and
storing large amounts of information for analytics has been around for a long
time.
Big data analytics describes the process of uncovering trends, patterns, and
correlations in large amounts of raw data to help make data-informed
decisions. These processes use familiar statistical analysis techniques—like
clustering and regression—and apply them to more extensive datasets with
the help of newer tools.
The 4 V’s of Big Data:
• Volume
>Because data are collected electronically, we are able to collect more of it. To be
useful, these data must be stored, and this storage has led to vast quantities of
data. Many companies now store in excess of 100 terabytes of data (a terabyte is
1,024 gigabytes).
• Velocity
>Real-time capture and analysis of data present unique challenges both in how
data are stored and in the speed with which those data can be analyzed for decision
making. For example, the New York Stock Exchange collects 1 terabyte of data in a
single trading session, and current data together with real-time rules for trades and
predictive modeling are important for managing stock portfolios.
• Variety
>In addition to the sheer volume and speed with which companies now
collect data, more complicated types of data are now available and are
proving to be of great value to businesses. Text data are collected by
monitoring what is being said about a company’s products or services
on social media platforms such as Twitter. Audio data are collected
from service calls (on a service call, you will often hear “this call may be
monitored for quality control”). Video data collected by in-store video
cameras are used to analyze shopping behavior. Analyzing information
generated by these nontraditional sources is more complicated in part
because of the processing required to transform the data into a
numerical form that can be analyzed.
• Veracity
>Veracity has to do with how much uncertainty is in the data. For
example, the data could have many missing values, which makes
reliable analysis a challenge. Inconsistencies in units of measure and
the lack of reliability of responses in terms of bias also increase the
complexity of the data.
Key Terms In Statistics
• Population: The totality of the data with which we are concerned is called
population.
• Sample: It is the subset or portion of the population.
• Parameter: A numerical characteristic of a population is called a
parameter.
e.g. the population arithmetic mean (A.M.), denoted by µ
• Statistic: A numerical characteristic of a sample is called a statistic.
e.g. the sample arithmetic mean (A.M.), denoted by x̄
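To make the parameter/statistic distinction concrete, here is a sketch with a hypothetical population of 1,000 monthly household bills (randomly generated, not real data). µ is computed once from the whole population; x̄ is computed from a random sample of 50 and would change if a different sample were drawn.

```python
import random
import statistics

# Hypothetical population: monthly electricity bills of 1,000 households
rng = random.Random(1)
population = [rng.randint(2_000, 12_000) for _ in range(1_000)]

mu = statistics.mean(population)       # parameter: population mean (µ)

sample = rng.sample(population, 50)    # a random sample of 50 households
x_bar = statistics.mean(sample)        # statistic: sample mean (x̄)

print(f"Population mean (parameter) µ = {mu:.1f}")
print(f"Sample mean (statistic)     x̄ = {x_bar:.1f}")
```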
Variable: A characteristic or measurement that can be determined for each member of a
population. A variable is denoted by X or Y.
A variable may be numerical or categorical:
1. Numerical: A numerical variable takes numerical values measured in equal units, such as weight
in pounds or time in hours.
Numerical variables are of two types:
Discrete: A variable whose value is obtained by counting, e.g. the number of calls in a day.
Continuous: A continuous variable can take all values in a given interval. The value of a continuous
variable is obtained by measuring, e.g. a temperature between 20° and 30°.
2. Categorical: A variable that takes values on the basis of some qualitative property, e.g. a
person's affiliation with a political party:
X: PTI, PPP, NML.
Types of Data
Data are the actual values of the variable. They may be numbers or
words.
So the actual data can be divided into:
• Quantitative data: values of discrete and continuous variables
• Qualitative data: values of categorical (subjective) variables
Organizing Qualitative Data
• Tabular Presentation
• Frequency Distributions: A frequency distribution of qualitative data is a listing of the
distinct values and their frequencies.
• To Construct a Frequency Distribution of Qualitative Data
Step 1 List the distinct values of the observations in the data set in the first column of a
table.
Step 2 For each observation, place a tally mark in the second column of the table in the
row of the appropriate distinct value.
Step 3 Count the tallies for each distinct value and record the totals in the third column of
the table.
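The three steps above can be sketched in Python with `collections.Counter`, which does the tallying. The affiliation data are hypothetical; the party labels are borrowed from the categorical-variable example.

```python
from collections import Counter

def frequency_distribution(observations):
    """Frequency distribution of qualitative data: distinct value -> frequency."""
    tallies = Counter()
    for value in observations:   # Step 2: one tally mark per observation
        tallies[value] += 1
    # Steps 1 and 3: distinct values with their total counts,
    # listed in order of first appearance
    return dict(tallies)

# Hypothetical political-party affiliations of 10 students
affiliations = ["PTI", "PPP", "NML", "PTI", "PTI", "PPP", "NML", "PTI", "PPP", "PTI"]
print(frequency_distribution(affiliations))  # {'PTI': 5, 'PPP': 3, 'NML': 2}
```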
Example
Political party affiliations of the students in introductory statistics
Table for constructing a frequency distribution for the political party affiliation
Relative-frequency distribution
• A relative-frequency distribution of qualitative data is a listing of the
distinct values and their relative frequencies.
• To Construct a Relative-Frequency Distribution of Qualitative Data
Step 1 Obtain a frequency distribution of the data.
Step 2 Divide each frequency by the total number of observations.
Example
• Figure out your first relative frequency by dividing the count by the
total.
For the category of dogs, we have 16 out of 56, so 16/56 ≈ 0.29.
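A sketch of Step 2 (divide each frequency by the total). Only the dog count (16 of 56) comes from the example; the split of the remaining 40 observations into "Cat" and "Bird" is hypothetical.

```python
def relative_frequency_distribution(freq):
    """Divide each frequency by the total number of observations."""
    total = sum(freq.values())
    return {value: count / total for value, count in freq.items()}

pets = {"Dog": 16, "Cat": 22, "Bird": 18}   # total 56; Cat/Bird counts are hypothetical
rel = relative_frequency_distribution(pets)
print(round(rel["Dog"], 2))                  # 0.29
print(round(sum(rel.values()), 10))          # 1.0 (relative frequencies sum to 1)
```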
Graphical Presentation of Qualitative Data
Pie Charts
Another method for organizing and summarizing data is to draw a
picture of some kind. The old saying “a picture is worth a thousand words” has
particular relevance in statistics: a graph or chart of a data set often provides the
simplest and most efficient display. Two common methods for graphically
displaying qualitative data are pie charts and bar charts.
-A pie chart is a disk divided into wedge-shaped pieces proportional to the relative
frequencies of the qualitative data.
• To Construct a Pie Chart
Step 1 Obtain a relative-frequency distribution of the data.
Step 2 Divide a disk into wedge-shaped pieces proportional to the relative
frequencies.
Step 3 Label the slices with the distinct values and their relative frequencies.
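The pie-chart steps reduce to one formula: each wedge's angle is its relative frequency times 360°. A minimal sketch, with hypothetical counts and a hypothetical function name:

```python
def pie_wedges(freq):
    """Return (label, angle in degrees) for each slice of a pie chart."""
    total = sum(freq.values())
    wedges = []
    for value, count in freq.items():
        rel = count / total                    # Step 1: relative frequency
        angle = 360 * count / total            # Step 2: wedge proportional to rel
        wedges.append((f"{value} ({rel:.0%})", angle))  # Step 3: label the slice
    return wedges

for label, angle in pie_wedges({"PTI": 5, "PPP": 3, "NML": 2}):  # hypothetical counts
    print(f"{label}: {angle:.1f} degrees")
# PTI (50%): 180.0 degrees
# PPP (30%): 108.0 degrees
# NML (20%): 72.0 degrees
```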
Example of Pie Chart:
Political Party Affiliations of the students
Bar Chart
• Bar Charts: Another graphical display for qualitative data is the bar
chart.
- A bar chart displays the distinct values of the qualitative data on a
horizontal axis and the relative frequencies (or frequencies or percent)
of those values on a vertical axis. The relative frequency of each distinct
value is represented by a vertical bar whose height is equal to the
relative frequency of that value. The bars should be positioned so that
they do not touch each other.
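A plain-text sketch of the layout described above: distinct values along the horizontal axis, bar heights proportional to relative frequency, and a blank column between bars so they do not touch. The scale (one row of `#` per 0.05 of relative frequency), the function name, and the counts are assumptions for illustration.

```python
def text_bar_chart(freq):
    """Return a vertical bar chart of relative frequencies as plain text."""
    total = sum(freq.values())
    labels = list(freq)
    width = max(len(label) for label in labels)
    # Assumed scale: one row per 0.05 of relative frequency
    heights = [round(20 * freq[label] / total) for label in labels]
    lines = []
    for level in range(max(heights), 0, -1):      # draw the top row first
        cells = ["#" * width if h >= level else " " * width for h in heights]
        lines.append(" ".join(cells).rstrip())     # single space keeps bars apart
    # Distinct values along the horizontal axis
    lines.append(" ".join(label.ljust(width) for label in labels).rstrip())
    return "\n".join(lines)

print(text_bar_chart({"PTI": 5, "PPP": 3, "NML": 2}))  # hypothetical counts
```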
Example of Bar Chart:
Organizing Quantitative Data
Quantitative Data
Tabular Presentation
• Frequency Distribution Table
-To organize quantitative data, we first group the observations into classes (also
known as categories or bins) and then treat the classes as the distinct values of
qualitative data. Consequently, once we group the quantitative data into classes, we
can construct frequency and relative-frequency distributions of the data in exactly
the same way as we did for qualitative data.
Three important guidelines for grouping quantitative data into classes are:
1. The number of classes should be small enough to provide an effective summary but
large enough to display the relevant characteristics of the data. A rule of thumb is that
the number of classes should be between 5 and 20.
Guideline for the number of classes:
Number of Classes = 1 + 3.3 log N (log to base 10, where N is the number of observations)
If N = 100:
No. of classes = 1 + 3.3 log 100 = 1 + 3.3 × 2 = 7.6 ≈ 8
2. Each observation must belong to only one class.
3. Whenever feasible, all classes should have the same width. Roughly speaking, this means that,
whenever possible, all classes should cover the same number of possible values.
• Class Width = Range / No. of Classes, where Range = largest value − smallest value
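The two formulas above, written as a short Python sketch. Rounding the number of classes to the nearest whole number follows the N = 100 example; the data list is hypothetical (only its smallest and largest values affect the width), and in practice the width is often rounded up to a convenient value.

```python
import math

def number_of_classes(n):
    """1 + 3.3 * log10(N), rounded to a whole number of classes."""
    return round(1 + 3.3 * math.log10(n))

def class_width(data, k):
    """Range / number of classes, where range = max - min."""
    return (max(data) - min(data)) / k

print(1 + 3.3 * math.log10(100))        # approximately 7.6
k = number_of_classes(100)              # 8
ages = [18, 25, 33, 41, 57, 62, 74, 82] # hypothetical data
print(k, class_width(ages, k))          # 8 8.0
```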
Scales of Measurement
Data collection requires one of the following scales of measurement: nominal, ordinal,
interval, or ratio.
The scale of measurement determines the amount of information contained in the data
and indicates the most appropriate data summarization and statistical analyses.
When the data for a variable consist of labels or names used to identify an attribute
of the element, the scale of measurement is considered a nominal scale. For example,
referring to the data in Table 1.1, the scale of measurement for the WTO Status variable is
nominal because the data “member” and “observer” are labels used to identify the status
category for the nation. In cases where the scale of measurement is nominal, a numerical
code as well as a nonnumerical label may be used.
For example, to facilitate data collection and to prepare the data for
entry into a computer database, we might use a numerical code for the
WTO Status variable by letting 1 denote a member nation in the World
Trade Organization and 2 denote an observer nation. The scale of
measurement is nominal even though the data appear as numerical
values.
• The scale of measurement for a variable is considered an ordinal
scale if the data exhibit the properties of nominal data and in
addition, the order or rank of the data is meaningful.
• For example, referring to the data in Table 1.1, the scale of
measurement for the Fitch Rating is ordinal because the rating labels,
which range from AAA to F, can be rank ordered from best credit
rating (AAA) to poorest credit rating (F). The rating letters provide the
labels similar to nominal data, but in addition, the data can also be
ranked or ordered based on the credit rating, which makes the
measurement scale ordinal.
• Ordinal data can also be recorded by a numerical code, for example,
your class rank in school.
• The scale of measurement for a variable is an interval scale if the data
have all the properties of ordinal data and the interval between
values is expressed in terms of a fixed unit of measure. Interval data
are always numerical. College admission SAT scores are an example of
interval-scaled data. For example, three students with SAT math
scores of 620, 550, and 470 can be ranked or ordered in terms of best
performance to poorest performance in math. In addition, the
differences between the scores are meaningful. For instance, student
1 scored 620 − 550 = 70 points more than student 2, while student 2
scored 550 − 470 = 80 points more than student 3.
• The scale of measurement for a variable is a ratio scale if the data
have all the properties of interval data and the ratio of two values is
meaningful. Variables such as distance, height, weight, and time use
the ratio scale of measurement. This scale requires that a zero value
be included to indicate that nothing exists for the variable at the zero
point. For example, consider the cost of an automobile. A zero value
for the cost would indicate that the automobile has no cost and is
free. In addition, if we compare the cost of $30,000 for one
automobile to the cost of $15,000 for a second automobile, the ratio
property shows that the first automobile is $30,000/$15,000 = 2
times, or twice, the cost of the second automobile.
Quantitative & Qualitative Variables
• 2. Comparing Tablet Computers. Tablet PC Comparison provides a wide
variety of information about tablet computers. The company’s website enables
consumers to easily compare different tablets using factors such as cost,
type of operating system, display size, battery life, and CPU manufacturer. A
sample of 10 tablet computers is shown in Table 1.6 (Tablet PC Comparison
website).
• a. How many elements are in this data set?
• b. How many variables are in this data set?
• c. Which variables are categorical and which variables are quantitative?
• d. What type of measurement scale is used for each of the variables?
• Data The facts and figures collected, analyzed, and summarized for
presentation and interpretation.
• Data mining The process of using procedures from statistics and computer
science to extract useful information from extremely large databases.
• Elements The entities on which data are collected.
• Observation The set of measurements obtained for a particular element.
Scale of Measurement:
• Nominal scale The scale of measurement for a variable when the data are
labels or names used to identify an attribute of an element. Nominal data
may be nonnumeric or numeric.
• Ordinal scale The scale of measurement for a variable if the data exhibit
the properties of nominal data and the order or rank of the data is
meaningful. Ordinal data may be nonnumeric or numeric.
• Interval scale The scale of measurement for a variable if the data
demonstrate the properties of ordinal data and the interval between
values is expressed in terms of a fixed unit of measure. Interval data are
always numeric.
• Ratio scale The scale of measurement for a variable if the data
demonstrate all the properties of interval data and the ratio of two values
is meaningful. Ratio data are always numeric.
The following table provides a summary of the variables in each measurement scale:
Property                          Nominal  Ordinal  Interval  Ratio
Has a natural “order”             NO       YES      YES       YES
Mode can be calculated            YES      YES      YES       YES
Median can be calculated          NO       YES      YES       YES
Mean can be calculated            NO       NO       YES       YES
Exact difference between values   NO       NO       YES       YES
Has a “true zero” value           NO       NO       NO        YES
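The mode, median, and mean rows of the summary table can be encoded as a lookup so that only meaningful measures of center are computed for each scale. This is a sketch: `ALLOWED` and `summarize` are hypothetical names, and `statistics.median_low` is chosen so that an ordinal median is always an observed value. The nominal example reuses the WTO coding (1 = member, 2 = observer); the ordinal ranks and ratio-scale costs are hypothetical.

```python
import statistics

# Measures of center permitted for each scale (mode/median/mean rows of the table)
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "interval": {"mode", "median", "mean"},
    "ratio": {"mode", "median", "mean"},
}

def summarize(values, scale):
    """Return only the measures of center that are meaningful for `scale`."""
    allowed = ALLOWED[scale]
    result = {"mode": statistics.mode(values)}
    if "median" in allowed:
        result["median"] = statistics.median_low(values)  # an actual observed value
    if "mean" in allowed:
        result["mean"] = statistics.mean(values)
    return result

# Numeric codes, but still nominal: averaging the codes would be meaningless
print(summarize([1, 1, 2, 1, 2], "nominal"))      # {'mode': 1}
print(summarize([3, 1, 2, 2, 5], "ordinal"))      # {'mode': 2, 'median': 2}
print(summarize([30000, 15000, 15000, 22000], "ratio"))
```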