Uploaded by Marwa Mostafa Sabry

Lect 1 - Introduction

advertisement

Course Instructor: Dr.Marwa Sabry

Course TA.: Eng.Yousra Ayman

Course ID on Bb: 212201.FCI.DS342

Course Code on Bb: 135964
3
1
2
I
II
(Business) Analytics is the use of:
 data,
 information technology,
 statistical analysis,
 quantitative methods, and
 mathematical or computer-based models
to help managers gain improved insight about their business
operations and make better, fact-based decisions.

Pricing
◦ setting prices for consumer and industrial goods, government
contracts, and maintenance contracts

Customer segmentation
◦ identifying and targeting key customer groups in retail, insurance,
and credit card industries

Merchandising
◦ determining brands to buy, quantities, and allocations

Location
◦ finding the best location for bank branches and ATMs, or where to
service industrial equipment

Social Media
◦ understand trends and customer perceptions; assist marketing
managers and product designers





Business intelligence
Information Systems
Statistics
Operations research/Management science
Decision support systems

Benefits
◦ …reduced costs, better risk management, faster decisions, better productivity
and enhanced bottom-line performance such as profitability and customer
satisfaction.

Challenges
◦ …lack of understanding of how to use analytics, competing business priorities,
insufficient analytical skills, difficulty in getting good data and sharing
information, and not understanding the benefits versus perceived costs of
analytics studies.



Descriptive analytics: the use of data to understand past and
current business performance and make informed decisions
Predictive analytics: predict the future by examining historical data,
detecting patterns or relationships in these data, and then
extrapolating these relationships forward in time.
Prescriptive analytics: identify the best alternatives to minimize or
maximize some objective











Database queries and analysis
Dashboards to report key performance measures
Data visualization
Statistical methods
Spreadsheets and predictive models
Scenario and “what-if” analyses
Simulation
Forecasting
Data and text mining
Optimization
Social media, web, and text analytics



Most department stores clear seasonal inventory by reducing prices.
Key question: When to reduce the price and by how much to
maximize revenue?
Potential applications of analytics:
 Descriptive analytics: examine historical data for similar products (prices, units
sold, advertising, …)
 Predictive analytics: predict sales based on price
 Prescriptive analytics: find the best sets of pricing and advertising to maximize
sales revenue

SAS Analytics
◦ Predictive modeling and data mining, visualization, forecasting,
optimization and model management, statistical analysis, text
analytics, and more.

Tableau Software
◦ Simple drag and drop tools for visualizing data from spreadsheets
and other databases.

RapidMiner
◦ RapidMiner is a data science software platform developed by the
company of the same name that provides an integrated
environment for data preparation, machine learning, deep
learning, text mining, and predictive analytics.


Data: numerical or textual facts and figures that are collected
through some type of measurement process.
Information: result of analyzing data; that is, extracting meaning
from data to support evaluation and decision making.








Annual reports
Accounting audits
Financial profitability analysis
Economic trends
Marketing research
Operations management performance
Human resource measurements
Web behavior
 page views, visitor’s country, time of view, length of time, origin and destination
paths, products they searched for and viewed, products purchased, what
reviews they read, and many others.

Data set - a collection of data.
◦ Examples: Marketing survey responses, a table of historical stock prices, and
a collection of measurements of dimensions of a manufactured item.

Database - a collection of related files containing records on people,
places, or things.
◦ A database file is usually organized in a two-dimensional table, where the
columns correspond to each individual element of data (called fields, or
attributes), and the rows represent records of related data elements.
Records
(Observations)
Entities
(Elements)
Fields or Attributes
(Variables)



Reliability - data are accurate and consistent.
Validity - data correctly measures what it is supposed to measure.
Examples:
◦ A tire pressure gage that consistently reads several pounds of pressure
below the true value is not reliable, although it is valid because it does
measure tire pressure.
◦ The number of calls to a customer service desk might be counted
correctly each day (and thus is a reliable measure) but not valid if it is
used to assess customer dissatisfaction, as many calls may be simple
queries.
◦ A survey question that asks a customer to rate the quality of the food in a
restaurant may be neither reliable (because different customers may
have conflicting perceptions) nor valid (if the intent is to measure
customer satisfaction, as satisfaction generally includes other elements
of service besides food).

Model - an abstraction or representation of a real system, idea, or
object.
 Captures the most important features
 Can be a written or verbal description, a visual representation, a mathematical
formula, or a spreadsheet.
The sales of a new product, such as a first-generation iPad or
3D television, often follow a common pattern.
1. Verbal description: The rate of sales starts small as early
adopters begin to evaluate a new product and then begins
to grow at an increasing rate over time as positive
customer feedback spreads. Eventually, the market
begins to become saturated and the rate of sales begins
to decrease.
2. Visual model: A sketch of sales as an S-shaped curve
over time
3. Mathematical model: S = aebect
where S is sales, t is time, e is the base of natural
logarithms, and a, b and c are constants.



total cost = fixed cost + variable cost
variable cost = unit variable cost × quantity produced
(1.2)
total cost = fixed cost + variable cost
= fixed cost + unit variable cost × quantity produced
(1.3)
(1.1)
Mathematical model:





TC = Total Cost
F = Fixed cost
V = Variable unit cost
Q = Quantity produced
TC = F +VQ
(1.4)


Decision model - a logical or mathematical representation of a problem or
business situation that can be used to understand, analyze, or facilitate
making a decision.
Inputs:
◦ Data, which are assumed to be constant for purposes of the model.
◦ Uncontrollable variables, which are quantities that can change but cannot be directly
controlled by the decision maker.
◦ Decision variables, which are controllable and can be selected at the discretion of the
decision maker.



Uncertainty is imperfect knowledge of what will happen
in the future.
Risk is associated with the consequences of what
actually happens.
“To try to eliminate risk in business enterprise is futile.
Risk is inherent in the commitment of present resources
to future expectations. Indeed, economic progress can
be defined as the ability to take greater risks. The
attempt to eliminate risks, even the attempt to minimize
them, can only make them irrational and unbearable. It
can only result in the greatest risk of all: rigidity.”
– Peter Drucker


Prescriptive decision models help decision makers identify the
best solution.
Optimization - finding values of decision variables that minimize (or
maximize) something such as cost (or profit).
 Objective function - the equation that minimizes (or maximizes) the quantity
of interest.
 Constraints - limitations or restrictions.
 Optimal solution - values of the decision variables at the minimum (or
maximum) point.


Deterministic model – all model input information
is known with certainty.
Stochastic model – some model input
information is uncertain.
◦ For instance, suppose that customer demand is an
important element of some model. We can make the
assumption that the demand is known with certainty; say,
5,000 units per month (deterministic). On the other hand,
suppose we have evidence to indicate that demand is
uncertain, with an average value of 5,000 units per
month, but which typically varies between 3,200 and
6,800 units (stochastic).

Spreadsheet modeling is an alternative to algebraic modeling that
relates various quantities in a spreadsheet with cell formulas.
◦ Instant feedback is available from spreadsheets, so if a formula is entered
incorrectly, it is often immediately obvious.
◦ Developing good spreadsheet models is not easy.
 They must be correct, well designed and well documented.
◦ A spreadsheet model for a specific example of the
product mix problem is shown below.

Book 2 portrays modeling as a seven-step process, but not all
problems require all seven steps.
1.
2.
3.
4.
5.
6.
7.
Define the problem.
Collect and summarize data.
Develop a model.
Verify the model.
Select one or more suitable decisions.
Present the results to the organization.
Implement the model and update it over time



Many commercial software packages can be used for Business
Analytics.
Spreadsheet software, such as Microsoft Excel, is widely available
and used across all areas of business.
Spreadsheets provide a flexible modeling environment for
manipulating data and developing and solving models.










Opening, saving, and printing files
Using workbooks and worksheets
Moving around a spreadsheet
Selecting cells and ranges
Inserting/deleting rows and columns
Entering and editing text, data, and formulas
Formatting data (number, currency, decimal)
Working with text strings
Formatting data and text
Modifying the appearance of a spreadsheet

Cell references can be relative or absolute. Using a dollar sign
before a row and/or column label creates an absolute reference.
◦ Relative references: A2, C5, D10
◦ Absolute references: $A$2, $C5, D$10



Using a $ sign before a row label (for example, B$4) keeps the
reference fixed to row 4 but allows the column reference to
change if the formula is copied to another cell.
Using a $ sign before a column label (for example, $B4) keeps
the reference to column B fixed but allows the row reference to
change.
Using a $ sign before both the row and column labels (for
example, $B$4) keeps the reference to cell B4 fixed no matter
where the formula is copied.
Two models for predicting demand as a function of price
Linear
D = a – bP
Formula in cell B8:
=$B$4-$B$5*$A8
Nonlinear
D = cP-d
Formula in cell E8:
=$E$4*D8^-$E$5
Note how the absolute addresses are used so that as these formulas
are copied down, the demand is computed correctly.
Formulas in cells can be copied in many ways.
 Use the Copy button in the Home tab, then use the Paste button
 Use Ctrl-C, then Ctrl-V
 Drag the bottom right corner of a cell (the fill handle) across a row or
column






Split Screen
Paste Special
Column and Row Widths
Displaying Formulas in Worksheets
Displaying Grid Lines and Column Headers for Printing
Filling a Range with a Series of Numbers






=MIN(range)
=MAX(range)
=SUM(range)
=AVERAGE(range)
=COUNT(range)
=COUNTIF(range,criteria)
◦ Excel has other useful COUNT-type functions: COUNTA counts
the number of nonblank cells in a range, and COUNTBLANK
counts the number of blank cells in a range. In addition,
COUNTIFS(range1, criterion1, range2, criterion2,… range_n,
criterion_n) finds the number of cells within multiple ranges that
meet specific criteria for each range.
=MIN(F4:F97)
=MAX(F4:F97)
=SUM(G4:G97)
=AVERAGE(H4:H97)
=COUNT(B4:B97)
=COUNTIF(D4:D97,”=O-Ring”)
=COUNTIF(H4:H97,”<30”)
=COUNTIFS(D4:D97,"O-Ring",A4:A97,"Spacetime Technologies")


SUMIF, AVERAGEIF, SUMIFS, and
AVERAGEIFS can be used to embed IF logic
within mathematical functions.
For instance, the syntax of SUMIF is
◦ SUMIF(range, criterion, [sum range]). "Sum range" is
an optional argument that allows you to add cells in a
different range.

Example: In the Purchase Orders database, to
find the total cost of all airframe fasteners, use
=SUMIF(D4:D97,"Airframe fasteners", G4:G97)



=IF(condition, value if true, value if false) – a
returns one value if the condition is true and
another if the condition is false,
=AND(condition1, condition2, …) – returns TRUE
if all conditions are true and FALSE if not,
=OR(condition1, condition2, …) – returns TRUE if
any condition is true and FALSE if not.




=IF(condition, value if true, value if false)
Conditions may include the following:
= equal
<> not equal to
> greater than
>= greater than or equal to
< less than
<= less than or equal to
You may nest up to 7 IF functions, replacing the
value if false with another IF function
Example:
=IF(A8 =2,(IF(B3 =5,”YES”,“ ”)),15)

Suppose that orders with quantities of at least
10,000 units are classified as Large.
◦ Cell K4: =IF(F4>=10000, “Large”, “Small”)

Suppose that large orders with a total cost of at
least $25,000 are considered critical.
◦ Cell L4: =IF(AND(K4=“Large”, G4>=25000),“Critical”,“”)

These functions are useful for finding specific data
in a spreadsheet.
 =VLOOKUP(lookup_value, table_array, col_index_num, [range lookup]) -



looks up a value in the leftmost column of a table and returns a value
in the same row from a column you specify
=HLOOKUP(lookup_value, table_array, row_index_num, [range lookup]) looks up a value in the top row of a table and returns a value in the
same column from a row you specify.
=INDEX(array, row_num, col_num) - returns a value or reference of the
cell at the intersection of a particular row and column in a given
range.
=MATCH(lookup_value, lookup_array, match_type) - returns the relative
position of an item in an array that matches a specified value in a
specified order




In the VLOOKUP and HLOOKUP functions, range lookup is optional. If this is
omitted or set as True, then the first column of the table must be sorted in ascending
numerical order.
If an exact match for the lookup_value is found in the first column, then Excel will
return the value the col_index_num of that row. If an exact match is not found, Excel
will choose the row with the largest value in the first column that is less than the
lookup_value.
If range lookup is False, then Excel seeks an exact match in the first column of the
table range. If no exact match is found, Excel will return #N/A (not available).
We recommend that you specify the range lookup to avoid errors.
=VLOOKUP(10007, $A$4:$H$475,3) returns the payment type
Credit.
=VLOOKUP(10007, $A$4:$H$475,4) returns the transaction code
80103311
Counts of categories
Categorical variables
(Section 2.3)
Charts of counts
Describe individual
variables
(Chapter 2)
Summary measures
(mean, median,
standard deviation,
quartiles, etc.)
Cross-sectional data
Histograms, box
plots
Numeric variables
(Section 2.4)
Time series
Purpose of analysis
Time series charts
for patterns
Trend lines
Purpose of analysis
Categorical vs
categorical
(Sections 3.2, 3.5)
Tables of joint counts
(cross-tabs or pivot
tables)
Charts of joint counts
Summary measures
by category
Find relationships
between variables
(Chapter 3)
Categorical vs
numeric
(Sections 3.3, 3.5)
Side-by-side
boxplots
Pivot tables
Scatterplots
Numeric vs numeric
(Sections 3.4, 3.5)
Correlations (and
covariances)
Pivot tables
Trend lines
(regression)
Download