Course Instructor: Dr.Marwa Sabry Course TA.: Eng.Yousra Ayman Course ID on Bb: 212201.FCI.DS342 Course Code on Bb: 135964 3 1 2 I II (Business) Analytics is the use of: data, information technology, statistical analysis, quantitative methods, and mathematical or computer-based models to help managers gain improved insight about their business operations and make better, fact-based decisions. Pricing ◦ setting prices for consumer and industrial goods, government contracts, and maintenance contracts Customer segmentation ◦ identifying and targeting key customer groups in retail, insurance, and credit card industries Merchandising ◦ determining brands to buy, quantities, and allocations Location ◦ finding the best location for bank branches and ATMs, or where to service industrial equipment Social Media ◦ understand trends and customer perceptions; assist marketing managers and product designers Business intelligence Information Systems Statistics Operations research/Management science Decision support systems Benefits ◦ …reduced costs, better risk management, faster decisions, better productivity and enhanced bottom-line performance such as profitability and customer satisfaction. Challenges ◦ …lack of understanding of how to use analytics, competing business priorities, insufficient analytical skills, difficulty in getting good data and sharing information, and not understanding the benefits versus perceived costs of analytics studies. Descriptive analytics: the use of data to understand past and current business performance and make informed decisions Predictive analytics: predict the future by examining historical data, detecting patterns or relationships in these data, and then extrapolating these relationships forward in time. Prescriptive analytics: identify the best alternatives to minimize or maximize some objective Database queries and analysis Dashboards to report key performance measures Data visualization Statistical methods Spreadsheets and predictive models Scenario and “what-if” analyses Simulation Forecasting Data and text mining Optimization Social media, web, and text analytics Most department stores clear seasonal inventory by reducing prices. Key question: When to reduce the price and by how much to maximize revenue? Potential applications of analytics: Descriptive analytics: examine historical data for similar products (prices, units sold, advertising, …) Predictive analytics: predict sales based on price Prescriptive analytics: find the best sets of pricing and advertising to maximize sales revenue SAS Analytics ◦ Predictive modeling and data mining, visualization, forecasting, optimization and model management, statistical analysis, text analytics, and more. Tableau Software ◦ Simple drag and drop tools for visualizing data from spreadsheets and other databases. RapidMiner ◦ RapidMiner is a data science software platform developed by the company of the same name that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics. Data: numerical or textual facts and figures that are collected through some type of measurement process. Information: result of analyzing data; that is, extracting meaning from data to support evaluation and decision making. Annual reports Accounting audits Financial profitability analysis Economic trends Marketing research Operations management performance Human resource measurements Web behavior page views, visitor’s country, time of view, length of time, origin and destination paths, products they searched for and viewed, products purchased, what reviews they read, and many others. Data set - a collection of data. ◦ Examples: Marketing survey responses, a table of historical stock prices, and a collection of measurements of dimensions of a manufactured item. Database - a collection of related files containing records on people, places, or things. ◦ A database file is usually organized in a two-dimensional table, where the columns correspond to each individual element of data (called fields, or attributes), and the rows represent records of related data elements. Records (Observations) Entities (Elements) Fields or Attributes (Variables) Reliability - data are accurate and consistent. Validity - data correctly measures what it is supposed to measure. Examples: ◦ A tire pressure gage that consistently reads several pounds of pressure below the true value is not reliable, although it is valid because it does measure tire pressure. ◦ The number of calls to a customer service desk might be counted correctly each day (and thus is a reliable measure) but not valid if it is used to assess customer dissatisfaction, as many calls may be simple queries. ◦ A survey question that asks a customer to rate the quality of the food in a restaurant may be neither reliable (because different customers may have conflicting perceptions) nor valid (if the intent is to measure customer satisfaction, as satisfaction generally includes other elements of service besides food). Model - an abstraction or representation of a real system, idea, or object. Captures the most important features Can be a written or verbal description, a visual representation, a mathematical formula, or a spreadsheet. The sales of a new product, such as a first-generation iPad or 3D television, often follow a common pattern. 1. Verbal description: The rate of sales starts small as early adopters begin to evaluate a new product and then begins to grow at an increasing rate over time as positive customer feedback spreads. Eventually, the market begins to become saturated and the rate of sales begins to decrease. 2. Visual model: A sketch of sales as an S-shaped curve over time 3. Mathematical model: S = aebect where S is sales, t is time, e is the base of natural logarithms, and a, b and c are constants. total cost = fixed cost + variable cost variable cost = unit variable cost × quantity produced (1.2) total cost = fixed cost + variable cost = fixed cost + unit variable cost × quantity produced (1.3) (1.1) Mathematical model: TC = Total Cost F = Fixed cost V = Variable unit cost Q = Quantity produced TC = F +VQ (1.4) Decision model - a logical or mathematical representation of a problem or business situation that can be used to understand, analyze, or facilitate making a decision. Inputs: ◦ Data, which are assumed to be constant for purposes of the model. ◦ Uncontrollable variables, which are quantities that can change but cannot be directly controlled by the decision maker. ◦ Decision variables, which are controllable and can be selected at the discretion of the decision maker. Uncertainty is imperfect knowledge of what will happen in the future. Risk is associated with the consequences of what actually happens. “To try to eliminate risk in business enterprise is futile. Risk is inherent in the commitment of present resources to future expectations. Indeed, economic progress can be defined as the ability to take greater risks. The attempt to eliminate risks, even the attempt to minimize them, can only make them irrational and unbearable. It can only result in the greatest risk of all: rigidity.” – Peter Drucker Prescriptive decision models help decision makers identify the best solution. Optimization - finding values of decision variables that minimize (or maximize) something such as cost (or profit). Objective function - the equation that minimizes (or maximizes) the quantity of interest. Constraints - limitations or restrictions. Optimal solution - values of the decision variables at the minimum (or maximum) point. Deterministic model – all model input information is known with certainty. Stochastic model – some model input information is uncertain. ◦ For instance, suppose that customer demand is an important element of some model. We can make the assumption that the demand is known with certainty; say, 5,000 units per month (deterministic). On the other hand, suppose we have evidence to indicate that demand is uncertain, with an average value of 5,000 units per month, but which typically varies between 3,200 and 6,800 units (stochastic). Spreadsheet modeling is an alternative to algebraic modeling that relates various quantities in a spreadsheet with cell formulas. ◦ Instant feedback is available from spreadsheets, so if a formula is entered incorrectly, it is often immediately obvious. ◦ Developing good spreadsheet models is not easy. They must be correct, well designed and well documented. ◦ A spreadsheet model for a specific example of the product mix problem is shown below. Book 2 portrays modeling as a seven-step process, but not all problems require all seven steps. 1. 2. 3. 4. 5. 6. 7. Define the problem. Collect and summarize data. Develop a model. Verify the model. Select one or more suitable decisions. Present the results to the organization. Implement the model and update it over time Many commercial software packages can be used for Business Analytics. Spreadsheet software, such as Microsoft Excel, is widely available and used across all areas of business. Spreadsheets provide a flexible modeling environment for manipulating data and developing and solving models. Opening, saving, and printing files Using workbooks and worksheets Moving around a spreadsheet Selecting cells and ranges Inserting/deleting rows and columns Entering and editing text, data, and formulas Formatting data (number, currency, decimal) Working with text strings Formatting data and text Modifying the appearance of a spreadsheet Cell references can be relative or absolute. Using a dollar sign before a row and/or column label creates an absolute reference. ◦ Relative references: A2, C5, D10 ◦ Absolute references: $A$2, $C5, D$10 Using a $ sign before a row label (for example, B$4) keeps the reference fixed to row 4 but allows the column reference to change if the formula is copied to another cell. Using a $ sign before a column label (for example, $B4) keeps the reference to column B fixed but allows the row reference to change. Using a $ sign before both the row and column labels (for example, $B$4) keeps the reference to cell B4 fixed no matter where the formula is copied. Two models for predicting demand as a function of price Linear D = a – bP Formula in cell B8: =$B$4-$B$5*$A8 Nonlinear D = cP-d Formula in cell E8: =$E$4*D8^-$E$5 Note how the absolute addresses are used so that as these formulas are copied down, the demand is computed correctly. Formulas in cells can be copied in many ways. Use the Copy button in the Home tab, then use the Paste button Use Ctrl-C, then Ctrl-V Drag the bottom right corner of a cell (the fill handle) across a row or column Split Screen Paste Special Column and Row Widths Displaying Formulas in Worksheets Displaying Grid Lines and Column Headers for Printing Filling a Range with a Series of Numbers =MIN(range) =MAX(range) =SUM(range) =AVERAGE(range) =COUNT(range) =COUNTIF(range,criteria) ◦ Excel has other useful COUNT-type functions: COUNTA counts the number of nonblank cells in a range, and COUNTBLANK counts the number of blank cells in a range. In addition, COUNTIFS(range1, criterion1, range2, criterion2,… range_n, criterion_n) finds the number of cells within multiple ranges that meet specific criteria for each range. =MIN(F4:F97) =MAX(F4:F97) =SUM(G4:G97) =AVERAGE(H4:H97) =COUNT(B4:B97) =COUNTIF(D4:D97,”=O-Ring”) =COUNTIF(H4:H97,”<30”) =COUNTIFS(D4:D97,"O-Ring",A4:A97,"Spacetime Technologies") SUMIF, AVERAGEIF, SUMIFS, and AVERAGEIFS can be used to embed IF logic within mathematical functions. For instance, the syntax of SUMIF is ◦ SUMIF(range, criterion, [sum range]). "Sum range" is an optional argument that allows you to add cells in a different range. Example: In the Purchase Orders database, to find the total cost of all airframe fasteners, use =SUMIF(D4:D97,"Airframe fasteners", G4:G97) =IF(condition, value if true, value if false) – a returns one value if the condition is true and another if the condition is false, =AND(condition1, condition2, …) – returns TRUE if all conditions are true and FALSE if not, =OR(condition1, condition2, …) – returns TRUE if any condition is true and FALSE if not. =IF(condition, value if true, value if false) Conditions may include the following: = equal <> not equal to > greater than >= greater than or equal to < less than <= less than or equal to You may nest up to 7 IF functions, replacing the value if false with another IF function Example: =IF(A8 =2,(IF(B3 =5,”YES”,“ ”)),15) Suppose that orders with quantities of at least 10,000 units are classified as Large. ◦ Cell K4: =IF(F4>=10000, “Large”, “Small”) Suppose that large orders with a total cost of at least $25,000 are considered critical. ◦ Cell L4: =IF(AND(K4=“Large”, G4>=25000),“Critical”,“”) These functions are useful for finding specific data in a spreadsheet. =VLOOKUP(lookup_value, table_array, col_index_num, [range lookup]) - looks up a value in the leftmost column of a table and returns a value in the same row from a column you specify =HLOOKUP(lookup_value, table_array, row_index_num, [range lookup]) looks up a value in the top row of a table and returns a value in the same column from a row you specify. =INDEX(array, row_num, col_num) - returns a value or reference of the cell at the intersection of a particular row and column in a given range. =MATCH(lookup_value, lookup_array, match_type) - returns the relative position of an item in an array that matches a specified value in a specified order In the VLOOKUP and HLOOKUP functions, range lookup is optional. If this is omitted or set as True, then the first column of the table must be sorted in ascending numerical order. If an exact match for the lookup_value is found in the first column, then Excel will return the value the col_index_num of that row. If an exact match is not found, Excel will choose the row with the largest value in the first column that is less than the lookup_value. If range lookup is False, then Excel seeks an exact match in the first column of the table range. If no exact match is found, Excel will return #N/A (not available). We recommend that you specify the range lookup to avoid errors. =VLOOKUP(10007, $A$4:$H$475,3) returns the payment type Credit. =VLOOKUP(10007, $A$4:$H$475,4) returns the transaction code 80103311 Counts of categories Categorical variables (Section 2.3) Charts of counts Describe individual variables (Chapter 2) Summary measures (mean, median, standard deviation, quartiles, etc.) Cross-sectional data Histograms, box plots Numeric variables (Section 2.4) Time series Purpose of analysis Time series charts for patterns Trend lines Purpose of analysis Categorical vs categorical (Sections 3.2, 3.5) Tables of joint counts (cross-tabs or pivot tables) Charts of joint counts Summary measures by category Find relationships between variables (Chapter 3) Categorical vs numeric (Sections 3.3, 3.5) Side-by-side boxplots Pivot tables Scatterplots Numeric vs numeric (Sections 3.4, 3.5) Correlations (and covariances) Pivot tables Trend lines (regression)