UNIT III BUSINESS FORECASTING 6

Introduction to Business Forecasting and Predictive Analytics - Logic and Data Driven Models - Data Mining and Predictive Analysis Modelling - Machine Learning for Predictive Analytics.

Business forecasting

Business forecasting refers to the process of predicting future market conditions by using business intelligence tools and forecasting methods to analyze historical data. Business forecasting can be either qualitative or quantitative. Qualitative business forecasting relies on subject matter experts and market research, while quantitative business forecasting focuses only on data analysis.

Quantitative Forecasting

Quantitative forecasting is applicable when accurate past data is available to predict the probability of future events. This method extracts patterns from the data that point to the more probable outcomes. The data used in quantitative forecasting can include in-house data such as sales numbers and professionally gathered data such as census statistics. Generally, quantitative forecasting seeks to connect different variables in order to establish cause-and-effect relationships that can be exploited to benefit the business.

Qualitative Forecasting

Qualitative forecasting is based on the opinion and judgment of consumers and experts. This business forecasting method is useful if you have insufficient historical data to draw any statistically relevant conclusions. In such cases, an expert can help piece together the data you do have to make a qualitative prediction from that known information.

Business Forecasting Process

Here are the steps that a business forecaster should typically follow:

1. Define the question or problem you need to solve with your business forecasting efforts. For example, you might be interested in estimating whether your organization will be able to meet product demand for the next quarter.
2. Identify the datasets and variables that need to be taken into consideration.
In this case, the relevant datasets include the sales records from the previous year, together with variables related to capacity, production and demand planning.
3. Choose a business forecasting method that suits your dataset and forecasting goals. The choice depends on whether your problem or question can be solved using a qualitative, quantitative or mixed approach.
4. Based on the analysis of historical data, estimate future business performance. Keep in mind that the accuracy of your business forecast depends on the quality of your data.
5. Determine the discrepancy between your business forecast and actual business performance. Document your findings and improve your business forecasting process.

Business Forecasting Methods

As stated above, there are two main types of business forecasting methods, qualitative and quantitative. Some of the more common forecasting models of both types are described below.

Delphi Method

This qualitative business forecasting method consists of gathering a panel of subject matter experts and collecting their opinions on the same topic in such a way that they cannot know each other's thoughts. This is done to prevent bias, which makes it possible for a manager to compare their opinions objectively and see whether there are patterns, consensus or division.

Market Research

There are many market research techniques that evaluate the behavior of customers and their response to a certain product or service. Some of those market research methods collect and analyze quantitative data, such as digital marketing metrics, and others qualitative data, such as product testing or customer interviews.

Time Series Analysis

Also referred to as the "trend analysis method," this business forecasting technique requires the forecaster to analyze historical data to identify trends. This data analysis process requires statistical analysis, as outliers need to be removed. More recent data should be given more weight to better reflect the current state of the business.
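The idea of giving recent data more weight can be sketched with simple exponential smoothing. This is one common weighting scheme and is not named in the notes above, so treat it as an illustrative assumption. The sales figures and the smoothing factor `alpha` are hypothetical.

```python
def exponential_smoothing_forecast(history, alpha=0.5):
    """One-step-ahead forecast in which recent observations carry more weight.

    alpha lies in (0, 1]; a larger alpha puts more weight on recent data.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # start from the oldest observation
    for value in history[1:]:
        # New level = weighted blend of the latest value and the previous level.
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical monthly sales figures, oldest first.
monthly_sales = [100, 110, 120, 130]
forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
print(f"Forecast for next month: {forecast:.2f}")
```

With `alpha=1.0` the forecast collapses to the last observed value, and smaller values of `alpha` let older observations keep more influence.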
The Average Approach

The average approach says that the predictions of all future values are equal to the mean of the past data. Past data is required to use this method, so it can be considered a type of quantitative forecasting. This approach is often used when you need to predict unknown values, as it allows you to make calculations based on past averages, on the assumption that the future will closely resemble the past.

The Naïve Approach

The naïve approach is the most cost-effective and is often used as a benchmark against which more sophisticated methods are compared. It is used only for time series data, and every forecast is set equal to the last observed value. This approach is useful in industries and sectors where past patterns are unlikely to be reproduced in the future. In such cases, the most recent observed value may prove to be the most informative.

Elements of Business Forecasting

1. Develop the Basis: Before you can start forecasting, you must develop a system to investigate the current economic situation around you. That includes your industry and its present position, as well as its popular products, to better estimate sales and general business operations.
2. Estimating Future Business Operations: Next comes the estimation of future conditions, such as the course that future events are likely to take in your industry. Again, this is based on collected data, which supports quantitative estimates of the scale of future operations.
3. Regulating Forecasts: Whatever your forecast is, it must be compared to actual results. This is the only way to find deviations from the norm. The reasons for those deviations must then be identified, so that action can be taken to correct them in the future.
4. Reviewing Forecasting Process: By reviewing the deviations between forecasts and actual performance data, improvements are made in the process, allowing you to refine and review the information for accuracy.
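The average and naïve approaches, and the comparison of forecasts against actual results described under Regulating Forecasts, can be sketched as follows. The sales figures are hypothetical, and mean absolute error is only one of several possible ways to measure the deviation.

```python
from statistics import mean


def average_forecast(history):
    """Average approach: every future value is predicted as the mean of past data."""
    return mean(history)


def naive_forecast(history):
    """Naive approach: every future value equals the last observed value."""
    return history[-1]


def mean_absolute_error(actuals, forecast):
    """Average absolute deviation between actual values and a constant forecast."""
    return mean(abs(actual - forecast) for actual in actuals)


# Hypothetical quarterly sales: past observations and the actuals seen later.
past_sales = [200, 220, 210, 250]
later_actuals = [255, 245]

for name, method in [("average", average_forecast), ("naive", naive_forecast)]:
    forecast = method(past_sales)
    error = mean_absolute_error(later_actuals, forecast)
    print(f"{name}: forecast={forecast}, MAE={error}")
```

In this made-up series the level shifted upward at the end, so the naïve forecast deviates less from the actuals than the average forecast does. On data without such a shift the result could be the reverse, which is why the naïve approach serves as a benchmark rather than a default.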
Predictive analytics

Predictive analytics encompasses a variety of statistical techniques from predictive modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future or otherwise unknown events.

Predictive analytics, a branch of advanced analytics, is used to predict future events. It analyzes current and historical data in order to make predictions about the future by employing techniques from statistics, data mining, machine learning, and artificial intelligence.

In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of the risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.

Consider the power of predictive analytics:
• A Canadian bank uses predictive analytics to increase campaign response rates by 600%, cut customer acquisition costs in half, and boost campaign ROI by 100%.
• A large state university predicts whether a student will choose to enroll by applying predictive models to applicant data and admissions history.
• A research group at a leading hospital combined predictive and text analytics to improve its ability to classify and treat pediatric brain tumors.

How Predictive Analytics Works

Predictive analytics is driven by machine-learning algorithms, principally decision trees, log-linear regression, and neural networks. These algorithms perform pattern matching: they determine how closely new data matches a reference pattern. The algorithms are trained on real data and then compute a predictive score for each individual they analyze.

Figure 1: Predictive Analytics Process

Requirement Collection

To develop a predictive model, the aim of the prediction must first be made clear. The type of knowledge to be gained through the prediction should be defined.
For example, a pharmaceutical company may want a forecast of the sales of a medicine in a particular area in order to avoid the expiry of those medicines.

Data Collection

Once the requirement of the client organization is known, the analyst collects the datasets required for developing the predictive model, possibly from different sources.

Data Analysis and Massaging

Data analysts analyze the collected data and prepare it for use in the model. Unstructured data is converted into a structured form in this step. Once the complete data is available in structured form, its quality is tested. The main dataset may contain erroneous data or many missing attribute values, and all of these must be addressed. The effectiveness of the predictive model depends heavily on the quality of the data. This phase is sometimes referred to as data munging or massaging the data, which means converting the raw data into a format that can be used for analytics.

Statistics, Machine Learning

The predictive analytics process employs many statistical and machine learning techniques. Probability theory and regression analysis are the most important techniques and are widely used in analytics. Similarly, artificial neural networks, decision trees, and support vector machines are machine learning tools that are widely used in many predictive analytics tasks.

Predictive Modeling

In this phase, a model is developed based on statistical and machine learning techniques and the example dataset. After development, it is tested on the test dataset, which is a part of the main collected dataset, to check the validity of the model; if the test is successful, the model is said to be fit. Once fitted, the model can make accurate predictions on new data entered as input to the system. In many applications, a multi-model solution is chosen for a problem.
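The develop-then-test workflow of the Predictive Modeling phase can be sketched as below. This is a minimal illustration under stated assumptions: the dataset is hypothetical and deliberately noise-free, the model is a plain least-squares line, and the 75/25 split is an arbitrary choice rather than a rule from these notes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical dataset: advertising spend (x) and units sold (y).
spend = [1, 2, 3, 4, 5, 6, 7, 8]
units = [12, 14, 16, 18, 20, 22, 24, 26]

# Hold out the last quarter of the records as the test dataset.
split = int(len(spend) * 0.75)
train_x, test_x = spend[:split], spend[split:]
train_y, test_y = units[:split], units[split:]

model = fit_line(train_x, train_y)  # develop the model on the training part
test_errors = [abs(predict(model, x) - y) for x, y in zip(test_x, test_y)]
mean_test_error = sum(test_errors) / len(test_errors)
print(f"intercept={model[0]:.2f}, slope={model[1]:.2f}, test MAE={mean_test_error:.2f}")
```

Because the test records were never used to fit the line, the test error gives an indication of how the model might behave on new data. Real data would be noisy, so the test error would not normally be zero.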
Prediction and Monitoring

After successful prediction tests, the model is deployed at the client's site for everyday predictions and decision making. The results and reports are generated by the model for the managerial process. The model is monitored continuously to ensure that it is giving correct results and making accurate predictions.

PREDICTIVE ANALYTICS TECHNIQUES

All predictive analytics models are grouped into classification models and regression models. Classification models predict which class a value belongs to, while regression models predict a number. The important techniques that are widely used in developing predictive models are listed below.

Decision Tree

A decision tree is a classification model, but it can be used in regression as well. It is a tree-like model which relates decisions to their possible consequences [11]. The consequences may be the outcome of events, the cost of resources, or utility. In its tree-like structure, each branch represents a choice between a number of alternatives and each leaf represents a decision.

Regression Model

Regression is one of the most popular statistical techniques for estimating the relationship between variables. It models the relationship between a dependent variable and one or more independent variables, and analyzes how the value of the dependent variable changes when the values of the independent variables change.

Artificial Neural Network

An artificial neural network, a network of artificial neurons modeled on biological neurons, simulates the capability of the human nervous system to process input signals and produce outputs. It is a sophisticated model that is capable of modeling extremely complex relations. The architecture of a general purpose artificial neural network is represented in figure 5.
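The decision tree described earlier in this section can be illustrated in its smallest form, a tree with a single branch and two leaves (often called a decision stump). The sketch below is an assumption-laden illustration, not a full tree-building algorithm: the data are hypothetical, only one feature is used, and the split is chosen by counting misclassified training examples.

```python
def fit_stump(values, labels):
    """Find the threshold on one feature that misclassifies the fewest examples.

    The stump predicts True when value > threshold and False otherwise.
    """
    best_threshold, best_errors = None, len(values) + 1
    for threshold in sorted(set(values)):
        errors = sum((v > threshold) != label for v, label in zip(values, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def predict_stump(threshold, value):
    """One branch (the test) leading to two leaves (the decisions)."""
    return value > threshold


# Hypothetical data: transaction amount and whether it was flagged as risky.
amounts = [20, 35, 50, 400, 650, 900]
risky = [False, False, False, True, True, True]

threshold, training_errors = fit_stump(amounts, risky)
print(f"split at amount > {threshold}, training errors: {training_errors}")
print(f"amount 700 predicted risky: {predict_stump(threshold, 700)}")
```

A full decision tree repeats this kind of split recursively on each branch and usually relies on impurity measures such as Gini or entropy rather than a raw error count.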
Bayesian Statistics

This technique belongs to the branch of statistics that treats parameters as random variables and uses the term "degree of belief" to define the probability of occurrence of an event [14]. Bayesian statistics is based on Bayes' theorem, which distinguishes prior (a priori) and posterior (a posteriori) events. In ordinary conditional probability, the approach is to find the probability of the posterior event given that the prior event has occurred. Bayes' theorem, on the other hand, finds the probability of the prior event given that the posterior event has already occurred: P(A | B) = P(B | A) P(A) / P(B), where A is the prior event and B is the posterior event. It is represented in figure 6.

Ensemble Learning

Ensemble learning belongs to the category of supervised learning algorithms in machine learning. These models are developed by training several models of a similar type and then combining their predictions. In this way, the accuracy of the model is improved, since the combination reduces the bias and the variance of the model. It also helps in identifying the best model to use with new data.

Support Vector Machine

The support vector machine is a supervised machine learning technique widely used in predictive analytics. With its associated learning algorithms, it analyzes data for classification and regression, although it is mostly used in classification applications. It is a discriminative classifier defined by a hyperplane that separates examples into categories. The examples are represented as points in space such that the categories are divided by a clear gap. New examples are then predicted to belong to a class according to which side of the gap they fall on.

Time Series Analysis

Time series analysis is a statistical technique that uses time series data, which is collected over a period of time at regular intervals. It combines traditional data mining techniques with forecasting. Time series analysis is divided into two categories, namely the frequency domain and the time domain.
It predicts the value of a variable at future time intervals based on the analysis of its values at past time intervals. It is widely used in stock market prediction and weather forecasting. An example of the variation in the price of a product over time, with its trend forecast for future years, is represented in the figure.

APPLICATION OF PREDICTIVE ANALYTICS

Banking and Financial Services

Predictive analytics is widely applied in the banking and financial industries. In both industries, data and money are crucial, and finding insights from the data and from the movement of money is a must. Predictive analytics helps in detecting fraudulent customers and suspicious transactions. It minimizes the credit risk that these industries take on when lending money to their customers. It also helps with cross-sell and up-sell opportunities and with retaining and attracting valuable customers.

Retail

Predictive analytics helps the retail industry in identifying customers and understanding what they need and want. By applying this technique, retailers predict the behavior of customers towards a product. Companies may fix prices and set special offers on products after identifying the buying behavior of customers. It also helps the retail industry in predicting how successful a particular product will be in a particular season. Retailers may run campaigns for their products and approach customers with offers and prices fixed for individual customers. Predictive analytics also helps retail industries in improving their supply chain: by identifying and predicting the demand for a product in a specific area, they can improve their supply of products.

Health and Insurance

The pharmaceutical sector uses predictive analytics in drug design and in improving the supply chain of drugs. By using this technique, these companies may predict the expiry of drugs in a specific area due to lack of sales.
The insurance sector uses predictive analytics models to identify and predict fraudulent claims filed by customers. The health insurance sector uses this technique to find the customers who are most at risk of a serious disease and to approach them with the insurance plans that would be best for their investment.

Oil, Gas and Utilities

The oil and gas industries use predictive analytics techniques to forecast the failure of equipment in order to minimize risk. They predict future resource requirements using these models. Energy companies can predict the need for maintenance in order to avoid fatal accidents in the future.

Government and Public Sector

Government agencies use big-data-based predictive analytics techniques to identify possible criminal activities in a particular area. They analyze social media data to identify the background of suspicious persons and forecast their future behavior. Governments use predictive analytics to forecast future population trends at the country level and the state level. Predictive analytics techniques are also used extensively to enhance cybersecurity.

Data-Driven Model

Data-driven models are models in which data is collected from many sources to quantitatively establish model relationships. The main aim of the data-driven model concept is to find links between the system state variables (input and output) without clear knowledge of the physical attributes and behaviour of the system. Data-driven predictive modelling derives the modelling method from the set of existing data and entails a predictive methodology to forecast future outcomes. A model is data-driven only when there is no clear knowledge of the relationships among the variables of the system, even though there is a lot of data. Here, you are simply predicting the outcomes based on the data.
The model is not based on hand-picked variables, but may contain unobserved, hidden combinations of variables. Data-driven modelling draws on the following overlapping fields:
• artificial intelligence (AI), which is the overarching study of how human intelligence can be incorporated into computers.
• computational intelligence (CI), which includes neural networks, fuzzy systems and evolutionary computing as well as other areas within AI and machine learning.
• soft computing (SC), which is close to CI, but with special emphasis on fuzzy rule-based systems induced from data.
• machine learning (ML), which was once a sub-area of AI and concentrates on the theoretical foundations used by CI and SC.
• data mining (DM) and knowledge discovery in databases (KDD), which are often focused on very large databases and are associated with applications in banking, financial services and customer resources management. DM is seen as a part of the wider KDD. The methods used are mainly from statistics and ML.
• intelligent data analysis (IDA), which tends to focus on data analysis in medicine and research and incorporates methods from statistics and ML.

Logic driven models

Logic driven models are based on experience, knowledge and the logical relationships of variables and constants connected to the desired business performance outcome.

Predictive modelling leverages statistics to predict outcomes. Most often the event one wants to predict is in the future, but predictive modelling can be applied to any type of unknown event, regardless of when it occurred. For example, predictive models are often used to detect crimes and identify suspects after the crime has taken place. In many cases the model is chosen on the basis of detection theory to estimate the probability of an outcome given a set amount of input data, for example, given an email, determining how likely it is to be spam. Models can use one or more classifiers to determine the probability that a set of data belongs to another set, say spam or 'ham'.
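The spam example can be made concrete with a single application of Bayes' theorem. This is only a sketch: the word "offer" and all three rates are hypothetical numbers chosen for illustration, and a real classifier would combine evidence from many features.

```python
def posterior_spam(p_word_given_spam, p_word_given_ham, p_spam):
    """Bayes' theorem: P(spam | word) from the class-conditional rates and the prior."""
    p_ham = 1 - p_spam
    # Total probability of seeing the word in any email.
    evidence = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / evidence


# Hypothetical rates: the word "offer" appears in 60% of spam and 5% of ham,
# and 20% of all incoming email is spam.
probability = posterior_spam(p_word_given_spam=0.6, p_word_given_ham=0.05, p_spam=0.2)
print(f"P(spam | 'offer') = {probability:.2f}")
```

Under these assumed rates, seeing the word raises the estimated probability of spam from the prior of 0.20 to 0.75. If the word were equally common in spam and ham, the posterior would simply equal the prior.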
Predictive models can either be used directly to estimate a response (output) given a defined set of characteristics (input), or indirectly to drive the choice of decision rules. Depending on the methodology employed for the prediction, it is often possible to derive a formula that can be used in spreadsheet software.

Data mining and predictive analysis modelling

Data mining is an algorithm-based process for analyzing data, extracting useful information, and automatically discovering hidden patterns and relationships. Predictive analytics, by contrast, is closely tied to machine learning: it uses data patterns to make predictions, with machines taking historical and current information and applying it to a model to predict future trends. In essence, the difference between data mining and predictive analytics is that the former explores the data and the latter answers "What is the next step?"

Predictive data mining models

A predictive data mining model predicts the values of data using known results gathered from different data sets. Predictive modeling cannot be classified as a separate discipline; it occurs in organizations and industries across all disciplines. The main objective of predictive data mining models is to predict the future based on past data, generally, but not always, by means of statistical modeling.

Predictive modeling is used in the healthcare industry to identify high-risk patients with congestive heart failure, high blood pressure, diabetes, infection, cancer, etc. It is also used by vehicle insurance companies to assign a risk of accidents to each policyholder.

The predictive models of data mining comprise classification, regression, prediction, and time series analysis. The predictive model of data mining is also called statistical regression.
It refers to a supervised learning technique that involves explaining how the values of some attributes depend on the values of other attributes of the same item, and developing a model that can predict those attribute values for new cases.

Classification: In data mining, classification refers to a form of data analysis in which a machine learning model assigns a specific category to a new observation, based on what the model has learned from the data sets. In other words, classification is the act of assigning objects to one of several predefined categories. One example of classification in the banking and financial services industry is identifying whether transactions are fraudulent or not. In the same way, machine learning can be used to predict whether a loan application would be approved or not.

Regression: Regression refers to a method that estimates the value of a numeric output as a function of the input data. It is generally used for numeric (continuous) data. A linear regression model, in the context of machine learning or statistics, is a linear approach to modeling the relationship between the dependent variable, known as the result, and the independent variables, known as features. If the model has only one independent variable, it is called simple linear regression; otherwise it is called multiple linear regression.

Types of regression

1. Linear Regression: Linear regression searches for the optimal line that fits two attributes, so that one attribute can be used to predict the other.
2. Multi-linear regression: Multi-linear regression involves more than two attributes, and the data are fit in a multidimensional space.

Prediction: In data mining, prediction is used to identify a data value based on the description of another, corresponding data value. Prediction in data mining is known as numeric prediction. Generally, regression analysis is used for prediction.
For example, in credit card fraud detection, the history of a particular person's credit card usage has to be analyzed. If any abnormal pattern is detected, it should be reported as a 'fraudulent action'.

Time series analysis: Time series analysis refers to the analysis of data sets that are indexed by time. Time serves as the independent variable used to predict the dependent variable.

Descriptive model

A descriptive model identifies the patterns and relationships in data. A descriptive model does not attempt to generalize to a statistical population or random process, whereas a predictive model does. Predictive models should give prediction intervals and must be cross-validated; that is, they must prove that they can be used to make predictions with data that was not used in constructing the model. Descriptive analytics focuses on the summarization and conversion of data into useful information for reporting and monitoring.

Clustering: Clustering is the grouping of a set of objects so that objects in the same group, called a cluster, are more similar to each other than to those in other clusters.

Association rules: Association rules identify co-occurrence relationships (not necessarily causal ones) among large sets of data objects. The algorithm works on data such as a list of the items you purchased at the grocery store over the past six months, and it calculates the percentage of times that items are purchased together. For example, what are the chances of you buying milk with cereal?

Sequence: Sequence discovery refers to finding useful patterns in ordered data, judged against some objective measure of how interesting they are.

Summarization: Summarization condenses a data set into a more compact, easy-to-understand form.
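The milk-and-cereal question under association rules can be sketched with the two usual measures, support and confidence. The shopping baskets below are hypothetical, and the function names are illustrative rather than taken from any particular library.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Estimated chance of buying `consequent` given that `antecedent` was bought."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical grocery baskets, one set per shopping trip.
baskets = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "eggs"},
    {"cereal", "bananas"},
    {"bread", "eggs"},
]

both_support = support(baskets, {"milk", "cereal"})       # share of trips with both items
milk_given_cereal = confidence(baskets, {"cereal"}, {"milk"})  # milk bought, given cereal bought
print(f"support(milk, cereal) = {both_support:.2f}")
print(f"confidence(cereal -> milk) = {milk_given_cereal:.2f}")
```

Support answers how often the items appear together at all, while confidence answers how often milk is in the basket on trips where cereal was bought. Neither measure says that one purchase causes the other.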
Steps for predictive analytics using machine learning

Applications of predictive analytics and machine learning

For organisations overflowing with data but struggling to turn it into useful insights, predictive analytics and machine learning can provide the solution. No matter how much data an organisation has, if it cannot use that data to enhance internal and external processes and meet objectives, the data becomes a useless resource.

Predictive analytics is most commonly used for security, marketing, operations, risk and fraud detection. Here are just a few examples of how predictive analytics and machine learning are utilised in different industries:

1. Banking and Financial Services: In the banking and financial services industry, predictive analytics and machine learning are used in conjunction to detect and reduce fraud, measure market risk, identify opportunities and much more.
2. Security: With cybersecurity at the top of every business' agenda in 2017, it should come as no surprise that predictive analytics and machine learning play a key part in security. Security institutions typically use predictive analytics to improve services and performance, but also to detect anomalies and fraud, understand consumer behaviour and enhance data security.
3. Retail: Retailers are using predictive analytics and machine learning to better understand consumer behaviour: who buys what, and where? These questions can be readily answered with the right predictive models and data sets, helping retailers to plan ahead and stock items based on seasonality and consumer trends, improving ROI significantly.

There are eight steps to perform predictive analytics with ML.

Step 1: Define the problem statement

We begin by understanding and defining the problem statement, and deciding on the datasets required to perform predictive analytics. Example: There is a grocery store. Our objective is to predict the sales of groceries for the next six months.
Here, the dataset will be the past sales data for the last five years, showing how many groceries were sold and the resulting profits.

Step 2: Collect the data

Once we know what sort of dataset is needed to perform predictive analytics using machine learning, we gather all the necessary details that constitute the dataset. We need to ensure that the historical data is collected from an authorized source. In the grocery store example, we can ask the accountant for records of past sales logged in worksheets or billing software. We collect data spanning the past five years.

Step 3: Clean the data

The raw dataset obtained will have some missing data, redundancies, and errors. Since we cannot train the model for predictive analytics directly on such noisy data, we need to clean it. Known as preprocessing, this step involves refining the dataset by removing unnecessary and duplicate data.

Step 4: Perform Exploratory Data Analysis (EDA)

EDA involves exploring the dataset thoroughly in order to identify trends, discover anomalies, and check assumptions. It summarizes a dataset's main characteristics and often uses data visualization techniques.

Step 5: Build a predictive model

Based on the patterns observed in step 4, we build a predictive statistical machine learning model, trained on the cleaned dataset obtained after step 3. This machine learning model helps us perform predictive analytics to foresee the future of our grocery store business. The model can be implemented using Python, R, or MATLAB.

Hypothesis testing

Hypothesis testing can be performed using a standard statistical model. It involves two hypotheses, the null and the alternate. We either reject or fail to reject the null hypothesis.

Example: A new 'buy one, get one free' scheme is implemented in which customers buy a packet of soap and get a face wash for free. The null hypothesis is that the scheme does not improve soap sales. Consider the two cases below:
Case 1: Despite the scheme, sales of soap did not improve.
Case 2: After the scheme, sales of soap improved.
If the first case is true, we fail to reject the null hypothesis, as there is no improvement. If the second case is true, we reject the null hypothesis.

Step 6: Validate the model

This is a crucial step in which we check the effectiveness of the model by testing it on unseen input datasets. Depending on the extent to which it makes correct predictions, the model is retrained and evaluated again.

Step 7: Deploy the model

The model is made available for use in a real-world environment by deploying it on a cloud computing platform so that users can utilize it. Here, the model makes predictions on real-time inputs from the users.

Step 8: Monitor the model

Now that the model is functioning in the real world, we need to verify its performance. Model monitoring refers to examining how well the model predicts on actual datasets. If any improvement must be made, the dataset is expanded and the model is rebuilt and redeployed.

How machine learning improves predictive analytics

Predictive analytics continues to be improved with machine learning algorithms. The eight use cases discussed below illustrate how.

E-commerce/retail

Predictive analytics achieved through machine learning helps retailers understand customers' preferences. It works by analyzing users' browsing patterns and how frequently a product is clicked on in a website. For example, when we purchase a t-shirt on an e-commerce site, similar shirts are suggested the next time we log in. Sometimes, we may be recommended several specific items that are often purchased together for a given amount of money. Such personalized recommendations help retailers retain customers. Predictive analytics also helps maintain inventory by foreseeing stock-outs and informing sellers about them.

Customer service

Customer segmentation is performed based on insights from predictive analytics. Customers are placed into different segments depending on their purchase patterns.
For example, book buyers will form one cluster while t-shirt buyers will constitute another. Tailored marketing strategies are then developed for each of the segments depending on their characteristics. Predictive analytics using machine learning can also detect dissatisfied customers and help sellers design products aimed at retaining existing customers and attracting new ones.

Medical diagnosis

Machine learning models that are trained on large and varied datasets can study patient symptoms comprehensively to provide faster and more accurate diagnoses. Performing predictive analytics on the reasons behind past hospital readmissions can also improve care. Further, hospitals can use predictive analytics to provide the best care by anticipating increased demand for hospital beds or staff shortages. For example, if the number of COVID cases for the next month can be predicted and the rise in the number of severely infected patients can be forecast, hospitals can make arrangements to deal with such a scenario more efficiently.

Sales and marketing

Predictive analytics of historical data on customer behavior and market trends can help businesses understand the demands of prospective customers. Companies can achieve higher targets by turning their sales and marketing activities into a data-based undertaking. Demand forecasting also helps businesses estimate the future demand for certain products.

Financial services

Predictive analytics using machine learning helps detect fraudulent activities in the financial sector. Fraudulent transactions are identified by training machine learning algorithms on past datasets. The models find risky patterns in these datasets and learn to predict and deter fraud.

Cybersecurity

Machine learning algorithms can analyze web traffic in real time. When an unusual pattern is observed, advanced statistical methods of predictive analytics foresee and prevent cyber-attacks.
They also automatically collect attack-related data and generate useful reports on a cyber-attack, thereby reducing the need for manpower.

Manufacturing

Machine learning and predictive analytics help manufacturers monitor machines and notify them when crucial components need to be repaired or replaced. They can also predict market fluctuations, reduce the number of accidents, improve key performance indicators (KPIs), and enhance overall production quality.

Human Resource Information Systems (HRIS)

Predictive analytics using machine learning identifies employee churn rate and keeps human resources (HR) departments informed of the same. Models can be trained with datasets that have details such as an employee's monthly income, allowances, increments, insurance, and so on. The models learn from past records of ex-employees and find patterns to understand the reasons for leaving. They then predict if new employees are likely to resign or not, empowering HR to minimize the risk.