More information about business intelligence

1. Sources of data

A. Source: scanning of data
B. Source: manual registration of data

Definition: "registration of data"
Registration of data is putting data into a computer in order to store it. Such a computer is often called a server; the data itself is kept in a database. Scanning and manual registration are only two examples of registration of data. There are many more, such as cameras (in traffic), sensors (measuring temperature, ...), ...

Who does the registration?
o It can be you: when you register on a website, you start registering data about yourself (age, name, address, ...), and the computer keeps track of what you buy.
o It can be another person: at school, a staff member enrols you in the programme you choose.

2. Where does the registered data go?

All registered data flows by means of Wi-Fi, cables, ... to a data center. Most customers and staff members don't know where that center is. The ICT staff is responsible for storing the data. In our school we have our own ICT staff and our own local servers, but more and more companies store their data in a remote data center, supervised and maintained by a company such as Google, HP, IBM, ...

A data center often contains thousands of servers. The picture below shows some 30 to 100 servers.

Inside the data center, data is stored in a database. In the picture above, the databases are integrated in the black machines. A database is in fact a program, like Excel or Windows, but it is not designed to be operated by clicking. The registered data flows in through wires at the back, coming from the outside world (school, bank, shop, ...). The database program automatically stores the incoming data on hard disks (just like the disk in a normal computer, but of higher quality and price). Later, the data in the database can be retrieved when needed: by other programs, by ICT professionals or by you (if you have the right to access the data).
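To make "registration" and "retrieval" concrete, here is a minimal sketch using Python's built-in sqlite3 module, with an in-memory database standing in for a server in a data center. The table layout (Orders and Customers, linked through CustomerID and CustID) follows the example used in this course; the customer name, order numbers and amounts are made up for illustration.

```python
import sqlite3

# In-memory database: a stand-in for a database server in a data center.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per kind of data (hypothetical columns, for illustration only).
cur.execute("CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
cur.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Sales REAL)")

# Registration: data is put into the database (e.g. a sign-up, then two purchases).
cur.execute("INSERT INTO Customers VALUES (5, 'Janssens', 'Gent')")
cur.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(101, 5, 120.0), (102, 5, 80.0)],
)
conn.commit()

# Retrieval: another program asks for each order together with its customer's name.
# The CustomerID of each order is matched with the CustID in the Customers table.
rows = cur.execute(
    "SELECT o.OrderID, c.Name, o.Sales "
    "FROM Orders AS o JOIN Customers AS c ON o.CustomerID = c.CustID "
    "ORDER BY o.OrderID"
).fetchall()
print(rows)  # [(101, 'Janssens', 120.0), (102, 'Janssens', 80.0)]
conn.close()
```

In a real data center, the INSERT statements would be sent by cash registers, websites or administration software rather than typed by hand.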
A database works completely autonomously: most of the time, no one is needed in the data center (except for maintenance, security, ...). The data flows in and out of the databases automatically.

Inside the database, the data is stored in tables, with one separate table for each kind of data: a table for orders, a table for customers, ... Tables refer to one another by means of columns with identification numbers (IDs): a column in one table holds the ID of a row in another table. In the example below, the column CustomerID in the Orders table refers to the column CustID in the Customers table.

Flow from data to report (figure):
sales are registered
-> sales are stored in the data center
-> you can use Power Query to get the data to your laptop
-> the data flows to the Powerpivot "MODEL" on your laptop (structured storage)
-> in Powerpivot, you can use the data of the MODEL to create tables/graphs

Name 2 possible sources of data.
Describe the way data gets into a database.
What can be seen in a data center?
How do database tables refer to one another?
Is a database often physically very close to you? Explain.
Is a Powerpivot model often very close to you? Explain.

3. What is "business intelligence" and how is it used?

With ICT, several goals can be achieved. Most ICT is used to manage day-to-day business. In a shop, a bank, a hospital, a school, etc., people make data flow by scanning, typing, presenting a credit card, ... This kind of ICT is the most common ICT of everyday life. It goes by all kinds of names, such as "registration", "filing", "data entry", "administration", ...

A special kind of ICT is called business intelligence (BI). It refers to all actions, programs and reports that aim at getting a better understanding of the business (business = selling, treating patients, ...). Business intelligence does not deal with one customer/patient/student, but looks at all customers/patients/students.
In a hospital, BI is not about questions like "what is the telephone number of patient X?", but about more general questions like "are patients on average staying longer in hospital now than they did a year ago?" In a school, a BI question could be "in which programmes is the average number of years needed to get a diploma higher than in other programmes?", and what could be the reasons for that? BI questions typically involve a lot of data (all customers, all students, several years, ...).

In practice, BI is used mainly by directors and their assistants. The BI reports give insight into how well or badly the business is doing. In a small company, a director can see how things are going just by walking around. But in a big company with many facilities (buildings, ...), it is no longer possible to get an overview by walking around. Moreover, in e-commerce you can walk around in the warehouse, but you never meet the customers. That is an important reason to create reports that reveal how customers behave: how they use the website, what they buy and when, ...

Overview of the differences

| Classical ICT | Business intelligence |
|---|---|
| Goal = making everyday business work; accomplishing the daily tasks. | Goal = getting insight into the business. |
| Deals with one customer/student/patient at a time (registering a patient, updating one telephone number, closing one sales transaction, ...). | Looks at a lot of data at the same time, in order to find an interesting insight. |
| Registering what happens now. | Comparing several months, years. |
| Lots of text, few tables and graphs. | Many tables and graphs. |
| Used by staff dealing with everyday business. | Used by directors and their assistants. |
| Absolutely needed to make the business work; this ICT has to be available 24/7. | Very useful, but the company can do without it for a while. Deep knowledge and wisdom are not needed just to get through the day. |
| Every company has it; in most cases, it has been in place for many years. | Companies have it, but some are still looking for how to get really good insight. BI is always evolving in a company, and each question answered leads to more interesting questions. |

Examples of the use of BI

SCHOOL: Managing a school in the medium term: "which educational programs are successful and which are not?" Possible action: better targeting of marketing efforts at specific groups of secondary school students.

LOGISTICS: 20 to 30% of the (returning) trucks on the road are empty. What a waste! Thus, the management of a logistics company wants to know "why do the trucks return (half) empty?" and "how can this be avoided?". Although this is a very tough problem to solve, finding clues can be very rewarding, financially and for the environment. Possible action: use more information to determine what can be loaded on the way back.

HOSPITAL: An empty hospital bed generates no income. Thus, the management wants to know "what is the average occupancy of a hospital bed in department x?" Possible action: beds/rooms can be used by another department with a higher need for beds/rooms.

(Remark: in many companies, departments don't share much information, which leads to inefficiencies. Consequently, reports that combine data of several departments can be very revealing.)

What kind of people in a company use BI reports?
What are these people looking for in these reports?
About a hospital: give an example of the use of classical ICT and of the use of BI reports.
Same question for a logistics company.
Same question for a school.
Which type of ICT can be missed for a day or a few days? Why?

4. Definition and importance of measures, such as sales, cost, ...

All companies deal with money, and profit is important for the survival of the company. Therefore, reports about profits are created in all companies. In the old days, these reports were handcrafted by zealous accountants, but nowadays BI tools make reporting a lot easier.
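The basic measures defined next can be checked with a few lines of code. The Python sketch below (function names and figures are hypothetical) computes margin and margin %, and also shows what happens when the parentheses in the margin % formula are left out: division is then done before subtraction.

```python
def margin(sales: float, cost: float) -> float:
    """Margin = sales - cost (the gross margin in accounting terms)."""
    return sales - cost


def margin_pct(sales: float, cost: float) -> float:
    """Margin % = (sales - cost) / sales, returned as a fraction (0.25 = 25 %)."""
    if sales == 0:
        raise ValueError("margin % is undefined when sales are 0")
    return (sales - cost) / sales


sales, cost = 1000.0, 750.0      # hypothetical figures
print(margin(sales, cost))       # 250.0
print(margin_pct(sales, cost))   # 0.25, i.e. 25 %

# Without parentheses, cost / sales is computed first (0.75),
# so the result is 1000.0 - 0.75: not a percentage at all.
print(sales - cost / sales)      # 999.25
```

The same operator-precedence rule applies to Excel formulas and to measures in Powerpivot.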
The basic measures are:

| measure | reference to accounting | calculated as follows (example) |
|---|---|---|
| sales | revenue = net sales | In this BI course, "sales" is in fact the revenue (which is also equal to net sales). |
| cost | Cost of Goods Sold (COGS) | |
| margin | difference between revenue and cost of goods sold (COGS) | margin = sales - cost |
| margin % | gross margin %: the difference between revenue and COGS, divided by revenue | margin % = (sales - cost) / sales |

As you see, this course is limited to a small part of the income statement. We don't calculate net profit, as we omit important elements such as interest payments on loans, all other expenses (e.g. wages), depreciation, ... These topics are treated in more detail in your course "Business fundamentals".

In many exercises, the margin % is calculated, because it is an important indicator of the quality of the company (and of the quality of its management): this margin % easily shrinks when competition gets fierce (cf. the course Business fundamentals, Porter's competitive forces model).

Which of these calculations is the correct one? Why?
o Margin % = sales - cost / sales
o Margin % = (sales - cost / sales )
o Margin % = (sales - cost) / sales
Explain the importance of margin % in relation to competition.

5. What are typical problems related to getting good data?

A. The data is not available
If some important data is not collected by your company (e.g. customer satisfaction), then in the short term you can try to create the data yourself, for instance by conducting a survey among the customers.
If the data needed does exist but the ICT staff doesn't provide it, try to meet these people and ask nicely for the data. Explain its importance to them, and don't forget to offer them lots of coffee, cola and cookies. ;)

B. The data provided by the ICT staff is not exactly what I need.
In the short term, try to manipulate the data yourself to get it into the right shape.
In the meantime, ask the ICT staff whether they can deliver better or different data in the future. Be very specific about what data you want: creating a dataset you can download takes effort from the ICT staff, and they don't have much time. So, do your homework (list explicitly what you need) and, most importantly, keep the communication channel to the ICT guys open. Go to them, talk to them. Mail and SMS are poor tools for building a relationship with colleagues. Learn how to meet new people and add them to your network.

C. How to manipulate data? What is data manipulation and how to organize it?
If you want to calculate something and get one final result, then you can manipulate (change) the data in any way you want, as long as you stick to the "truth". The techniques you learned in "Digital toolkit" are sufficient to get a good result.
But if you create a report, the data of that report will be updated daily, weekly or monthly. It is important to do the manipulations in such a way that you don't have to redo every manipulation step over and over again. By means of a QUERY, you can record which manipulations you want Powerpivot to apply. These steps are saved in the Excel file and will be re-executed every time you refresh/update the source data.

Some common issues and their solutions:

| issue | solution (in a query) |
|---|---|
| duplicate rows | remove duplicate rows |
| incorrect data | replace the value; after a refresh, the same change will be applied again |
| merged information in one column, e.g. NameFirstnameDateOfBirth | split the column by an appropriate delimiter |

Another issue: you need some other kind of information. For example, we need to know how many students there are. But this number depends on what we want to do with the information.
The number of students enrolled
<> ("is not equal to") the number of students enrolled in a complete program
<> the number of students that are still trying to get a diploma (some stopped their studies without informing the school administration)
<> the maximum number of students to be evacuated in case of a fire.

Solution:
- Look carefully at what the data in the dataset means.
- Think carefully about what you want to report.
- Manipulate the data accordingly (in a query), to create a data file that matches the problem you are dealing with. (For instance, distinguishing students that are fully or partially enrolled can be done by checking whether the number of credits of the student's enrolment equals 180.)
- Check your report. Are the tables and graphs correct?

Question: if we create a query and the result is good, can we be sure the query is really OK?
Answer: no, the same query must also work properly when the data is updated. In the future, new customers and products will be added. A query should be written in such a way that it works fine now and also in the future, with new data.

If the data provided does not perfectly fit your needs, what can you do? (short term / medium term)
A query that works fine is not always an acceptable query. Why not?

6. About models

A. Why do we need a model?

Updating data gets easier
By creating a model (linked to the corporate database(s)), we can afterwards update (refresh) the data easily. All tables and graphs will update automatically. So, no more rework. Do the demo exercise "starter kit 4" and experience how refreshingly simple a "refresh" is.

Defining a data range is no longer needed
Without a model (without Powerpivot, only with the techniques of "Digital toolkit"), we have to define a range of data for each table or graph. Each time the data updates, the ranges have to be checked and often need to be changed.

Combining tables gets easy
Without a model, combining data of different tables (orders - customers - ...)
would be quite a job, because it has to be done with a lookup formula such as VLOOKUP: one separate VLOOKUP for each column... This job could take a whole day, whereas Powerpivot does it for you in the blink of an eye, by using the relations. Thus, for reporting, a tool like Powerpivot is strongly recommended.

B. How can we get the data into the model?
As shown in the course, we can connect to the data in two ways:
o by means of a direct connection, using Powerpivot / Manage / an item in "Get External Data";
o by means of a query, using Data / an item in "Get & Transform Data".
In both cases, the data is physically copied into the background of the workbook, and the workbook stores the path of the source data (e.g. "C:\data sources"). That way, Powerpivot can, on your demand, perform a refresh from that same location.
When to use a query? Answer: when data manipulation is needed.

C. What is the meaning and the purpose of the relations?
In the picture on the previous page, we see the relations as lines interconnecting the tables. Excel uses these lines as follows. When you create a pivot table with orders and customers, Powerpivot (PP) looks up in the model (below) how these tables are linked. The arrow in the model describes how the tables are linked (which column of the first table with which column of the second table). By means of that link, PP knows that it can get, for any order, all the information about the customer of that order. For each order, PP takes the CustomerID (5 in this example), goes to the table Customers and looks for the corresponding number (also 5) in the column CustID. There, all the customer information is found and becomes available to the pivot table. Which part of the customer information appears in the pivot table depends on your selection in the right-hand pane with the pivot table fields.

D. Common errors in a model + solutions
Error: an inactive relation. The connection is a dotted line.
Solution: right-click the line and activate the relation.
Next, you can use the activated relation, or delete it if needed.

Error: a link from a number to a text, e.g. from ID to Name. Such an error often leads to a very small pivot table, with no values or "blank" values.
Solution: delete the relation and create a correct new one. Connect numbers to numbers (that have exactly the same meaning).

Error: a relation drawn in the wrong direction. A link has to be drawn from the inside (where the measures are) to the outside; in this course, you drag away from Orders. If it is drawn from the outside to the inside, numbers in the pivot table can be wrong.
Solution: if you think the direction might be wrong, delete the relation and make sure to create a new one from the inside (Orders) to the outside.

Look at the model below (of exercise 007).

To create a pivot table like the one in the figure, Excel uses the following tables (make a choice):
o Orders
o All tables
o Orders and Customers

To create a pivot table like the one in the figure, Excel uses the following tables (make a choice):
o Orders
o All tables
o Orders and Customers
o Orders and Products

If you have 2 tables, such as Orders and Customers, but you don't have Powerpivot as an add-in (outdated version of Excel), which formula can you use to combine both tables into one large table?

When you create a new relation in the model, in which direction should you drag the line?
o It never matters
o From the inside (Orders) to the outside
o From the outside to the inside (Orders)
o From the largest to the smallest table

7. What are characteristics of a useful report?
It should be readable for the audience. If it will be presented on the company intranet, make sure it fits nicely on the screen. If you will present it to your colleagues, make sure that all fonts are large enough. Showing unreadable tables and graphs is a sign of poor preparation or even a lack of respect for the audience. Respect is a virtue worth pursuing in life. So, use fairly large fonts.
The report should focus the attention of the audience on something you want to show. The tables and graphs are merely tools to support the story you tell. Your story is paramount. Such a story could be, for instance:
"One year ago, we started our new way of dealing with complaints. As you know, under this new policy we react within 6 hours by mail and we send a replacement product to the customer, without waiting for the returned defective product. Customers are delighted! On this graph (...show it now...) we can see that 90% of the customers who complained about a product buy significantly more products in the year after the complaint. In 80% of the cases (complaining customers), the sales to the customer grow by more than 20%. This is significantly better than average, as you can see in the second graph."

What is a dashboard or a scorecard?
A dashboard is a collection of tables and graphs that belong together. Dashboarding is just a trendy name for creating graphs. For a real estate company (buying and selling buildings), a dashboard could look like this: (figure: example dashboard of a real estate company)
Such a dashboard gives managers, management assistants and other employees (sales representatives, ...) an overview of how business is going. They can spot new trends in time (sales going up or going down, ...). A dashboard is a starting point for thinking, discussing, changing plans, ...
A scorecard is a concise (small) dashboard with important KPIs (Key Performance Indicators) that management wants to keep under control, e.g. profit, number of new customers, number of lost customers, ... Often, the managers predefine target values. These target values are where the green turns into yellow: when a target is met, the indicator is green; otherwise, it is yellow or red. A scorecard scores the performance of the business: each part of the scorecard gives a clear indication of whether the target is met or not.
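The green/yellow/red logic of a scorecard can be sketched as a small function. This is only an illustration in Python: the KPI figures are made up, and the 90 % boundary between yellow and red is an assumption, since the text only says that the target value marks where green turns into yellow. For KPIs where lower is better (such as the number of lost customers), the comparisons would have to be reversed.

```python
def indicator(actual: float, target: float, red_below: float = 0.9) -> str:
    """Traffic-light colour for a KPI where higher is better.

    green:  the target is met (actual >= target)
    yellow: below target, but at least red_below * target
    red:    below red_below * target
    The default 90 % yellow/red boundary is a hypothetical choice.
    """
    if actual >= target:
        return "green"
    if actual >= red_below * target:
        return "yellow"
    return "red"


# Hypothetical scorecard: KPI -> (actual value, target value set by management)
scorecard = {
    "profit": (1_200_000, 1_000_000),
    "number of new customers": (95, 100),
    "sales": (700_000, 1_000_000),
}

for kpi, (actual, target) in scorecard.items():
    print(f"{kpi}: {indicator(actual, target)}")
# profit: green
# number of new customers: yellow
# sales: red
```

In Excel, the same effect is usually obtained with conditional formatting or KPI icons; the function only makes the decision rule explicit.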
Managers often use dashboards as a starting point for further discussion about what is going on (good/bad) in the company. After the discussion, the needed action is taken.

What do dashboards and scorecards look like? Give some examples of what can be seen on them.

8. How to use formulas to create new measures?
In this course, only 3 types of formulas were used:
TOTALYTD([Sum of Sales];Calendar[Date])
Calculate([Sum of Sales]; PREVIOUSYEAR('Calendar'[Date]))
Calculate([Sum of Sales]; PARALLELPERIOD('Calendar'[Date];-12;Month))

How to decide which one to use? Well, PARALLELPERIOD is a very convenient formula, because it can be combined with any timeline or time filter. In most cases, a period in the recent past has to be compared with a similar period further in the past. With PARALLELPERIOD, such a comparison can be done for a period of any length.
The formula PREVIOUSYEAR always takes the whole previous year, from 1/1 to 31/12 (12 months).
TOTALYTD selects a part of a year, starting from 1/1. If the targeted period is May 2008, TOTALYTD selects 1/1/2008-31/05/2008.
Any of these 3 approaches can be useful in some specific situation, but PARALLELPERIOD surely is my favourite!

Question: fill in what belongs in the yellow zone. (The answers are coloured white. First answer, then check by changing the colour.)
Calculate([Sum of Sales]; PARALLELPERIOD(datetable[Date]; -12; Month)) => sum of sales of the period "May 2007"
TOTALYTD([Sum of Sales]; datetable[Date]) => sum of sales of the period "1 January 2008 - 31 May 2008"
Calculate([Sum of Sales]; PREVIOUSYEAR(datetable[Date])) => sum of sales of the period "1 January 2007 - 31 December 2007"

It is important to know that these formulas only work properly when a timeline or time filter has been used: they need a targeted period to start from. In the example above, May 2008 is the targeted period, and the formulas start from there.
On the next row, the targeted period is June 2008, and so on.
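The periods that the three formulas select can be mimicked outside Excel. The Python sketch below is not DAX: DAX evaluates these functions on the dates of a date table within the current filter context, while this sketch only models the case where the targeted period is one whole month (here May 2008). The function names and the daily sales figures are hypothetical.

```python
from datetime import date, timedelta


def month_end(year: int, month: int) -> date:
    """Last day of the given month."""
    first_of_next = date(year + month // 12, month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)


def shift_months(year: int, month: int, n: int) -> tuple[int, int]:
    """Move a (year, month) pair n months forward (n < 0 moves back)."""
    index = year * 12 + (month - 1) + n
    return index // 12, index % 12 + 1


def parallel_period(year: int, month: int, n_months: int) -> tuple[date, date]:
    """Like PARALLELPERIOD(...; n; Month) for a one-month target: the whole month, shifted."""
    y, m = shift_months(year, month, n_months)
    return date(y, m, 1), month_end(y, m)


def total_ytd(year: int, month: int) -> tuple[date, date]:
    """Like TOTALYTD for a one-month target: 1 January up to the end of that month."""
    return date(year, 1, 1), month_end(year, month)


def previous_year(year: int, month: int) -> tuple[date, date]:
    """Like PREVIOUSYEAR: always the whole previous year (month is not needed)."""
    return date(year - 1, 1, 1), date(year - 1, 12, 31)


def sum_of_sales(sales: dict[date, float], period: tuple[date, date]) -> float:
    """Sum the sales of all days that fall inside the period (bounds included)."""
    start, end = period
    return sum(v for d, v in sales.items() if start <= d <= end)


# Hypothetical daily sales
sales = {
    date(2007, 5, 10): 100.0,
    date(2007, 11, 3): 50.0,
    date(2008, 2, 14): 70.0,
    date(2008, 5, 20): 30.0,
    date(2008, 6, 1): 999.0,  # after May 2008: ignored by all three
}

# Targeted period: May 2008
print(parallel_period(2008, 5, -12))  # May 2007
print(total_ytd(2008, 5))             # 1/1/2008 - 31/5/2008
print(previous_year(2008, 5))         # 1/1/2007 - 31/12/2007
print(sum_of_sales(sales, parallel_period(2008, 5, -12)))  # 100.0
print(sum_of_sales(sales, total_ytd(2008, 5)))             # 100.0
print(sum_of_sales(sales, previous_year(2008, 5)))         # 150.0
```

Calling the same functions with (2008, 6) shows what happens on the next row of the pivot table, where June 2008 is the targeted period.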