Final Review and Study Guide
MIS2502, Spring 2011, Section 03

1. BI Mental Map

[Figure: BI mental map diagram. Labels recovered from the slide: Competitive Advantage, Performance, Better Understanding, Good Business Decision, Data Mining, External Source, Data Warehouse, MySQL, ERD. Sample ERD fields: Product No., Product Name, Price, Customer No., Name, Address, Membership, Description.]

Exam
• Be able to express the BI mental map in your own words.
• Fully understand the role, usage, and importance of ERD, SQL, and data mining in this map.

2. OLAP
• OLAP (On-Line Analytical Processing)
  – Multidimensional data analysis techniques
  – Explains why particular business events have occurred and/or forecasts what may occur in the future
• Slice, Dice, Pivot, and Drill Down/Up
• Questions expected on transaction data from an operational system
  – Who purchased a particular product?
  – How much did an employee get paid?
  – How many units of a product were manufactured?
VS.
• Questions expected on OLAP
  – What are the total sales for each product?
  – What are the total sales for each department?
  – Which salesperson has sold the most?
  – Which products does each salesperson sell the most of?
  – In which month did most of the sales occur?

Multidimensional View of Sales
• Multidimensional analysis involves viewing data simultaneously categorized along potentially many dimensions

Exam
• The role of OLAP
• What kinds of questions OLAP can address
• Be able to recall our pivoting practice
• For a given pivoting result table, be able to explain the results in your own words.

3. Data Mining
• Seeks to discover patterns or relationships within the data
• Data mining tools automatically search data for patterns and relationships
• Data mining tools
  – Analyze data
  – Uncover problems or opportunities
  – Form computer models based on findings
  – Predict business behavior with models
  – Require minimal end-user intervention

Data Mining Tools

Exam
• Know what data mining can do
• Be able to differentiate data mining tasks from other tasks.
  – E.g.: computing the total sales of a company.
• Q: Is this a data mining task?
• A: No. This is a simple accounting task.

3.1. Data Exploration
• This is the first step of any data analysis task.
  – Data understanding
  – Data validation
• You should be familiar with some basic data exploration techniques:
  – Descriptive analysis (mean, max, min…)
  – Histogram
  – Plot
  – Pie chart
  – …

3.2. Clustering and Segmentation
• Unsupervised classification: grouping of cases based on similarities in input values.
• For exam
  – You should be able to differentiate this technique from others.
  – You should be able to explain the logic behind setting the number of groups. This is a subjective decision, but the number of groups shouldn't be higher than 10 in most cases. E.g.
    • Three versions (Windows 7 Home, Windows 7 Pro, Windows 7 Enterprise) vs. only one version (Mac OS X)
  – Segmentation helps us profile the clusters. You need to be able to describe the features of each group based on the segmentation results.

3.3. Association Rules
• A legend: Beer and Nappies
  – Men who have children and who (have to) do the shopping on Saturdays often buy nappies for their little ones along with beer for weekend evenings in front of the television. Subsequently, the superstore decided to position the pallets of beer beside those of nappies on Saturdays, which led to a strong rise in sales figures.
• For exam:
  – Based on a given result, tell which rule is the best and explain this rule in your own words.
  – Whether the rule is symmetric is also important.

3.4. Regression
• Regression models the variance in our interest (the dependent variable) to find its relationships with all independent factors.
• For exam
  – How to explain R square?
    • A critical criterion for model performance. R square lies between 0 and 1. It represents the percentage of variance that a model can explain.
  – How to explain P value?
    • A critical criterion for factor influence. The p value is always between 0 and 1. When the p value of a factor is smaller than 0.1, we say this factor has a significant influence on our interest.
  – How to explain Estimate Coefficient?
    • You only need to consider a factor's estimate coefficient when its corresponding p value is less than 0.1. In explanation, every one-unit increase of this factor will result in an increase of [estimate coefficient] in our interest.
  – All factors used to estimate our interest should be independent!

4. Three Levels of Strategies
• To finish any task or solve a problem, you have three levels of strategies:
  – Manage to do it by yourself
  – Hire someone to do it for you
  – Design a mechanism and have a group of people do it, probably for FREE.
• Collective intelligence falls in the third level and is critically important to IT business success.

Example
• TASK: capture more customers. Design your three types of strategies:
  – 1. Direct marketing by yourself.
    • E.g., individual job seeking
  – 2. Hire a salesperson or marketing agent.
    • E.g., most companies
  – 3. Viral marketing mechanism, word-of-mouth, affiliate program.
    • E.g., Facebook friend invite, Ponzi scheme, Hotmail footer, Google word-of-mouth
• For the exam, you should be able to design three levels of mechanisms for a specific task.
• In the future, for any recurring task, please try your best to design a third-level strategy. If you can do it, you can succeed in your area. This is not a dream or legend, but the smartest choice, which very few people consider.

Collective Intelligence
• Explain why collective intelligence does or does not lead to better decisions.
• Describe the key issues in implementing collective intelligence.
• Provide an example of why managers need to consider many key issues when designing a collective intelligence tool, from loss of control to the balance of diversity.

Resume Tips
• Entry-level jobs:
  – Business Analyst or Data Analyst. (SQL and analysis techniques. A major in finance is a strong plus for this position)
  – Marketing Analyst. (SQL, clustering and segmentation, association rules, regression. A major in marketing is a strong plus)
  – Database Management. (SQL, ERD. A major in MIS is a strong plus)
• Besides, this course provides a very good knowledge base for future
  – Skills: PHP, Java, Use Case
  – Positions: product manager, project manager, database developer
• Skill Bullets:
  – Business intelligence and data analytics: ERD, SQL, OLAP, data mining (clustering, segmentation, association rules, basket analysis, decision trees, regression)
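
Review aid for 3.4 (Regression): the sketch below shows one way to compute R square and the estimate coefficient by hand for a regression with a single factor. It is an illustration only, not the course's tool or method. The data values and the names `ad_spend`, `sales`, `fit_line`, and `r_squared` are hypothetical and were made up for this example.

```python
# Hypothetical data (not from the course): one factor and our interest.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # independent factor
sales = [2.0, 4.1, 5.9, 8.2, 9.8]      # our interest (dependent variable)


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y):
    """Share of the variance in y that the fitted line explains."""
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Total variation of y around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    # Variation left unexplained by the fitted line.
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    return 1 - ss_residual / ss_total


intercept, slope = fit_line(ad_spend, sales)
print(f"estimate coefficient (slope): {slope:.3f}")
print(f"R square: {r_squared(ad_spend, sales):.3f}")
```

An R square close to 1 means the fitted line explains almost all of the variation in the made-up sales figures. The slope plays the role of the estimate coefficient: the expected change in our interest for each one-unit increase of the factor.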