Running head: Six Sigma in Big Data

Implementing Six Sigma in Big Data – Training Program for Technical Consultant at PwC

Srinivas Pochincharla
Dr. Priscilla Berry
University of Florida

Authors: Srinivas Pochincharla, 401 East Las Olas Boulevard, Suite 1800, Fort Lauderdale, Florida 33301
300 Madison Avenue #24, New York, NY-10017

Contents
Executive Summary
Introduction
  What is Big Data
  Current Difficulties
Solution
  What is Six Sigma
  Implementation of Six Sigma methodologies in Big Data
Conclusion
References

Executive Summary

As a consulting and advisory firm, PricewaterhouseCoopers (PwC) currently provides data assurance solutions to clients. With the advent of big data, PwC is also venturing into predictive analytics: analyzing very large amounts of data with techniques ranging from NoSQL databases to proprietary solutions such as SAS.
Big data is an abstract concept: extracting, analyzing, and sorting data can help an organization predict future trends and achieve profitability in a highly competitive market. The problem with big data lies in the uncertainty associated with it, and one of its many challenges is making the extraction process time-efficient and error-free. The Six Sigma process can make this extraction more efficient. Six Sigma is an intricate process in which an organization meticulously observes and mitigates the errors and deviations occurring in its operations by applying rules and strategies. Implementation of Six Sigma has resulted in estimated savings of $427 billion for Fortune 500 companies (Marx, 2007). By combining the elements of Six Sigma with the predictive analytics concepts of big data, PwC can minimize the uncertainty associated with the data and streamline the process. Categorization is difficult in big data; because Six Sigma is statistical in nature, applying it can make categorization easier. Furthermore, Six Sigma will shorten cycle times by improving time management for the teams working on data extraction, thus providing a critical competitive edge to the firm. Implementing Six Sigma in the teams working on big data projects will result in higher client satisfaction, thus increasing revenue for PwC.

Introduction

What is Big Data

The amount of data in our world has been expanding, and analyzing large data sets (so-called big data) will become a key component of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. Big data is currently a $53.4 billion industry and is growing exponentially, as shown in Figure 1 (Kelly, 2014). The term big data usually refers to very large amounts of data.
Having a large amount of data is useless unless information is extracted from it, and the extraction must be not only meaningful but also rapid. Extraction of data usually depends on three factors: 1) volume, how big the data is; 2) velocity, how fast the data is growing; and 3) variety, what types of data are in the collected sample.

Figure 1

An excellent example is the retail chain Target. By applying data analytics to its customers and tracking what they purchase, the retail giant can predict what customers are planning to buy next and send them advertisements for related products. These predictions can be strikingly accurate. In one well-known incident, a man asked Target’s customer service to stop mailing him pregnancy-related coupons. He later found out that his daughter was pregnant; Target had predicted this from the purchase history recorded under their household account (Goswami, 2014).

Current Difficulties

A significant problem associated with big data is identifying meaningful relationships within it. Because the data arrives in large volumes and is generated spontaneously, the difficulty of relating it slows the extraction of meaningful data, as shown in Figure 2 (Taleb, 2013). The data has to be analyzed thoroughly by professionals, who apply specialized statistical methods to approximate its meaning.

Figure 2

PwC specializes in this field: its risk assurance branch includes a data assurance department that studies large amounts of data for clients. Using various software tools and control checks, the department extracts data for clients and ensures its accuracy. However, the process is time-consuming because it requires effectively managed teams and close attention to client priorities.
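The report does not specify which control checks the data assurance team applies. As a minimal sketch, assuming two common reconciliation checks (completeness, by comparing record counts, and accuracy, by comparing a control total of one numeric field between the source and the extracted data), such a check could look like the following. The helper `reconcile`, the class `ReconciliationResult`, the field name `amount`, and the records are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ReconciliationResult:
    """Counts and control totals for a source data set and its extracted copy."""
    source_count: int
    extracted_count: int
    source_total: float
    extracted_total: float

    @property
    def complete(self) -> bool:
        # Completeness: every source record appears in the extract.
        return self.source_count == self.extracted_count

    @property
    def accurate(self) -> bool:
        # Accuracy: the control total of the numeric field matches,
        # allowing a tiny tolerance for floating-point rounding.
        return math.isclose(self.source_total, self.extracted_total, abs_tol=1e-6)


def reconcile(source: list[dict], extracted: list[dict], field: str) -> ReconciliationResult:
    """Compare record counts and the sum of one numeric field (hypothetical helper)."""
    return ReconciliationResult(
        source_count=len(source),
        extracted_count=len(extracted),
        source_total=sum(row[field] for row in source),
        extracted_total=sum(row[field] for row in extracted),
    )


# Hypothetical records; the third one is dropped during extraction.
source = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": 80.5},
    {"id": 3, "amount": 42.25},
]
extracted = source[:2]

result = reconcile(source, extracted, "amount")
print(result.complete, result.accurate)  # False False
```

The two checks complement each other: a dropped record with a zero amount would pass the control total but fail the count, while a corrupted amount would pass the count but fail the control total.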
Another problem associated with big data is security. Recent data breaches at Target and Sony, for example, compromised private information. Security is becoming an essential element of customer satisfaction for companies. PwC provides IT security services that investigate companies’ security loopholes and perform analyses to make their data more secure.

Solution

What is Six Sigma

Six Sigma improves the quality of process outputs by identifying and removing the causes of defects (errors) in business processes. Although it may seem like a purely technical discipline, Six Sigma is applicable to all kinds of industries and companies. It helps users develop products with minimal errors while also improving the efficiency of the processes involved. Six Sigma follows two project improvement methodologies, DMAIC (Define, Measure, Analyze, Improve, and Control) and DMADV (Define, Measure, Analyze, Design, and Verify), each composed of five phases. Companies usually start by implementing DMAIC and later add DMADV if the organizational culture permits. Only the DMAIC methodology is discussed in this article.

Implementation of Six Sigma methodologies in Big Data

Big data is becoming fundamental to the future of business. Six Sigma is essentially statistics, as is big data. Processes, organizational structures, and metrics were all designed to support the “zero defects” philosophy of Six Sigma. Using the Six Sigma process can effectively reduce problems caused by human error (Goswami, 2014). Six Sigma can be used to provide big data solutions through the five-phase DMAIC process shown in Figure 3.

Figure 3

In the Define phase, the voice of the customer (VOC) translates customers’ core needs into technical requirements, converting intangible needs into a tangible, usable form.
The VOC is crucial because it is translated into critical-to-quality (CTQ) measures. This phase is especially important in the consulting industry because all the other phases depend on it. By understanding customer needs precisely through the Six Sigma process, the chance of errors in correlating the data can be significantly reduced.

Failure mode and effects analysis (FMEA), used in the Measure phase, analyzes the potential failure modes of each measured field. Executives use feedback from the FMEA to predict disruptions and plan actions in advance. This step is critical to the speed of data extraction, because any bottleneck in data processing can keep the data from being extracted efficiently and, most importantly, quickly.

The third phase, Analyze, decomposes the collected statistics to offer practical solutions to the problems at hand. Experimentation in this phase is an efficient tool for analyzing the cause-and-effect relationships between the measured fields and the CTQs.

The Improve phase identifies sources of variation and develops control charts by simulating changes in the data flow. These charts can then be used for real-time monitoring (Hartwig, 2012).

The Control phase monitors variability in the changed system. This phase is critical because any vulnerabilities related to the data need to be exposed in real time. Given the volume of the data, this is extremely difficult, and applying Six Sigma methodology in this phase can act as a safeguard for the data at hand. A data breach that is detected in real time can help companies put better control schemes in place.
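Six Sigma’s “zero defects” philosophy is usually quantified in defects per million opportunities (DPMO), with “six sigma” quality conventionally corresponding to about 3.4 DPMO once the customary 1.5-sigma long-term shift is assumed. The following is a minimal sketch of that conversion as it might be applied in the Measure phase. The defect, record, and field counts are hypothetical, and treating each validated field of an extracted record as one “opportunity” is an illustrative assumption.

```python
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level corresponding to a long-term DPMO.

    The long-term yield is converted to a z-score with the inverse
    standard normal CDF, and the conventional 1.5-sigma shift is added.
    """
    if not 0 < dpmo_value < 1_000_000:
        raise ValueError("DPMO must be strictly between 0 and 1,000,000")
    long_term_yield = 1 - dpmo_value / 1_000_000
    return NormalDist().inv_cdf(long_term_yield) + shift


# Hypothetical: 1,000,000 extracted records, 5 validated fields per record,
# and 1,700 field values failing validation.
d = dpmo(defects=1_700, units=1_000_000, opportunities_per_unit=5)
print(d)                           # 340.0
print(round(sigma_level(d), 1))    # 4.9
print(round(sigma_level(3.4), 1))  # 6.0
```

The 1.5-sigma shift is a Six Sigma convention rather than a statistical law; passing `shift=0` gives the plain z-score of the observed yield.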
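The FMEA described for the Measure phase is conventionally carried out by scoring each failure mode for severity, occurrence, and detection on 1-10 scales and ranking the modes by their risk priority number (RPN), the product of the three scores. The sketch below assumes hypothetical field names, failure modes, and scores for a data-extraction pipeline; it illustrates only the ranking arithmetic and is not a PwC worksheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA worksheet, scored on the usual 1-10 scales."""
    field: str        # measured field the failure affects
    description: str
    severity: int     # 1 = negligible effect, 10 = most severe
    occurrence: int   # 1 = very unlikely, 10 = almost certain
    detection: int    # 1 = almost certainly caught, 10 = very unlikely to be caught

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            if not 1 <= getattr(self, name) <= 10:
                raise ValueError(f"{name} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes: list[FailureMode]) -> list[FailureMode]:
    """Order failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes for fields in a data-extraction pipeline.
modes = [
    FailureMode("customer_id", "duplicate keys after a merge", 8, 4, 6),
    FailureMode("order_date", "inconsistent date formats", 5, 7, 3),
    FailureMode("amount", "truncated decimal values", 7, 2, 8),
]
for mode in prioritize(modes):
    print(mode.field, mode.rpn)
# customer_id 192
# amount 112
# order_date 105
```

Ranking by RPN tells the team which failure modes to address first; note that a mode with low occurrence (here `amount`) can still outrank a frequent one if it is severe and hard to detect.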
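Experimentation in the Analyze phase is commonly done with a designed experiment. A standard form is the two-level full factorial design: each factor is run at a coded low (-1) and high (+1) setting in every combination, and a factor’s main effect is its mean response at the high setting minus its mean response at the low setting. In this minimal sketch, the factors (batch size and number of parallel workers) and the runtimes are hypothetical, chosen only to show the calculation.

```python
from itertools import product
from statistics import mean


def main_effects(factors: list[str], responses: dict[tuple[int, ...], float]) -> dict[str, float]:
    """Main effects in a two-level full factorial design.

    Each run is coded with -1 (low) or +1 (high) per factor. A factor's main
    effect is the mean response at +1 minus the mean response at -1.
    """
    runs = list(product((-1, 1), repeat=len(factors)))
    missing = [run for run in runs if run not in responses]
    if missing:
        raise ValueError(f"missing runs: {missing}")
    effects = {}
    for i, name in enumerate(factors):
        high = mean(responses[run] for run in runs if run[i] == 1)
        low = mean(responses[run] for run in runs if run[i] == -1)
        effects[name] = high - low
    return effects


# Hypothetical 2x2 experiment on an extraction job: runtime in minutes for
# (batch_size, workers) at coded low (-1) and high (+1) settings.
runtimes = {
    (-1, -1): 60.0,
    (-1, 1): 34.0,
    (1, -1): 50.0,
    (1, 1): 28.0,
}
effects = main_effects(["batch_size", "workers"], runtimes)
print(effects)  # {'batch_size': -8.0, 'workers': -24.0}
```

In this made-up data, adding workers cuts runtime three times as much as raising the batch size. The function handles any number of two-level factors, but it estimates main effects only; interactions between factors would need their own contrasts.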
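For the control charts used in the Improve and Control phases, one standard choice when each observation is a single measurement, such as the runtime of one extraction job, is the individuals (X) chart. Its center line is the baseline mean, and its limits sit 2.66 average moving ranges on either side (2.66 = 3/d2, with d2 = 1.128 for moving ranges of two points). The sketch below assumes hypothetical runtimes in minutes: limits are computed from a stable baseline period and then applied to new runs as they arrive, approximating the real-time monitoring the text describes.

```python
from statistics import mean


def individuals_limits(baseline: list[float]) -> tuple[float, float, float]:
    """Lower limit, center line, and upper limit of an individuals (X) chart.

    Uses the average moving range of consecutive points; 2.66 = 3 / d2,
    where d2 = 1.128 is the bias-correction constant for ranges of two points.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    avg_mr = mean(abs(b - a) for a, b in zip(baseline, baseline[1:]))
    return center - 2.66 * avg_mr, center, center + 2.66 * avg_mr


def out_of_control(values: list[float], lcl: float, ucl: float) -> list[int]:
    """Indices of observations falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical extraction-job runtimes (minutes) from a stable baseline period.
baseline = [30.0, 32.0, 31.0, 29.0, 33.0, 30.0, 31.0, 32.0]
lcl, center, ucl = individuals_limits(baseline)
print(round(lcl, 2), center, round(ucl, 2))  # 25.68 31.0 36.32

# New runs monitored against the baseline limits.
new_runs = [31.5, 30.0, 38.0, 29.5]
print(out_of_control(new_runs, lcl, ucl))  # [2]
```

Flagging only points beyond the limits is the simplest detection rule; practitioners often add run rules (for example, several consecutive points on one side of the center line), which this sketch omits.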
Conclusion

With the advent of big data, industries are moving forward in a competitive environment where predicting the future through historical analysis will prove to be a major game changer. However, the technology is fairly new, considering that Web 2.0, in which users actively interact over the Internet, was conceived only in the last decade. Vast amounts of data have to be managed effectively, and to that end, research is proceeding at a brisk pace, with new fields of study such as predictive analytics, visual analysis, and information systems helping to define the future.

Six Sigma has proven to be a very effective project management solution in many Fortune 500 companies. In fact, major firms consider offering additional compensation to employees who are Six Sigma-certified. The statistical improvements made with Six Sigma can prove extremely important for the future of big data. Six Sigma can improve not only the analysis of data but also its collection, and, most importantly, shorter cycle times in data extraction can provide the critical edge that industries require in this competitive environment.

Because process improvement and risk assurance are major sources of revenue for PwC, it is extremely important for our firm to apply established concepts to recent advancements to gain a competitive edge in the consulting world, especially since competing firms are already forming dedicated big data departments (EY, 2014). Timing is critical: without timely innovation and progress in this field, PwC risks losing clients to competitors.

References

DMAIC vs. DMADV. (n.d.). Retrieved from http://www.isixsigma.com/new-to-six-sigma/design-for-six-sigma-dfss/dmaic-versus-dmadv/
EY. (2014, April 2). Corporate website detailing service offerings related to big data. Retrieved from http://www.ey.com/US/en/Services/Advisory/IT
Goswami, B. (2014, February 14). Why Six Sigma learnings are relevant for big data. Retrieved from http://insights-on-business.com/electronics/why-six-sigma-learnings-are-relevant-for-big-data/
Hartwig, C. (2012, April 10). The parallels between big data and the advent of Six Sigma. Retrieved from http://www.katoka.com.au/2012/04/big-data-and-six-sigma/
Kelly, J. (2014, February 12). Big data vendor revenue and market forecast 2013-2017. Retrieved from http://wikibon.org/wiki/v/Big_Data_Vendor_Revenue_and_Market_Forecast_2013-2017
Marx, M. (2007, January 11). Six Sigma saves the Fortune 500 $427 billion. Retrieved from http://www.isixsigma.com/community/blogs/six-sigma-saves-fortune-500-427-billion/
Six Sigma DMADV methodologies. (n.d.). Retrieved from http://www.villanovau.com/six-sigma-methodology-dmadv/
Taleb, N. (2013, February 8). Beware the big errors of ‘big data’. Retrieved from http://www.wired.com/2013/02/big-data-means-big-errors-people/