Big Data
By: Paul Kenosky

- Define Big Data
- Challenges
- Increase in Technology
- Characteristics of Big Data
- Fraud Detection
- Social Media
- Hadoop
- BigInsights

Define Big Data
- Understanding Big Data: Big data applies to information that can't be processed or analyzed using traditional processes or tools.
- Wiki: Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

Challenges
- Businesses face big data challenges more and more in today's world.
- They are overloaded with information that can be beneficial to the organization.
- However, they do not know how to make use of the raw and unstructured data.

Increase in Technology
- Interconnectivity: More and more systems, people, and technologies are becoming interconnected.
- Inexpensive: Integrated circuits are continually becoming cheaper to produce and buy.
- This allows intelligence to be added to many devices that once seemed too costly.
- Example: railway cars have hundreds of sensors.
- Sensors can track things such as the conditions experienced by the rail car, the state of individual parts, and GPS-based data for shipments.
- As technology advances, rail cars are becoming more sophisticated: sensors are added to collect data on parts that are prone to wear, so the parts can be replaced before they fail.
- Data is also collected on the rails, railroad crossing sensors, weather patterns that cause rail movements, cargo location, and cargo arrival and departure times.
- Processing all this data using a traditional relational system would be impractical, if not impossible.

Characteristics of Big Data
- Volume: The amount of data being stored today is growing at an overwhelming rate. Examples: booking a flight, posting to Facebook, sending a text, and more.
- Variety: Represents all types of data.
- Velocity: How quickly data arrives, is stored, and is analyzed.

Fraud Detection
- Transactions: online auctions, insurance claims
- A big data platform can present opportunities to increase detection success.
- Patterns of fraud can come and go in hours, days, or weeks.
- If the data behind a fraud detection model is not available with low latency, the damage is already done by the time a new pattern is discovered.
- An estimated 20% of the available information that could be useful for fraud detection is actually being used.
- Why not load the other 80% of the data into the traditional analytic warehouse? Too expensive.
- Would it not pay for itself? How can we be sure this new information will be valuable before making a costly business decision?
- Use BigInsights to provide an elastic and cost-effective repository to establish which of the remaining 80% of the information is useful for fraud modeling.
- IBM teamed up with a large credit card issuer to improve their fraud detection model.
- They discovered that the new model improved the speed of detection and gave more accurate results.
- A process that once took three weeks was reduced to just a few hours.
- They also found that about half of the 80% was actually beneficial information that could be used.

Social Media
- Organizations can use the Big Data usage pattern in social media to find out what is being said about the company and its competitors.
- This information can be used to significantly improve decision making.
- IBM has built a solution called Cognos Consumer Insights (CCI) to accelerate an organization's use of this pattern.
- CCI allows an organization to see what people are saying, how topics are trending in social media, and other things that affect the organization.
- Although you can find out what people are saying, a more important question is: why are they saying this and behaving this way?
- An organization needs to look beyond the social media data to answer that question.
- Sales, promotions, loyalty programs, merchandising mix, competitor actions, and even weather can come into play.
- Example: A company introduced a different kind of packaging for one of its products.
- Customers were giving negative feedback on the new packaging.
- Months later the company discovered the problem and switched to eco-friendly packaging.
- This in turn increased sales and customer happiness.
- Example: An author of the book is a prolific Facebook poster.
- Traveling by air is essential to his job, and after a number of flight delays he posted his frustration with the airline on his Facebook wall.
- The airline found these posts on his Facebook wall and contacted him.
- Although the story doesn't mention what the airline did to compensate him or fix the problem, it does show one thing: the company was listening.

Hadoop
- Hadoop is a top-level Apache project and is open source.
- It is designed to scan through large data sets and produce its results through a highly scalable, distributed batch processing system.
- Data is stored redundantly in multiple places across the cluster.
- The programming model is built to expect failures, and it resolves them automatically by running portions of the program on various servers.
- Hardware components might fail, but thanks to this redundancy Hadoop can provide fault tolerance.

BigInsights
- Hadoop can be complex to install, configure, and administer.
- IBM takes this complexity away with the BigInsights installer.
- BigInsights makes it simpler for people to use Hadoop and build big data applications.
- It enhances this open source technology to withstand the demands of your enterprise, adding administrative, discovery, development, provisioning, and security features, along with best-in-class analytical capabilities from IBM Research.
- The result is a more developed and user-friendly solution for complex, large-scale analytics.
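
To make the Hadoop points above more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps behind Hadoop's batch model (MapReduce, which the slides do not name), including a task being rerun on another server when one is down. This is not the Hadoop API: the function names, the server names, and the `failed_servers` parameter are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_with_retry(task, chunk, servers, failed_servers):
    """Run the task on the first healthy server (hypothetical failure model)."""
    for server in servers:
        if server in failed_servers:
            continue  # simulated hardware failure: try the next server
        return task(chunk), server
    raise RuntimeError("no healthy server available")


def word_count(chunks, servers, failed_servers=frozenset()):
    """Count words across chunks, rerouting work away from failed servers."""
    pairs = []
    for i, chunk in enumerate(chunks):
        # Rotate the server list so each chunk starts on a different server.
        start = i % len(servers)
        order = servers[start:] + servers[:start]
        result, _ = run_with_retry(map_phase, chunk, order, failed_servers)
        pairs.extend(result)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    chunks = ["big data big", "data hadoop"]
    servers = ["node1", "node2", "node3"]
    # node1 is "down", yet the job still completes on the remaining servers.
    print(word_count(chunks, servers, failed_servers={"node1"}))
```

In a real cluster the chunks would be blocks of a file replicated across machines, which is what lets a failed task be rerun elsewhere without losing data.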

References
- http://www01.ibm.com/software/data/infosphere/biginsights/index.html
- http://en.wikipedia.org/wiki/Big_data
- http://www.decalsplanet.com/item-10485black-pot-of-gold.html
- http://drshocker.blogspot.com/2007_03_01_archive.html
- http://www.mytinyphone.com/wallpaper/31448/
- https://www.facepunch.com/showthread.php?t=1332655

Short YouTube video that explains Big Data
Some interesting stories the speaker went over:

Bats flying around airports
- Noise was produced, and airports filtered this noise out
- Weather patterns
- Airplane movement
- 15 years later, scientists got together
- They were collecting data on bat migration
- The airports had been throwing this data away
- One man's garbage is another man's treasure

Gates Foundation
- Eradicate polio in Nigeria
- Satellite maps
- Found villages no one knew of
- The government did not know these people were there
- No maps showed these villages
- Gates gave out GPS phones to polio eradication workers
- Combining satellites, vaccines, and cell phones is not something that comes to mind when thinking of big data
- Problems are caused by misinformation or by getting the information too late

References
- http://motherboard.vice.com/blog/big-dataexplained-brilliantly-in-one-short-video
- http://www.netanimations.net/Movingvampire-bat-and-Dracula-blood-suckinganimations.htm
- http://www.nbcnews.com/id/37086846#.Uxd7-YXpbYg