BIG DATA
What is Big Data?
❑ 'Extremely large collections of data (data sets) that may be analysed to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.'
❑ The data sets are so large that conventional methods of storing and processing the data will not work.
Characteristics of Big Data (5Vs)
❑ Volume - refers to the huge amounts of data that are collected and generated every second in large organisations. This data is generated from different sources such as IoT devices, social media, videos, financial transactions, and customer logs.
❑ Variety - refers to the different sources of data and their nature. Data can be both structured and unstructured.
Structured data: this data is stored within defined fields (numerical, text, date etc), often with defined lengths, within a defined record, in a file of similar records. An example of structured data is found in banking systems, which record the receipts and payments from your current account: date, amount, receipt/payment, and a short explanation such as the payee or the source of the money.
Unstructured data: refers to information that does not have a pre-defined data model. It comes in all shapes and sizes, and it is this variety and irregularity which makes it difficult to store in a way that will allow it to be analysed, searched or otherwise used. Examples are photos, audio files, videos, text files, and PDFs.
❑ Velocity - refers to the speed at which data is created or generated. It is also related to how fast the data must be processed, because only after analysis and processing can the data meet the demands of clients/users.
❑ Veracity - means accuracy and truthfulness and relates to the quality of the data: it defines the degree to which the data can be trusted.
As most of the data you encounter is unstructured, it is important to filter out the unnecessary information and use the rest for processing.
❑ Value - the data is reliable and useful and results in adding value to the company. For an example of how data analysis was used by the British supermarket group Tesco to add value, refer to the Big Data 1 Article.
Processing and Analysing Big Data (known as Big Data Analytics)
❑ Data mining: analysing data to identify patterns and establish relationships such as associations (where several events are connected), sequences (where one event leads to another) and correlations.
❑ Predictive analytics: a type of data mining which aims to predict future events, for example the chance of someone being persuaded to upgrade a flight.
❑ Text analytics: scanning text such as emails and word-processing documents to extract useful information. This could simply mean looking for keywords that indicate an interest in a product or place.
❑ Voice analytics: as above, but applied to audio.
❑ Statistical analytics: used to identify trends, correlations and changes in behaviour.
These analytical findings can lead to:
▪ Better marketing
▪ Better customer service and relationship management
▪ Increased customer loyalty
▪ Increased competitive strength
▪ Increased operational efficiency
▪ The discovery of new sources of revenue
The Big Data (DIKW) Pyramid
❑ Also known as the knowledge pyramid, it became well known in 1989 through the work of Ackoff.
❑ With the emergence of big data, the pyramid has also become known as the big data pyramid.
Jennifer Rowley in 2007 explained the relationships between data, information, knowledge and wisdom:
❑ Data: a range of data can be collected from various sources. This is raw data and is not particularly useful in this form.
❑ Information: the raw data can be analysed to look for trends or patterns; for example, it may appear that there is a link between the purchase of a particular product and a particular group of customers. This is information.
❑ Knowledge: the information can be analysed further to establish how the identified links are connected. Knowing exactly what types of customers buy a particular product or favour particular product features is knowledge.
❑ Wisdom: the knowledge gathered can be used to make informed business decisions.
Big data is relevant to performance management in the following ways:
❑ Gaining insights (eg about customers' preferences) which can then be used to improve marketing and sales, thus increasing profits and shareholders' wealth.
❑ Forecasting better (eg customers' future spending patterns, or when machines will need replacing) so that more appropriate decisions can be made.
❑ Automating high-level business processes (eg the document scanning carried out by lawyers), which can make organisations more efficient.
❑ Providing more detailed and up-to-date performance measurement.
Some potential dangers and drawbacks of Big Data:
❑ Cost: it is expensive to establish the hardware and analytical software needed, though these costs are continually falling.
❑ Regulation: some countries and cultures worry about the amount of information that is being collected and have passed laws governing its collection, storage and use. Breaking a law can have serious reputational and punitive consequences.
❑ Loss and theft of data: apart from the consequences arising from the regulatory breaches mentioned above, companies might find themselves open to civil legal action if data were stolen and individuals suffered as a consequence.
❑ Incorrect data: if the data held is incorrect or out of date, incorrect conclusions are likely. Even if the data is correct, some correlations might be spurious, leading to false positive results.
Examples of companies using big data (Big Data Article 2)
▪ Amazon - the world's leading e-retailer collects huge amounts of information about customers' preferences and habits, which allows it to market very accurately to each customer. It routinely makes recommendations to customers based on products previously purchased.
▪ Airlines
▪ Target
▪ Walmart's Polaris search engine
▪ Beyerdynamic - manufacturer of high-quality audio products such as microphones and headphones
▪ Morton's Steak House
New A1e: Controls over confidential information
• Encrypt sensitive files. Encryption is a process that renders data unreadable to anyone except those who have the appropriate password or key. By encrypting sensitive files (for example, by using file passwords), you can protect them from being read or used by those who are not entitled to do either.
• Manage data access. Controlling confidentiality is, in large part, about controlling who has access to data. Ensuring that access is only authorized and granted to those who have a "need to know" goes a long way in limiting unnecessary exposure. Users should also authenticate their access with strong passwords and, where practical, two-factor authentication. Periodically review access lists and promptly revoke access when it is no longer necessary.
• Physically secure devices and paper documents. Controlling access to data includes controlling access of all kinds, both digital and physical. Protect devices and paper documents from misuse or theft by storing them in locked areas. Never leave devices or sensitive documents unattended in public locations.
• Securely dispose of data, devices, and paper records. When data is no longer necessary for University-related purposes, it must be disposed of appropriately.
• Sensitive data, such as Social Security numbers, must be securely erased to ensure that it cannot be recovered and misused.
• Devices that were used for University-related purposes or that were otherwise used to store sensitive information should be destroyed or securely erased to ensure that their previous contents cannot be recovered and misused.
• Paper documents containing sensitive information should be shredded rather than dumped into trash or recycling bins.
• Manage data acquisition. When collecting sensitive data, be conscious of how much data is actually needed and carefully consider privacy and confidentiality in the acquisition process. Avoid acquiring sensitive data unless absolutely necessary; one of the best ways to reduce confidentiality risk is to reduce the amount of sensitive data being collected in the first place.
• Manage data utilization. Confidentiality risk can be further reduced by using sensitive data only as approved and as necessary. Misusing sensitive data violates the privacy and confidentiality of that data and of the individuals or groups the data represents.
• Manage devices. Computer management is a broad topic that includes many essential security practices. By protecting devices, you can also protect the data they contain. Follow basic cybersecurity hygiene by using anti-virus software, routinely patching software, whitelisting applications, using device passcodes, suspending inactive sessions, enabling firewalls, and using whole-disk encryption.
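The "Manage data access" control (grant access only on a need-to-know basis, review access lists periodically, and revoke access that is no longer needed) can be sketched in Python. This is a minimal, hypothetical model: the `AccessList` class, its methods and the example user names are invented for illustration and do not correspond to any particular product. Authentication (strong passwords, two-factor) is assumed to have happened before `can_read` is called and is not modelled here.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class AccessList:
    """Hypothetical need-to-know access list for one sensitive data set."""

    # user name -> last date on which that user still needs access
    grants: dict[str, date] = field(default_factory=dict)

    def grant(self, user: str, needed_until: date) -> None:
        """Authorise a user, recording when their need to know ends."""
        self.grants[user] = needed_until

    def revoke(self, user: str) -> None:
        """Remove a user's access (no error if they had none)."""
        self.grants.pop(user, None)

    def can_read(self, user: str, today: date) -> bool:
        """Deny by default: allow only listed users whose need has not lapsed."""
        needed_until = self.grants.get(user)
        return needed_until is not None and today <= needed_until

    def review(self, today: date) -> list[str]:
        """Periodic review: revoke every grant whose need has lapsed."""
        lapsed = [user for user, until in self.grants.items() if today > until]
        for user in lapsed:
            self.revoke(user)
        return lapsed


acl = AccessList()
acl.grant("payroll_clerk", date(2024, 12, 31))
acl.grant("temp_auditor", date(2024, 3, 31))

today = date(2024, 6, 1)
print(acl.can_read("payroll_clerk", today))  # True
print(acl.can_read("temp_auditor", today))   # False: need has lapsed
print(acl.can_read("intern", today))         # False: never authorised
print(acl.review(today))                     # ['temp_auditor']
```

Denying by default (an unknown user gets `False`) mirrors the need-to-know principle. In a real organisation such checks would more likely be enforced by the operating system, database or identity-management tool than by hand-written application code.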