Data Analytics
• Data analytics (DA) is the process of examining data sets in order to find trends and draw conclusions about the information they contain.
• Data analytics is done with the aid of specialized systems and software.
• Data analytics predominantly refers to an assortment of applications, from basic business intelligence (BI), reporting, and online analytical processing (OLAP) to various forms of advanced analytics.
• It is similar in nature to business analytics.
• Data analytics initiatives can help businesses increase revenue, improve operational efficiency, optimize marketing campaigns, and bolster customer service efforts. Analytics also enables organizations to respond quickly to emerging market trends.

Why Data Analytics?

Data Analytics Tools

Data Collection
• In the process of big data analysis, "data collection" is the initial step, before starting to analyze the patterns or useful information in the data.
• The data to be analyzed must be collected from valid sources.
• The collected data is known as raw data; it is not useful as it is, but cleaning it and using it for analysis yields information, and the insight obtained from that information is known as "knowledge".
• The main goal of data collection is to collect information-rich data.

Data could be…
1. RDBMS: A relational database is a collection of tables, each of which is assigned a unique name. Each table consists of a set of attributes (columns or fields) and usually stores a large set of tuples (records or rows). Each tuple in a relational table represents an object identified by a unique key and described by a set of attribute values.

Data could be…
2. Data Warehouses: A data warehouse is a repository of information collected from multiple sources, stored under a unified schema, and usually residing at a single site. Data warehouses are constructed via a process of data cleaning, data integration, data transformation, data loading, and periodic data refreshing.
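As a minimal sketch of the relational structure described above, the following uses Python's built-in sqlite3 module; the customer table, its attributes, and the rows are hypothetical examples, not part of any real system.

```python
# A tiny relational table: attributes (columns) and tuples (rows),
# each tuple identified by a unique key (id).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ada", "London"), (2, "Alan", "Manchester")],
)
# Each row is an object identified by its unique key and described
# by its attribute values.
for row in conn.execute("SELECT * FROM customer ORDER BY id"):
    print(row)
```

A data warehouse, by contrast, would integrate many such tables from different sources under one unified schema.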
Data could be…
3. Transactional Databases: In general, a transactional database consists of a file where each record represents a transaction. A transaction typically includes a unique transaction identity number (trans ID) and a list of the items making up the transaction (such as items purchased in a store).
4. Object-Relational Databases: Object-relational databases are constructed based on an object-relational data model. This model extends the relational model by providing a rich data type for handling complex objects and object orientation. The object-relational data model inherits the essential concepts of object-oriented databases, where, in general terms, each entity is considered as an object.

Data could be…
5. Temporal Databases, Sequence Databases, and Time-Series Databases: A temporal database typically stores relational data that include time-related attributes. These attributes may involve several timestamps, each having different semantics. A sequence database stores sequences of ordered events, with or without a concrete notion of time. Examples include customer shopping sequences, Web click streams, and biological sequences. A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly). Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (such as temperature and wind).

Data could be…
6. Spatial Databases and Spatiotemporal Databases: Spatial databases contain spatial-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases. A spatial database that stores spatial objects that change with time is called a spatiotemporal database, from which interesting information can be mined.
For example, we may be able to group the trends of moving objects and identify some strangely moving vehicles, or distinguish a bioterrorist attack from a normal outbreak of the flu based on the geographic spread of a disease over time.

Data could be…
7. Text Databases and Multimedia Databases: Text databases are databases that contain word descriptions for objects. These word descriptions are usually not simple keywords but rather long sentences or paragraphs, such as product specifications, error or bug reports, warning messages, summary reports, notes, or other documents. Multimedia databases store image, audio, and video data. They are used in applications such as picture content-based retrieval, voice-mail systems, video-on-demand systems, the World Wide Web, and speech-based user interfaces that recognize spoken commands. Multimedia databases must support large objects, because data objects such as video can require gigabytes of storage.

Data could be…
8. Heterogeneous Databases and Legacy Databases: A heterogeneous database consists of a set of interconnected, autonomous component databases. The components communicate in order to exchange information and answer queries. Objects in one component database may differ greatly from objects in other component databases, making it difficult to assimilate their semantics into the overall heterogeneous database. A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.

Data could be…
9. Data Streams: Many applications involve the generation and analysis of a new kind of data, called stream data, where data flow in and out of an observation platform (or window) dynamically.
Such data streams have the following unique features: huge or possibly infinite volume, dynamically changing, flowing in and out in a fixed order, allowing only one or a small number of scans, and demanding fast (often real-time) response times. Typical examples of data streams include various kinds of scientific and engineering data, time-series data, and data produced in other dynamic environments, such as power supply, network traffic, stock exchange, telecommunications, Web click streams, video surveillance, and weather or environment monitoring.

Data could be…
10. The World Wide Web: The World Wide Web and its associated distributed information services, where data objects are linked together to facilitate interactive access. Users seeking information of interest traverse from one object to another via links. Such systems provide ample opportunities and challenges for data mining. For example, understanding user access patterns will not only help improve system design (by providing efficient access between highly correlated objects) but also lead to better marketing decisions (e.g., by placing advertisements in frequently visited documents, or by providing better customer/user classification and behavior analysis).

Data Collection
• Most of the data collected is of two types:
• Qualitative data: Data that is represented in a verbal or narrative format is qualitative data. A simple way to look at qualitative data is to think of it in the form of words. These types of data are collected through focus groups, surveys, interviews, open-ended questionnaires, and observations.
• Quantitative data: Quantitative data is data that is expressed in numerical terms, in which the numeric values can be large or small. Numerical values may correspond to a specific category or label. These types of data are collected through surveys and questionnaires, analytics tools, environmental sensors, and manipulation of pre-existing quantitative data.
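The qualitative/quantitative distinction above can be sketched with a small, hypothetical survey record; the field names and values are invented for illustration only.

```python
# One collected record mixing qualitative (verbal/categorical) and
# quantitative (numeric) fields.
record = {
    "feedback": "The checkout flow felt slow",  # qualitative: narrative text
    "gender": "female",                          # qualitative: category label
    "age": 34,                                   # quantitative: numeric value
    "monthly_spend": 129.50,                     # quantitative: numeric value
}

def field_kind(value):
    """Classify a field as quantitative (numeric) or qualitative (verbal)."""
    return "quantitative" if isinstance(value, (int, float)) else "qualitative"

kinds = {name: field_kind(value) for name, value in record.items()}
print(kinds)
```

In practice the same survey typically yields both kinds at once, which is why the later sections on encoding and on data quality apply to mixed data sets.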
Nominal Data
• These are sets of values that don't possess a natural ordering.
• Ex.: The color of a smartphone can be considered a nominal data type, as we can't compare one color with another. It is not possible to state that 'Red' is greater than 'Blue'.
• The gender of a person is another example, where we can't rank male, female, or others.
• Mobile phone categories (whether a phone is mid-range, budget segment, or premium) are also a nominal data type.
• Nominal data types in statistics are not quantifiable and cannot be measured in numerical units. Nominal statistical data are valuable in qualitative research, as they extend freedom of opinion to subjects.

Ordinal Data
• These types of values have a natural ordering while maintaining their class of values.
• If we consider the sizes of a clothing brand, we can easily sort them by name tag in the order small < medium < large.
• The grading system used when marking candidates in a test can also be considered an ordinal data type, where an A+ is definitely better than a B grade.
• These categories help us decide which encoding strategy can be applied to which type of data.
• Encoding qualitative data is important because machine learning models can't handle such values directly; they need to be converted to numerical types, as the models are mathematical in nature.
• For the nominal data type, where there is no comparison among the categories, one-hot encoding can be applied (similar to binary coding, provided the categories are few in number); for the ordinal data type, label encoding can be applied, which is a form of integer encoding.

Discrete Data
• Numerical values that are integers or whole numbers are placed under this category.
• The number of speakers in a phone, cameras, cores in the processor, and the number of SIMs supported are some examples of the discrete data type.
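The two encoding strategies described above (one-hot for nominal data, label encoding for ordinal data) can be sketched in plain Python; the colour and size values are the hypothetical examples from the text, and no ML library is assumed.

```python
def one_hot(values, categories):
    """One-hot encode a nominal value: no order is implied between categories."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def label_encode(values, ordered_categories):
    """Label (integer) encode an ordinal value: the natural order is preserved."""
    index = {c: i for i, c in enumerate(ordered_categories)}
    return [index[v] for v in values]

# Nominal: phone colour has no natural ordering, so use one-hot encoding.
colours = ["Red", "Blue", "Red"]
print(one_hot(colours, ["Red", "Blue", "Black"]))
# -> [[1, 0, 0], [0, 1, 0], [1, 0, 0]]

# Ordinal: clothing sizes have a natural ordering, so use label encoding.
sizes = ["small", "large", "medium"]
print(label_encode(sizes, ["small", "medium", "large"]))
# -> [0, 2, 1]
```

Note that label-encoding a nominal feature would invent an ordering that does not exist, which is exactly why the distinction between the two data types matters.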
• Discrete data types in statistics cannot be measured; they can only be counted, as the objects included in discrete data have fixed values.
• The value may be written in decimal form, but it has to be whole.
• Discrete data is often presented through charts, including bar charts, pie charts, and tally charts.

Continuous Data
• Fractional numbers are considered continuous values.
• These can take the form of the operating frequency of a processor, the Android version of a phone, Wi-Fi frequency, the temperature of the cores, and so on.
• Unlike discrete data, which has whole, fixed values, continuous data can break down into smaller pieces and can take any value.
• For example, volatile values such as temperature and the weight of a human can be included among continuous values.
• Continuous statistical data is represented using a graph that easily reflects value fluctuation through the highs and lows of the line over a certain period of time.

Data Collection
Primary data: Data that is raw, original, and extracted directly from official sources is known as primary data. This type of data is collected directly by performing techniques such as questionnaires, interviews, and surveys. The data collected must be according to the demands and requirements of the target audience on which the analysis is performed.
A few methods of collecting primary data:
1. Interview method
2. Survey method
3. Observation method
4. Experimental method:
   CRD - Completely Randomized Design
   RBD - Randomized Block Design
   LSD - Latin Square Design
   FD - Factorial Design
Secondary data: Secondary data is data that has already been collected and is reused for some valid purpose. This type of data was previously recorded from primary data, and it has two types of sources, named internal sources and external sources.
1.
Internal sources: These types of data can easily be found within the organization, such as market records, sales records, transactions, customer data, accounting resources, etc. The cost and time consumed in obtaining data from internal sources are low.
2. External sources: Data that can't be found within the organization and is gained through external third-party resources is external-source data. The cost and time consumed are higher because this involves a huge amount of data. Examples of external sources are government publications, news publications, the Registrar General of India, the Planning Commission, the International Labour Bureau, syndicate services, and other non-governmental publications.

Secondary data:
3. Other sources:
• Sensor data: With the advancement of IoT devices, the sensors of these devices collect data that can be used for sensor data analytics to track the performance and usage of products.
• Satellite data: Satellites collect many images and terabytes of data on a daily basis through surveillance cameras, which can be used to extract useful information.
• Web traffic: Thanks to fast and cheap internet access, many formats of data uploaded by users on different platforms can be collected, with their permission, for data analysis. Search engines also provide data on the keywords and queries searched most often.

Types of Data

Characteristics of Data
• Data quality is crucial: it assesses whether information can serve its purpose in a particular context (such as data analysis).
• So, to determine the quality of a given set of information, there are data quality characteristics of which one should be aware.
• There are five traits, namely:
• Accuracy
• Completeness
• Reliability
• Relevance
• Timeliness

Characteristics of Data
• Accuracy: This data quality characteristic means that information is correct.
Accuracy is a crucial data quality characteristic, because inaccurate information can cause significant problems with severe consequences.
• Completeness: "Completeness" refers to how comprehensive the information is. When looking at data completeness, think about whether all of the data you need is available. Ex.: You might need a customer's first and last name, but the middle initial may be optional.
• Reliability: Reliability means that a piece of information doesn't contradict another piece of information in a different source or system. Ex.: If a patient's birthday is January 1, 1970 in one system, yet it's June 13, 1973 in another, the information is unreliable. Reliability is a vital data quality characteristic: when pieces of information contradict each other, you can't trust the data, and this could result in damage.

Characteristics of Data
• Relevance: When you're looking at data quality characteristics, relevance comes into play because there has to be a good reason why you're collecting the information in the first place. You must consider whether you really need the information, or whether you're collecting it just for the sake of it. If you're gathering irrelevant information, you're wasting time as well as money, and your analyses won't be as valuable.
• Timeliness: Timeliness, as the name implies, refers to how up to date information is. If it was gathered in the past hour, then it's timely, unless new information has come in that renders the previous information useless. Timeliness is an important data quality characteristic: out-of-date information costs companies time and money.

Introduction to Big Data
• Big data is a term that describes large, hard-to-manage volumes of data, both structured and unstructured, that inundate businesses on a day-to-day basis.
• It is data that contains greater variety, arriving in increasing volumes and with more velocity. These are known as the three Vs.
• Big data means larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can't manage them.
• But it's not just the type or amount of data that's important; it's what organizations do with the data that matters. Big data can be analyzed for insights that improve decisions and give confidence for making strategic business moves.

Some Facts & Figures

Insights
Sources: people, machines, organizations; ubiquitous computing. More people are carrying data-generating devices (mobile phones with Facebook, GPS, cameras, etc.).

Introduction to Big Data
• The 3 Vs of Big Data: Variety, Velocity, and Volume.
a) Variety: The variety of big data refers to structured, unstructured, and semi-structured data gathered from multiple sources. While in the past data could only be collected from spreadsheets and databases, today data comes in an array of forms such as emails, PDFs, photos, videos, audio, and much more. Variety is one of the important characteristics of big data.
b) Velocity: Velocity essentially refers to the speed at which data is being created in real time. In a broader perspective, it comprises the rate of change, the linking of incoming data sets arriving at varying speeds, and activity bursts.

Introduction to Big Data
c) Volume: Volume indicates the huge amounts of data generated on a daily basis from various sources like social media platforms, business processes, machines, networks, human interactions, etc. Such large amounts of data are stored in data warehouses.

Two more Vs have been added to big data:
d) Veracity: Veracity refers to inconsistencies and uncertainty in data; the data that is available can sometimes be messy, and its quality and accuracy are difficult to control. Big data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
Example: Data in bulk can create confusion, whereas a smaller amount of data can convey half or incomplete information.
e) Value: Bulk data that has no value is of no good to the company unless you turn it into something useful. Data in itself is of no use or importance; it needs to be converted into something valuable in order to extract information.

Benefits of Big Data
• Cost Savings: Some big data tools, like Hadoop and cloud-based analytics, can bring cost advantages to a business when large amounts of data are to be stored, and these tools also help in identifying more efficient ways of doing business.
• Time Reductions: The high speed of tools like Hadoop and in-memory analytics can easily identify new sources of data, which helps businesses analyze data immediately and make quick decisions based on what they learn.
• Understand the Market Conditions: By analyzing big data you can get a better understanding of current market conditions. For example, by analyzing customers' purchasing behavior, a company can find out which products sell the most and produce products according to this trend. By doing this, it can get ahead of its competitors.
• Control Online Reputation: Big data tools can do sentiment analysis. Therefore, you can get feedback about who is saying what about your company. If you want to monitor and improve the online presence of your business, big data tools can help with all of this.

Benefits of Big Data
• Using Big Data Analytics to Boost Customer Acquisition and Retention: The customer is the most important asset any business depends on. If a business is slow to learn what customers are looking for, it is very easy to end up offering poor-quality products. In the end, loss of clientele will result, and this creates an adverse overall effect on business success. The use of big data allows businesses to observe various customer-related patterns and trends. Observing customer behavior is important for triggering loyalty.
• Using Big Data Analytics to Solve Advertisers' Problems and Offer Marketing Insights: Big data analytics can help change all business operations. This includes the ability to match customer expectations, change the company's product line, and of course ensure that marketing campaigns are powerful.
• Big Data Analytics As a Driver of Innovation and Product Development: Another huge advantage of big data is the ability to help companies innovate and redevelop their products.

Benefits of Big Data

Challenges
• Need For Synchronization Across Disparate Data Sources: Data sets are becoming bigger and more diverse; if this is overlooked, it leads to gaps.
• Acute Shortage Of Professionals Who Understand Big Data Analysis: There is a shortage of professionals who understand big data analysis.
• Getting Meaningful Insights Through The Use Of Big Data Analytics: It is hard to gain important insights from big data analytics, and also to ensure that only the relevant department has access to this information.
• Getting Voluminous Data Into The Big Data Platform: Companies need to handle a large amount of data on a daily basis.
• Uncertainty Of Data Management Landscape: Companies must find out which technology will be best suited to them without introducing new problems and potential risks.
• Data Storage And Quality: Storage of massive amounts of data is becoming a real challenge; combining unstructured and inconsistent data from diverse sources leads to errors. Missing data, inconsistent data, logic conflicts, and duplicate data all result in data quality challenges.
• Security And Privacy Of Data: There is a high risk of exposure of data spread over disparate sources, making it vulnerable.

Big Data Analytics
• Big data analytics is the process of collecting, examining, and analyzing large amounts of data to discover market trends, insights, and patterns that can help companies make better business decisions.
• This information is available quickly and efficiently so that companies can be agile in crafting plans to maintain their competitive advantage.
• Technologies such as business intelligence (BI) tools and systems help organizations take in unstructured and structured data from multiple sources.
• Users (typically employees) input queries into these tools to understand business operations and performance.

Big Data Analytics
• Big data analytics is important because it helps companies leverage their data to identify opportunities for improvement and optimization.
• Across different business segments, increased efficiency leads to smarter operations overall, higher profits, and satisfied customers.
• Big data analytics helps companies reduce costs and develop better, customer-centric products and services.
• Data analytics helps provide insights that improve the way our society functions. In health care, big data analytics not only keeps track of and analyzes individual records but also plays a critical role in measuring COVID-19 outcomes on a global scale. It informs health ministries within each nation's government on how to proceed with vaccinations and devises solutions for mitigating pandemic outbreaks in the future.

Big Data Analytics: Why?
To make the right decisions for your business to succeed, you need the right data. So, it's important to have a data analytics strategy in place.
Such plans can help organizations:
• boost revenue
• cut costs
• improve efficiencies
• enhance marketing efforts
• strengthen customer focus and customer service
• respond quickly and effectively to market events and industry trends
• reduce risk
• gain a competitive edge

Harnessing Big Data
• OLTP (Online Transaction Processing): DBMS
• OLAP (Online Analytical Processing): Data warehouse
• RTAP (Real-Time Analytical Processing): Big data architecture & technology

Traditional Model

Traditional Data Model
Traditional data warehouses are divided into a three-tier structure as follows:
• The bottom tier contains the data warehouse server, with data pulled from many different sources and integrated into a single repository.
• The middle tier contains OLAP servers, which make data more accessible for the types of queries that will be run on it.
• The top tier houses the front-end BI tools used for querying, reporting, and analytics.
• Traditionally, ETL has been used with batch processing (data at rest) in data warehouse environments.

Traditional Data Model
• To integrate data across mixed application environments, you need to get data from one data environment (source) to another data environment (destination). Extract, Transform, and Load (ETL) technologies have been used to accomplish this in traditional data warehouse environments.
• ETL tools combine the three important functions required to get data out of one data environment and put it into another:
• Extract: Read data from the source database.
• Transform: Convert the format of the extracted data so that it conforms to the requirements of the target database. (Transformation is done by using rules or by merging the data with other data.)
• Load: Write the data to the target database.
• Data warehouses provide business users with a way to consolidate information across disparate sources in order to analyze and report on data relevant to their specific business focus.
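The Extract/Transform/Load steps described above can be sketched with a minimal in-memory pipeline; the source rows, target schema, and field names are hypothetical, and real ETL tools operate on databases rather than Python lists.

```python
def extract(source_rows):
    """Extract: read records from the source system."""
    return list(source_rows)

def transform(rows):
    """Transform: convert each record to the target schema
    (here: merge the name fields and normalise the amount to cents)."""
    return [
        {"customer": f"{r['first']} {r['last']}",
         "amount_cents": int(r["amount"] * 100)}
        for r in rows
    ]

def load(rows, target):
    """Load: write the transformed records into the target store."""
    target.extend(rows)

warehouse = []  # stands in for the target database
source = [{"first": "Ada", "last": "Lovelace", "amount": 12.5}]
load(transform(extract(source)), warehouse)
print(warehouse)
```

The transform step happening between extract and load mirrors the intermediate staging location mentioned in the text.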
ETL tools are used to transform the data into the format required by the data warehouse. The transformation is actually done in an intermediate location before the data is loaded into the data warehouse.
• Many software vendors, including Oracle, Microsoft, IBM, Informatica, Talend, and Pentaho, provide traditional ETL software tools.

Big Data Model
• Data Storage: Data is held in distributed file stores that can hold a variety of format-based big files. It is also possible to store large numbers of different format-based big files in a data lake. This comprises the data that is managed for batch-built operations and is saved in the file stores.
• Batch Processing: Each chunk of data is split into different categories using long-running jobs, which filter, aggregate, and otherwise prepare the data for analysis. These jobs typically read source files, process them, and deliver the output to new files. Multiple approaches to batch processing are employed, including Hive jobs, U-SQL jobs, Sqoop or Pig jobs, and custom MapReduce jobs written in Java, Scala, or other languages such as Python.
• Real-Time Message Ingestion: In contrast to a batch processing system, which caters to data generated in a sequential and uniform fashion, this includes all real-time streaming systems that cater to data at the moment it is generated. This data mart or store, which receives all incoming messages and drops them into a folder for data processing, is usually the only component that needs to be contacted. If message-based processing is required, message-based ingestion stores such as Apache Kafka, Apache Flume, Azure Event Hubs, and others must be used. The delivery process, along with other message queuing semantics, is generally more reliable.
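The batch-processing step described above, a long-running job that filters and aggregates raw data, can be sketched in miniature; the event records and categories are hypothetical, standing in for the source files a Hive or MapReduce job would read.

```python
# A toy batch job: filter out malformed records, then aggregate per category.
from collections import defaultdict

raw_events = [
    {"category": "clicks", "value": 3},
    {"category": "clicks", "value": 5},
    {"category": "errors", "value": 1},
    {"category": "clicks", "value": None},  # malformed record, filtered out
]

def batch_job(events):
    totals = defaultdict(int)
    for e in events:
        if e["value"] is not None:            # filter step
            totals[e["category"]] += e["value"]  # aggregate step
    return dict(totals)

# The processed result would be delivered to a new file for analysis.
print(batch_job(raw_events))
```

A real batch job does the same filter-and-aggregate work, but over distributed file stores and at far larger scale.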
Big Data Model
• Stream Processing: Real-time message ingestion and stream processing are different. Message ingestion first captures all of the incoming data and makes it available through a publish-subscribe mechanism, whereas stream processing handles that streaming data in the form of windows or streams and writes the results to a sink. This includes Apache Spark, Flink, Storm, etc.
• Analytics-Based Datastore: In order to analyze and process already-processed data, analytical tools use a data store based on HBase or some other NoSQL data warehouse technology. The data can be presented with the help of a Hive database, which can provide metadata abstraction, or through interactive use of such a database in the data store. NoSQL databases like HBase, or Spark SQL, are also available.
• Reporting and Analysis: The generated insights must in turn be processed, and that is effectively accomplished by reporting and analysis tools that use embedded technology and a solution to produce useful graphs, analyses, and insights that are beneficial to the businesses. Examples include Cognos, Hyperion, and others.
• Orchestration: Big-data solutions involve data-related tasks that are repetitive in nature and are contained in workflow chains that can transform the source data, move the data across sources and sinks, and load it into stores.
Sqoop, Oozie, Data Factory, and others are just a few examples.

Big Data Process

Big Data Layers
• Big data sources layer: The data available for analysis will vary in origin and format; the format may be structured, unstructured, or semi-structured; the speed of data arrival and delivery will vary according to the source; the data collection mode may be direct or through data providers, in batch mode or in real time; and the location of the data source may be external or within the organization.
• Data storage layer: This layer acquires data from the data sources, converts it, and stores it in a format that is compatible with data analytics tools. Governance policies and compliance regulations primarily decide the suitable storage format for different types of data.
• Data query layer: This is the layer of the data architecture where active analytic processing takes place. It is a field where interactive queries are necessary, and it has traditionally been dominated by SQL-expert developers. Before Hadoop, storage was insufficient, which made analytics a long process: a new data source first went through a lengthy ETL process to be made ready for storage, and only then was the data put into a database or data warehouse. Data ingestion and data analytics became two essential steps that solved the problems of computing such large amounts of data, leading to data ingestion frameworks.

Big Data Layers
• Processing layer: In the previous layer, we gathered the data from different sources and made it available to the rest of the pipeline. In this layer, the data is ready; we only have to route it to different destinations. The focus in this layer is on specialized data pipeline processing systems.
• Analysis layer: This layer extracts the data from the data storage layer (or directly from the data source) to derive insights from the data.
• Visualization layer: This layer receives the output provided by the analysis layer and presents it to the relevant consumers. The consumers of the output may be business processes, humans, visualization applications, or services.

Reporting Vs Analysis
• Reports and analytics both help businesses improve operational efficiency and productivity, but in different ways.
• Reporting explains what is happening, while analytics helps identify why it is happening.
• Reporting summarizes and organizes data in easily digestible ways, while analytics enables questioning and exploring that data further. Analytics provides invaluable insights into trends and helps create strategies to improve operations, customer satisfaction, growth, and other business metrics.
• Analytics enables business users to cull insights from data, spot trends, and make better decisions. Next-generation analytics takes advantage of emerging technologies like AI, NLP, and machine learning to offer predictive insights based on historical and real-time data.

Reporting Examples
• Take the population census, for example. This is a technical document that transmits basic information on how many and what kind of people live in a certain country. It can be displayed as text or in a visual format, such as a graph or chart. But it is static information that can be used to assess current conditions.
• A company's data reporting often summarizes financial information such as revenues, accounts receivable, and net profits. This provides a timely record of the financial health of the company, or of a segment of its finances, such as sales. A sales director might report on KPIs according to location, stage of the funnel, and close rate, to provide an accurate picture of the total sales pipeline.
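The sales-pipeline report described above can be sketched as a simple summarization, reporting rather than analysis, since it only organizes the data; the deal records, locations, and stages are hypothetical.

```python
# Summarize deals by location: deal count, closed count, total value.
from collections import defaultdict

deals = [
    {"location": "North", "stage": "closed", "value": 1000},
    {"location": "North", "stage": "open",   "value": 400},
    {"location": "South", "stage": "closed", "value": 700},
]

report = defaultdict(lambda: {"deals": 0, "closed": 0, "total_value": 0})
for d in deals:
    row = report[d["location"]]
    row["deals"] += 1
    row["total_value"] += d["value"]
    if d["stage"] == "closed":
        row["closed"] += 1

for location, row in sorted(report.items()):
    close_rate = row["closed"] / row["deals"]
    print(location, row["total_value"], f"close rate {close_rate:.0%}")
```

Asking *why* the North close rate differs from the South's would be the analysis step that the next section describes.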
Analysis
For data analytics, the steps involved include:
• Creating a data hypothesis
• Gathering and transforming data
• Building analytical models to ingest data, process it, and offer insights
• Using tools for data visualization, trend analysis, deep dives, etc.
• Using data and insights to make decisions

Examples:
• Marketing teams gather data on customer behavior and habits to form business strategies around them. A company like Starbucks keeps track of its customer base through its mobile app. The mobile app provides insight into consumer spending and buying behavior, and the data is used in predictive analysis to orient future decisions.
• Another aspect that companies improve by using data analytics is customer experience (CX), the engagement and interaction of customers with businesses. For example, McDonald's stores customer data through its mobile app. These analytical efforts help the company automatically send out promotions, discounts, and other updates.

Different Types of Data Analytics
Descriptive (business intelligence and data mining): This surface-level analysis is aimed at analyzing past data through data aggregation and data mining.
• Descriptive analytics looks at data and analyzes past events for insight into how to approach future events. It looks at past performance and, by mining historical data, seeks to understand the causes of past success or failure.
• Almost all management reporting, such as sales, marketing, operations, and finance, uses this type of analysis.
• The descriptive model quantifies relationships in data in a way that is often used to classify customers or prospects into groups.
• Unlike a predictive model, which focuses on predicting the behavior of a single customer, descriptive analytics identifies many different relationships between customers and products.
• Common examples of descriptive analytics are company reports that provide historic reviews, such as:
• Data queries
• Reports
• Descriptive statistics
• Data dashboards
Different Types of Data Analytics
Diagnostic: This kind of analysis explores the "why". For instance, diagnostic analysis can help in understanding the reason behind a sudden drop in customers for a company.
• In this analysis, we generally use historical data to answer a question or solve a problem, looking for dependencies and patterns in the historical data related to the particular problem.
• Companies favor this analysis because it gives great insight into a problem, and they already keep the detailed data at their disposal; otherwise, data collection would have to be repeated for every problem and would be very time-consuming.
• Common techniques used for diagnostic analytics are:
• Data discovery
• Data mining
• Correlations
Different Types of Data Analytics
Predictive (forecasting): This, as the name suggests, helps in predicting the future course of events. This is done through actionable, data-driven insights which businesses can use to plan their future. Predictive analytics draws on a variety of statistical techniques from modeling, machine learning, data mining, and game theory that analyze current and historical facts to make predictions about future events.
Techniques used for predictive analytics are:
• Linear regression
• Time-series analysis and forecasting
• Data mining
There are three basic cornerstones of predictive analytics:
• Predictive modeling
• Decision analysis and optimization
• Transaction profiling
Different Types of Data Analytics
Prescriptive (optimization and simulation): This kind of analytics helps in understanding how predicted outcomes can be acted upon. It is a complex type of analytics involving algorithms, machine learning, and computational modelling procedures.
Prescriptive analytics automatically synthesizes big data, mathematical science, business rules, and machine learning to make a prediction and then suggests decision options to take advantage of the prediction.
• Prescriptive analytics goes beyond predicting future outcomes by also suggesting actions to benefit from the predictions and showing the decision maker the implications of each decision option.
• Prescriptive analytics not only anticipates what will happen and when it will happen but also why it will happen. Further, it can suggest decision options on how to take advantage of a future opportunity or mitigate a future risk, and illustrate the implications of each decision option.
• For example, prescriptive analytics can benefit healthcare strategic planning by leveraging operational and usage data combined with data on external factors such as economic conditions, population demographics, etc.
Different Types of Data Analytics
Cognitive analytics: This is analytics with human-like intelligence. It can include understanding the context and meaning of a sentence, or recognizing certain objects in an image, given large amounts of information. Cognitive analytics often uses artificial intelligence and machine learning algorithms, allowing a cognitive application to improve over time. Cognitive analytics reveals patterns and connections that simple analytics cannot.
Key Roles Of Successful Analytics Projects
Each role plays a crucial part in developing a successful analytics project:
• Business User
• Project Sponsor
• Project Manager
• Business Intelligence Analyst
• Database Administrator (DBA)
• Data Engineer
• Data Scientist
Key Roles Of Successful Analytics Projects
Business User:
• The business user is the one who understands the main domain area of the project and is the one who chiefly benefits from its results.
• This user advises and consults the team working on the project about the value of the results obtained and how the outputs will be put to use.
• A business manager, line manager, or deep subject matter expert in the project domain typically fulfills this role.
Project Sponsor:
• The project sponsor is the one responsible for initiating the project. The sponsor provides the actual requirements for the project and presents the core business issue.
• He or she generally provides the funding and measures the degree of value delivered by the final output of the team working on the project.
• This person sets the key priorities and shapes the desired output.
Key Roles Of Successful Analytics Projects
Project Manager:
• This person ensures that key milestones and the purpose of the project are met on time and to the expected quality.
Business Intelligence Analyst:
• The business intelligence analyst provides business domain expertise based on a detailed and deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting point of view.
• This person generally creates dashboards and reports and knows about the data feeds and sources.
Key Roles Of Successful Analytics Projects
Database Administrator (DBA):
• The DBA provisions and configures the database environment to support the analytics needs of the team working on the project.
• His or her responsibilities may include granting permissions to key databases or tables and making sure that the appropriate security controls are in place for the data repositories.
Data Engineer:
• The data engineer brings deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox.
• The data engineer works jointly with the data scientist to help shape the data into the correct form for analysis.
Key Roles Of Successful Analytics Projects
Data Scientist:
• The data scientist provides subject matter expertise for analytical techniques, data modelling, and applying the correct analytical techniques to a given business issue.
• He or she ensures that the overall analytical objectives are met.
• Data scientists design and apply analytical methods suited to the data available for the project.
Data Analytics Lifecycle
Data Analytics Lifecycle
Phase 1: Discovery
• The data science team learns the business domain and researches the issue.
• The team creates context and gains understanding.
• The team learns about the data sources that are needed and accessible to the project.
• The team comes up with an initial hypothesis, which can later be confirmed or rejected with evidence.
Phase 2: Data Preparation
• The team explores the possibilities for pre-processing, analysing, and conditioning data before analysis and modelling.
• An analytic sandbox is required. The team extracts, loads, and transforms data to bring it into the sandbox.
• Data preparation tasks can be repeated and need not follow a predetermined sequence.
• Some of the tools commonly used for this process include Hadoop, Alpine Miner, OpenRefine, etc.
Data Analytics Lifecycle
Phase 3: Model Planning
• The team studies the data to discover the connections between variables. It then selects the most significant variables as well as the most effective models.
• In this phase, the team determines the methods, techniques, and workflow it will follow in the subsequent model-building phase.
• Some of the tools commonly used for this stage are MATLAB and Statistica.
Phase 4: Model Building
• The team creates datasets for training, testing, and production use, and builds and executes models based on the work completed in the model planning phase.
• The team also evaluates whether its current tools are sufficient to run the models or whether it requires a more robust environment for model execution.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, Statistica.
Data Analytics Lifecycle
Phase 5: Communicate Results
• Following the execution of the model, team members evaluate its outcomes against the criteria established for the model's success or failure.
• The team considers how best to present findings and outcomes to the various team members and other stakeholders, taking caveats and assumptions into account.
• The team should determine the most important findings, quantify their value to the business, and create a narrative to present and summarize the findings to all stakeholders.
Phase 6: Operationalize
• The team delivers the benefits of the project to a wider audience. It sets up a pilot project to deploy the work in a controlled manner before expanding it to the entire enterprise of users.
• This approach allows the team to gain insight into the performance and constraints of the model in a production setting at a small scale, and to make necessary adjustments before full deployment.
• The team produces the final reports, presentations, and code.
• Open-source or free tools such as WEKA, SQL, and Octave are used.
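The modelling phases of the lifecycle can be sketched end to end in a few lines: create training and test datasets (Phase 4), fit a model, and evaluate it against held-out data (Phase 5). This is a minimal illustration with invented observations and an ordinary least-squares line standing in for a real model.

```python
# End-to-end sketch of the lifecycle's modelling phases.

# Hypothetical (x, y) observations with a roughly linear trend.
data = [(1, 2.1), (2, 4.0), (3, 6.2), (4, 7.9), (5, 10.1), (6, 11.8)]

# Phase 4: create training and test datasets.
train, test = data[:4], data[4:]

# Fit y = a*x + b by ordinary least squares on the training set.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# Phase 5: judge model quality on held-out data (mean absolute error).
mae = sum(abs((a * x + b) - y) for x, y in test) / len(test)
print(round(a, 2), round(b, 2), round(mae, 2))
```

In a real project the fitted model would then be piloted on live data (Phase 6) before full deployment.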