PETE 2060, Computing and Data Mining
PETE 4990, Data Mining (TPS)

Introduction

What is Data Analytics?
• Business analytics (BA) is the practice and art of bringing quantitative data to bear on decision-making. The term means different things to different organizations.
• Process of extracting meaningful insights from data -> enables data-driven decisions based on factual data
• Data -> Data Analytics -> Data-Driven Decisions

Artificial Intelligence vs Machine Learning vs Deep Learning
o Artificial Intelligence: The science behind programming computers to simulate human intelligence by thinking and acting like humans.
  • Automates repetitive learning and discovery through data
  • Analyzes more data, faster and more accurately
o Machine Learning: A specific subset of AI applications that learn on their own using patterns in the data without explicit programming.
o Deep Learning: Uses complex neural network algorithms with many processing layers to recognize patterns in very large datasets.
[Figure: Artificial Intelligence taxonomy. Labels: Symbolic Learning, Statistical Learning, Machine (Data) Learning, Robotics, Computer Vision, Image Processing, Speech Recognition, Natural Language Processing, Deep Learning (ANN), Object Recognition]

What is Data Mining?
Definition: Non-trivial extraction of implicit, previously unknown and potentially useful information from data.
• Non-trivial: obvious knowledge is not useful
• Implicit: hidden and difficult to observe knowledge
• Previously unknown
• Potentially useful: actionable; easy to understand

A pattern is interesting if:
o Easily understood by humans
o Valid on new or test data with some degree of certainty
o Potentially useful, novel
o Validates some hypothesis that a user seeks to confirm

Interestingness measures:
• Objective: based on statistics and structures of patterns, e.g., support, confidence, etc.
• Subjective: based on user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
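
The objective measures named above, support and confidence, can be made concrete with a small sketch. It uses the standard association-rule definitions: support is the fraction of records that contain an itemset, and confidence of a rule A -> B is the fraction of records containing A that also contain B. The function names and the toy well-event records are illustrative assumptions, not course data.

```python
def count_containing(transactions, itemset):
    """Number of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, itemset):
    """Fraction of all transactions that contain `itemset`."""
    if not transactions:
        return 0.0
    return count_containing(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    n_antecedent = count_containing(transactions, antecedent)
    if n_antecedent == 0:
        return 0.0
    n_both = count_containing(transactions, set(antecedent) | set(consequent))
    return n_both / n_antecedent


# Hypothetical toy records: events observed together on a well (made up for illustration).
transactions = [
    {"high_water_cut", "pump_failure"},
    {"high_water_cut", "pump_failure", "sand_production"},
    {"high_water_cut"},
    {"sand_production"},
    {"high_water_cut", "pump_failure"},
]

# Rule under test: high_water_cut -> pump_failure
rule_support = support(transactions, {"high_water_cut", "pump_failure"})
rule_confidence = confidence(transactions, {"high_water_cut"}, {"pump_failure"})

print(f"support    = {rule_support:.2f}")     # 3 of 5 records -> 0.60
print(f"confidence = {rule_confidence:.2f}")  # 3 of 4 records with high_water_cut -> 0.75
```

A rule with high support and high confidence is objectively strong, but whether it is also novel or actionable is the subjective part of interestingness and still needs domain judgment.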

CRISP-DM Pipeline
Cross Industry Standard Process for Data Mining

Knowledge Discovery
• Learning the application domain
  o Relevant prior knowledge and goals of application
• Data selection
  o Creating a target data set
• Data cleaning and preprocessing (may take 60% of the effort!)
• Data reduction and transformation
  o Find useful features, dimensionality/variable reduction, invariant representation
• Choosing functions of data mining
  o Summarization, classification, regression, association, clustering
• Choosing the mining algorithm(s)
• Data mining
  o Search for patterns of interest
• Pattern evaluation and knowledge presentation
  o Visualization, transformation, removing redundant patterns, etc.
• Use of the discovered knowledge

What Types of Data? What Types of Analytics?
Data Types
▪ Time-series data, temporal data, sequence data
▪ Structured data, graphs, information networks
▪ Data streams and sensor data
▪ Spatial data and spatiotemporal data
▪ Relational database, heterogeneous databases
▪ Text databases
[Figure: types of analytics. Axes: Value, Complexity (Low, Medium, High). Labels: Descriptive, Diagnostic, Predictive, Prescriptive; Hindsight, Insight, Foresight]

What Are We Trying to Do?
ML Models: Learning Tasks
• Supervised: makes inferences using labeled data (Classification, Regression)
• Unsupervised: makes inferences without labeled data (Clustering)
• Reinforcement: learns based on consequences

• Find patterns in a large quantity of data (e.g., quantify "low risk" borrowers for loans)
• Match related data (e.g., facial recognition)
• Make recommendations (create predictive models from large data sets and apply them in real time to specific cases)
• Decision-making (fully or semi-autonomous: self-driving cars, probation determinations)

Data Situation in the Industry

Exploration and Field Development
[Figure labels: Resource Constraints, Remote, Time Constraints, More Complex]
• Verify assumptions using historical data
• Acreage assessment and prospect generation
• Increase the success rate of identifying potentially productive seismic trace signatures

Operational Efficiency
Subsurface Characterization
• Pressure support
• Reservoir sweep
• Water/gas/steam flooding
• Enhanced oil recovery (EOR)
Drilling & Completions
• Real-time drilling optimization
• Precision drilling
• Understand operational constraints
Production Engineering
• Well performance
• Field operations
• Lifting and pumping
• Flow assurance
Operational Excellence
• Reduced uncertainty
• Automation
• Best practices and knowledge capture
• Remote collaborative teams

Predictive Maintenance
[Figure labels: Monitoring, Legacy Data, Real-time Data, Digital Twin, Diagnosis, Adjusting, Situational Awareness, Real-time Decision Making, Risk Sensing Capability, Proactive Prevention, Expert Knowledge]

Course Objectives
▪ Students will have a comprehensive knowledge of the principles of data mining techniques.
▪ Students will be familiar with various soft computing algorithms such as neural networks and support vector machines (PETE 4990).
▪ Students will have an appreciation of the necessity and benefits of applying modern data analytics and knowledge discovery techniques in petroleum engineering.
▪ Students will be able to manipulate different data mining techniques to build analytical applications within petroleum engineering.

Tentative Topics
• Introduction to data pre-processing
• Data cleaning and preparation
• Data wrangling
• Modelling with machine learning algorithms
• Frequent patterns and association rules
• Linear regression
• Decision trees for Classification
• Regression trees
• Ensemble Methods-Classification
• Ensemble Methods-Regression
• Support Vector Machines
• Artificial Neural Networks
• Clustering Techniques

Data Mining Tools
▪ Orange Data Mining based on visual programming: Orange Data Mining - Getting started
▪ Python for data mining: Free Download | Anaconda
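
As a first taste of the Python route, the sketch below walks a toy data set through four of the knowledge discovery steps listed earlier: cleaning, transformation, mining, and pattern evaluation. It uses only the Python standard library, and a simple least-squares line stands in for the mining algorithm. The records and variable names are made up for illustration and are not course material.

```python
# Toy records (x, y); values are made up for illustration, None marks a missing entry.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None), (5.0, 11.0)]

# 1. Data cleaning: drop records with a missing value.
clean = [(x, y) for x, y in raw if x is not None and y is not None]

# 2. Data transformation: min-max scale x to [0, 1].
#    Assumes x is not constant; otherwise the denominator would be zero.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
x_min, x_max = min(xs), max(xs)
xs_scaled = [(x - x_min) / (x_max - x_min) for x in xs]

# 3. Data mining: fit y = intercept + slope * x_scaled by ordinary least squares.
n = len(xs_scaled)
x_bar = sum(xs_scaled) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs_scaled, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs_scaled)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# 4. Pattern evaluation: mean absolute error of the fitted line on the cleaned records.
predictions = [intercept + slope * x for x in xs_scaled]
mae = sum(abs(p - y) for p, y in zip(predictions, ys)) / n

print(f"kept {n} of {len(raw)} records")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, MAE = {mae:.3f}")
```

In practice the error would be measured on held-out data instead of the records used for fitting, which matches the earlier criterion that an interesting pattern stays valid on new or test data.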