Faculty of SET / School of Computer Science, University of Adelaide
COMP SCI 1400 AI Technologies
AI Techniques
Dr. Kamal Mammadov

Review
• Trends in AI
• Agents
• Narrow AI vs generalised AI
• Symbolic AI vs connectionist AI

Outline
• Data mining
  – Machine learning
  – Deep learning
  – Information retrieval
• Classification and clustering algorithms
• Reasoning
• Problem solving

Data Mining

What is Data Mining?
• Data mining is, in general terms, the extraction of information from data in its various forms.
• What can we do with data mining?
  ➢ Extract useful information from data sets.
  ➢ Discover meaningful correlations, patterns, and trends in data.

Data Mining
• Data: raw, uninterpreted facts
  – e.g. Tom, 20 years old, student
• Information relates items of data together
  – e.g. Tom is 20 years old
• Knowledge relates items of information together
  – Tom is 20 years old → Tom pays > $1500 for car insurance
• Modelling the world (= generalising)
  – [18-25] years old → P(accident) = high

Fitting to the Business: Case 1, the insurance industry
• Problem: AAMI insurance wanted a better and fairer way to work out premiums.
• Approach: the Data Mining Research Group at the Faculty of IT, Monash University, used more than 30,000 customer transactions in AAMI's database to work out what characterized good and bad driver behavior.
• Deployment: the company could use the discovered features to set fairer premiums.

DM - Machine Learning
• Supervised versus unsupervised learning
  ➢ Supervised learning
    • We have training data that is annotated with known target values (labelled).
    • The task is usually to predict the target values for new data: classification and regression.
[Figure: labelled training images (cat, dog) and test images with unknown labels]
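As a minimal illustration of supervised classification (the feature values and labels below are invented toy data, not from the slides), a 1-nearest-neighbour classifier can be sketched in a few lines of plain Python:

```python
# Toy 1-nearest-neighbour classifier: predict the label of the
# closest labelled training point (squared Euclidean distance).

def nearest_neighbour(train, query):
    """train: list of (features, label) pairs; query: a feature tuple."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # min over (distance, label) pairs picks the closest point's label
    _, label = min((sq_dist(features, query), lab) for features, lab in train)
    return label

# Invented training set: two "cat" points and two "dog" points in 2-D.
train = [((1.0, 1.0), "cat"), ((1.2, 0.9), "cat"),
         ((5.0, 5.0), "dog"), ((4.8, 5.2), "dog")]

print(nearest_neighbour(train, (1.1, 1.0)))  # "cat": closest to the cat cluster
```

The same idea generalises to k > 1 neighbours by taking a majority vote over the k closest labels.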
DM - Machine Learning
• Supervised learning tasks
  ➢ Classification: predict categorical (discrete) values, e.g. image classification.
  ➢ Regression: predict numerical (continuous) values, e.g. stock prices.

DM - Machine Learning
• Unsupervised learning
  – Our data is not annotated with any known target values (unlabelled).
  – Our task is usually to learn the structure of the data.
[Figure: unlabelled training and test images]

DM - Machine Learning: typical methods
• Classifiers:
  – K-Nearest Neighbour
  – Decision Tree
  – Random Forest
  – Naïve Bayes
  – Support Vector Machine
  – Neural Network
  – ...
• Clustering algorithms:
  – K-means
  – DBSCAN
  – Agglomerative Clustering
  – Spectral Clustering
  – ...

DM - Deep Learning
Artificial neural networks:
• Inspired by the human brain, which is composed of cells called neurons, interconnected to form a massive network.
• One of the earliest methods of AI; neural networks attracted a lot of attention between the 1950s and 1980s due to their potential to automatically develop ways of solving problems, given appropriate training data.
• Neural networks have gone through several phases of decline and resurgence.

DM - Deep Learning
• Neural network terms
  ➢ Neuron
  ➢ Connection
  ➢ Activation
  ➢ Layer
• Neural network training
  ➢ Backpropagation
  ➢ ...

Activation functions
The activation function must be a non-linear function. Examples of activation functions in use include sigmoid, ReLU, and leaky ReLU.
The sigmoid function σ(x) = 1 / (1 + e^(-x)) is the most commonly used activation function.
The rate of change (derivative) of the activation function plays an essential role in backpropagation.
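As a minimal sketch in plain Python (no libraries; the sample inputs are arbitrary), the activation functions just listed can be written directly:

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x)): squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # ReLU passes positive inputs through unchanged and zeroes out the rest
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Leaky ReLU keeps a small slope for negative inputs instead of
    # zeroing them; alpha = 0.01 is a common but arbitrary choice.
    return x if x > 0 else alpha * x

print(sigmoid(0.0))      # 0.5
print(relu(-3.0))        # 0.0
print(leaky_relu(-3.0))
```

Note that all three are non-linear, as the slide requires; a purely linear activation would collapse a multi-layer network into a single linear map.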
Derivative of the Sigmoid Function
Derive the derivative of the sigmoid function σ(x) = 1 / (1 + e^(-x)).
Hint: recall the quotient rule for derivatives: for any h(x) = f(x) / g(x),
  h'(x) = (f'(x) g(x) - g'(x) f(x)) / g(x)^2
A. σ'(x) = -σ(x)
B. σ'(x) = σ(-x)
C. σ'(x) = 1 / σ(x)
D. σ'(x) = σ(x)(1 - σ(x))

DM - Deep Learning
• Feed-Forward Neural Networks
• Convolutional Neural Networks
• Recurrent Networks
• Long Short-Term Memory Networks
• Deep Belief Networks
• Auto-Encoders
• Generative Adversarial Networks
• Variational Auto-Encoders
• ...

Universal Approximation Theorem for Arbitrary-Width Feedforward Networks
The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer, given enough neurons, can approximate any continuous function to any desired level of accuracy. This holds provided the activation function used in the hidden layer is non-linear and bounded.
Since the theorem guarantees that neural networks can approximate any continuous function, they can be applied to a vast range of problems across different domains. This includes tasks in image recognition, natural language processing, game playing, and more.

Theorem (1)
Consider a non-constant, bounded, and continuous activation function σ. Let K be a compact subset of R^n. For any continuous function f that maps K to R^m and for any positive number ε, there exists a feedforward neural network with a single hidden layer, having a finite number N of neurons in the hidden layer, weight matrices W (of dimensions N × n) and V (of dimensions m × N), and a bias vector b (of dimension N). The neural network defines an output function g that maps K to R^m as follows:
• The input x is multiplied by the weight matrix W, and the bias vector b is added.
• The activation function is then applied element-wise to the resulting vector.
• The result is then multiplied by the weight matrix V.

Theorem (2)
  g(x) = V σ(W x + b)
This output function g can approximate the continuous function f to any desired degree of accuracy, specifically ensuring that the difference between f and g is less than ε for all inputs x in the compact set K. That is, for any given f(x) and ε, there exists a feedforward network g(x) such that, for all x ∈ K,
  ||f(x) - g(x)|| < ε
In summary, the Universal Approximation Theorem highlights the theoretical foundation of neural networks' power and flexibility, making them incredibly useful for approximating complex functions in a wide range of practical applications.

DM - Pattern Recognition
• Pattern recognition:
  – Borrows from statistics, mathematics, and signal processing.
  – Develops mathematical models, largely within statistical/probabilistic frameworks, for modelling data.
• Pattern recognition vs machine learning
  – Pattern recognition is the earlier field.
  – They share the same ultimate task: to mine patterns from data.
  – Pattern recognition fits a model to existing features, while machine learning learns features from the data.

DM - Information Retrieval
• Information retrieval (IR) is the automatic process that responds to a user query by examining a collection of documents and returning a sorted document list that should be relevant to the user's requirements as expressed in the query.
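The single-hidden-layer output function g(x) = V σ(Wx + b) from the theorem slides can be sketched in plain Python. The weights below are random and untrained, and the shapes (n = 2 inputs, N = 3 hidden units, m = 1 output) are arbitrary choices, so this only illustrates the shape of the computation, not an actual approximation of some target f:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def g(x, W, b, V):
    """g(x) = V * sigmoid(W x + b).

    x: input vector (length n); W: N x n weight matrix;
    b: length-N bias vector; V: m x N weight matrix.
    Returns the length-m output vector.
    """
    # Hidden layer: affine map W x + b, then element-wise sigmoid.
    hidden = [sigmoid(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
              for i in range(len(b))]
    # Output layer: linear map V applied to the hidden activations.
    return [sum(V[k][i] * hidden[i] for i in range(len(hidden)))
            for k in range(len(V))]

# Random, untrained weights with the (arbitrary) shapes n=2, N=3, m=1.
random.seed(0)
n, N, m = 2, 3, 1
W = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(N)]
b = [random.uniform(-1, 1) for _ in range(N)]
V = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(m)]

print(g([0.5, -0.2], W, b, V))  # a length-1 output vector
```

The theorem says that for a suitable (large enough) N and suitable trained values of W, b, and V, this g can track any continuous f on a compact set to within ε.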
DM - Information Retrieval
• Content: documents, images, knowledge. Intent: keywords, questions.
• Key questions: how to represent intent and content, and how to match intent and content.
(Source: Hang Li, SIGIR'16 tutorial)

DM - Information Retrieval
• Approach in traditional IR. [Figure: pipeline from Hang Li, SIGIR'16 tutorial]

DM - Information Retrieval
• Representation and matching are the key problems in IR. [Figure from Hang Li, SIGIR'16 tutorial]

Reasoning

Reasoning
• Reasoning aims to generate answers to unseen questions by manipulating existing knowledge with inference techniques.
  – Example: John is either in the car or in the house. He isn't in the car; therefore he is in the house.
• Two components:
  – Knowledge, such as a knowledge graph, common sense, rules, or assertions extracted from raw text;
  – An inference engine, to generate answers to questions by manipulating the existing knowledge.

Reasoning
• Knowledge graph
  – A knowledge graph acquires and integrates information into an ontology and applies reasoning to derive new knowledge.
  – Examples: the Google Knowledge Graph, Freebase.

Reasoning
• Knowledge graph
  – Edges, vertices, and relations, e.g. (Da Vinci, painted, Mona Lisa).

Reasoning
• Knowledge graph research
  – Representation: logic, n-tuples, databases.
  – Knowledge acquisition and construction: named entity recognition, relation extraction.
  – Temporal knowledge graphs.

Reasoning
• Inference engine
  – Integer linear programming (ILP)
  – Probabilistic methods, e.g. Bayesian networks, Markov logic networks (MLNs)
  – Neural methods: memory networks and variants

Problem Solving
• Problem solving by search
  – Searching an internal representation for a path to a goal.
  – This differs from searching the world or searching the Web.
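Searching an internal representation for a path to a goal, as just described, can be sketched with a breadth-first search over a toy state graph (the graph below is invented for illustration; states are letters and edges are legal moves):

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search: return a shortest path from start to goal,
    or None if the goal is unreachable."""
    frontier = deque([[start]])  # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in graph.get(state, []):
            if nxt not in visited:       # avoid revisiting states
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

# Invented toy state space, represented as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
         "D": ["F"], "E": ["F"]}
print(bfs_path(graph, "A", "F"))  # ['A', 'B', 'D', 'F']
```

Because the frontier is expanded in first-in-first-out order, the first path that reaches the goal is a shortest one; swapping the queue for a stack would give depth-first search instead.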
Problem Solving
• Problem solving by search
  – Genetic algorithms
    • Inspired by the process of natural selection
    • Belong to the family of evolutionary algorithms
    • Used to generate solutions to search problems
    • Operators: mutation, crossover, selection

Fields that influence(d) AI
• Philosophy: logic, methods of reasoning, mind as a physical system, foundations of learning, language, rationality
• Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability, probability
• Economics: utility, decision theory
• Neuroscience: physical substrate for mental activity
• Psychology: phenomena of perception and motor control, experimental techniques
• Computer engineering: building fast computers
• Control theory: designing systems that maximize an objective function over time
• Linguistics: knowledge representation, grammar

Summary
• Data mining
  – Machine learning (supervised, unsupervised)
  – Classification algorithms (supervised learning)
    1. K-Nearest Neighbour
    2. Decision Tree
    3. Naïve Bayes
  – Clustering algorithms (unsupervised learning)
    1. K-means
    2. DBSCAN
    3. Agglomerative Clustering
  – Deep learning
  – Information retrieval (representation, similarity)
• Reasoning (graphs, search)

References
• COM329, MQ, by Dr. Jia Wu
• SIGIR 2016 tutorial by Prof. Hang Li