Intelligent data and knowledge processing
Study program prepared by prof. Larysa Globa

1. Subject description
The subject "Intelligent data and knowledge processing" is a special training subject for master's and PhD students. It focuses on methods and algorithms for solving intellectual tasks in a global environment using weakly coupled and weakly structured resources. The training provides practical skills in using software for intelligent processing of data, knowledge and services in a global environment.
The material highlights the concepts and demonstrates the most important features of information processing using models of knowledge representation: logical, production (rule-oriented), semantic, frames, models based on fuzzy logic and neural networks. The material also covers methods and tools for analysing information and computational resources with OLAP-systems, and gives students the skills to use modern OLAP-systems and neural network software.

2. Purpose and objectives
2.1. The purpose of the study course is:
  to give students knowledge of the current mathematical and algorithmic tools for processing huge amounts of weakly structured and unconnected data in a global network;
  to give students the skills to use modern information technologies such as data storage, analytical data processing systems (such as OLAP, ROLAP, data stores, data marts and cabinets) and other analytical processing tools that are part of up-to-date database systems, as well as other data and knowledge mining tools in the global environment, and to apply these skills in the practical realization of future projects.
2.2.
The main tasks:
  obtaining knowledge of methods for modelling abstract objects and processes in a variety of subject areas, and of designing workflows for modern web-based software applications that provide differentiated services in a distributed environment;
  teaching students to apply the obtained knowledge to the design of service-oriented software applications, with a deep understanding of how such applications work;
  forming the students' ability to obtain knowledge by themselves for designing, implementing and supporting service-oriented software applications and data and knowledge mining tools, and to apply this knowledge in practice.

According to the program requirements, students will have to demonstrate the following results of training:
to know:
  the algorithms and mechanisms of analytical data and knowledge processing and how to use them;
  the basic models of knowledge representation: logical, production (rule-oriented), semantic, frames, models based on fuzzy logic, models based on graph and metagraph theory, models based on neural networks;
  the theoretical basics of creating and using data warehouses in a distributed global environment.
to be able:
  to develop and operate software that implements distributed data and knowledge processing based on advanced Intranet-based technologies and intelligent algorithms;
  to work with the analytical data processing tools that are part of corporate databases, data warehouses and distributed databases in an Intranet environment;
  to develop and work with data warehouses and the analytical tools of modern databases.
to have experience in:
  using analytical data and knowledge processing algorithms and mechanisms.

3. Subject structure
All values are hours.

Chapters and topics | Total | Theoretical | Practices and workshops | Computer workshop | Homework
Chapter 1. The main research directions in artificial intelligence. Discrete mathematics and mathematical logic particulars | 8 | 4 | 2 | - | 2
Topic 1.1. 1. Elements of discrete mathematics. 2. Introduction to mathematical logic | 4 | 2 | 1 | - | 1
Topic 1.2. 1. Introduction to predicate logic and logical inference. 2. Expert systems. 3. Examples of solving production process tasks | 4 | 2 | 1 | - | 1
Chapter 2. Introduction to the theory of fuzzy sets and fuzzy logic | 8 | 4 | 2 | - | 2
Topic 2.1. 1. Fuzzy sets. Key terms and definitions. 2. Fuzzy arithmetic. 3. Fuzzy relations and their properties. 4. Operations on fuzzy relations. 5. Fuzzy logic. Linguistic variables. Fuzzy truth. Fuzzy logic operations | 4 | 2 | 1 | - | 1
Topic 2.2. 1. The fuzzy knowledge base. 2. The fuzzy logical inference | 4 | 2 | 1 | - | 1
Chapter 3. Neural networks | 5 | 2 | 1 | - | 2
Topic 3.1. 1. Neural networks. 2. The adaptive fuzzy inference network. 3. Linguistic rules in decision making | 5 | 2 | 1 | - | 2
Chapter 4. Genetic algorithms and examples of their application | 10 | 4 | 2 | - | 4
Topic 4.1. 1. Genetic algorithms | 5 | 2 | 1 | - | 2
Topic 4.2. 1. Solving schedule design tasks for digital flow processing by the controller of wideband radio network systems | 5 | 2 | 1 | - | 2
Chapter 5. Data processing based on a tree-like fuzzy knowledge base with a combined inference scheme | 5 | 2 | 1 | - | 2
Topic 5.1. 1. Characteristics of data processing in complex organizational systems. 2. Data and knowledge processing in complex administrative systems based on a tree-like fuzzy knowledge base. 3. Construction and configuration of the bottom of the tree-like fuzzy knowledge base. 4. An approach to designing flexible software for data and knowledge processing | 5 | 2 | 1 | - | 2
Chapter 6. Metagraphs and their applications | 10 | 4 | 2 | - | 4
Topic 6.1. 1. Graphs, hypergraphs and metagraphs. 2. Metagraph theory | 5 | 2 | 1 | - | 2
Topic 6.2. 1. Applications of metagraphs. Presentation of a fuzzy knowledge base in the form of a metagraph and operations on it | 5 | 2 | 1 | - | 2
Chapter 7. Semantic Web and metadescriptions of computing and information resources | 10 | 4 | 2 | - | 4
Topic 7.1. 1. Semantic Web as a new model of internet information space. Metadata. Ontologies. 2. Query languages for RDF repositories. 3. Logical inference. 4. Ensuring integrity and consistency. 5. Agents and Services. 6. Practical realization of the Semantic Web | 5 | 2 | 1 | - | 2
Topic 7.2. 1. Web-services metadescription and UDDI specification. 2. Advanced approach to Web-services composition. 3. Advanced approach to Web-services discovery and selection. 4. Increasing Web-services discovery relevancy in the multi-ontological environment. 5. Modelling results. 6. Designing the tree structure of computing (Web-services) and information resources based on metadescriptions | 5 | 2 | 1 | - | 2
Chapter 8. Methods of knowledge-based portal development | 10 | 4 | 2 | - | 4
Topic 8.1. 1. Analysis of approaches to information systematization for a knowledge portal in fields of science and engineering. 2. Formal algebraic system of knowledge representation for a knowledge portal in fields of science and engineering | 5 | 2 | 1 | - | 2
Topic 8.2. 1. Method of forming a tree of complex functional elements (sequences of computations). 2. Tools for systematizing and structuring information and computational resources for a knowledge portal in fields of engineering | 5 | 2 | 1 | - | 2
Chapter 9. Data Warehouses. Data cubes. OLAP-systems. Data marts. Data streams. Data mining | 36 | 4 | 2 | 26 | 4
Topic 9.1. 1. Data Warehouses. 2. Data cubes | 18 | 2 | 1 | 13 | 2
Topic 9.2. 1. OLAP-systems. 2. Data marts. The data streams. 3. Data mining | 18 | 2 | 1 | 13 | 2
Overall | 102 | 32 | 16 | 26 | 28
Test | 6 | - | 2 | - | 4
Overall (hours) | 108 | 32 | 18 | 26 | 32

4. Lectures
Topics and a list of the main issues

Chapter 1. The main research directions in artificial intelligence. Discrete mathematics and mathematical logic particulars
The main issues:
1. The purpose and objectives
2. Elements of discrete mathematics
  2.1. Sets. Algebra of sets
  2.2. The theory of Boolean functions. Boolean algebra
  2.3. Definition of Boolean functions and methods of specifying them
  2.4. Disjunctive and conjunctive normal forms (DNF, CNF)
  2.5. Dealing with disjunctive normal forms
  2.6.
The Quine-McCluskey method for finding a minimal DNF
3. Introduction to mathematical logic
  3.1. Formal models
  3.2. Propositional logic
  3.3. The statement and proof of theorems
  3.4. Checking demonstrative reasoning
  3.5. Syllogisms
  3.6. Logical consequence
  3.7. The main theorem of inference
  3.8. Reduction to normal form
  3.9. The resolution method
  3.10. Other methods
  3.11. Adequacy of propositional logic
4. Overview of predicate logic and logical inference
  4.1. Predicates
  4.2. Free and bound variables
  4.3. Interpretation
  4.4. Equivalence in predicate logic
  4.5. Logical inference in predicate logic
  4.6. The unification algorithm
  4.7. Logic programming
5. Logical inference in Prolog
6. Application of logical inference to circuit analysis
7. Expert systems
8. Examples of solving production process tasks
  8.1. Principles of linguistic modelling
  8.2. The general structure of the expert rules
9. Conclusions
10. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 2. Introduction to the theory of fuzzy sets and fuzzy logic
The main issues:
1. Fuzzy sets. Basic terms and definitions
  1.1. Properties of fuzzy sets
  1.2. Operations on fuzzy sets
2. Fuzzy arithmetic
3. Fuzzy relations and their properties
4. Operations on fuzzy relations
5. Fuzzy logic
  5.1. Linguistic variables
  5.2. Fuzzy truth
  5.3. Fuzzy logic operations
6. The fuzzy knowledge base
7. The fuzzy logical inference
  7.1. Zadeh's compositional rule of fuzzy inference
  7.2. Mamdani fuzzy logical inference
  7.3. Sugeno fuzzy logical inference
  7.4. Singleton fuzzy logical inference model
  7.5. Fuzzy logical inference for classification tasks
  7.6. Hierarchical fuzzy logical inference system
8. Conclusions
9.
References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 3. Neural networks
The main issues:
1. Neural networks
  1.1. The basic concepts
  1.2. Simulation of nerve cells
  1.3. The mathematical model of a neuron
  1.4. Training of neural networks
  1.5. The error backpropagation method
2. The adaptive fuzzy inference network
3. Linguistic rules in decision making
  3.1. Automatic control
  3.2. Situational control
  3.3. Medical diagnostics
  3.4. Multi-criteria evaluation
  3.5. Multivariate analysis
4. Conclusions
5. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 4. Genetic algorithms and examples of their application
The main issues:
1. Genetic algorithms
  1.1. Introduction
  1.2. Genetic operators
  1.3. Data representation in the genes
  1.4. Strategies for selection and formation of a new generation
  1.5. Schemata (patterns) and the schema theorem
  1.6. Models of genetic algorithms
  1.7. Test functions
2. Solving schedule design tasks for digital flow processing by the controller of wideband radio network systems
  2.1. Evaluating the execution efficiency of the sequence of operations for digital stream formation
  2.2. Parallel data processing for digital stream frame formation
  2.3. Approaches to designing a sequence of operations for digital stream formation
  2.4. Methods for analysing and optimizing the execution sequence of operations for digital flow frame formation by the radio network controller
    2.4.1. The formal model of the digital stream frame formation process
    2.4.2. A method of improving the efficiency of operations for digital stream formation
    2.4.3.
Analysis of requirements for the digital flow frame formation process in the radio network controller
    2.4.4. Optimizing hardware resource usage time during digital stream formation
  2.5. The method of scheduling the execution of operations for digital stream frame formation by the radio network controller
    2.5.1. Solving the problem of scheduling the execution of operations in the software block for digital stream frame formation by the radio network controller
    2.5.2. Designing mathematical models of the computation process for digital flow frame formation
    2.5.3. Analysis of the computational process for digital flow frame formation
    2.5.4. Algorithms for designing the schedule of operation execution for digital flow frame formation
    2.5.5. On-line optimization of atomic operation scheduling
  2.6. Experimental results of flowcharting and diagram analysis of the operation execution sequence for digital flow formation by the radio network controller, using LTE frame formation as an example
  2.7. Implementation of the method for increasing the efficiency of digital flow formation by the radio network controller
3. Conclusions
4. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 5. Data processing based on a tree-like fuzzy knowledge base with a combined inference scheme
The main issues:
1. Description of information processing in complex organizational systems
  1.1. The management problem in complex administrative systems
  1.2. The structure of information flows in complex administrative systems
  1.3. Systematization and analysis of information processing methods in complex administrative systems
  1.4. Data processing based on a fuzzy inference subsystem
    1.4.1. The fuzzy inference subsystem of the general form
    1.4.2. Problems of using a disordered knowledge base
  1.5.
Methods and ways of defining the membership function
  1.6. Analysis of approaches to designing software modules for information processing
2. An approach to information processing in complex administrative systems
  2.1. Reduction of the fuzzy knowledge base to a tree structure
  2.2. Modification of the fuzzy inference subsystem
    2.2.1. Modification of the fuzzy inference scheme
    2.2.2. The structure of the modified fuzzy inference subsystem
  2.3. Presentation of the top of the fuzzy knowledge base as a classical logical knowledge base
  2.4. Method of data processing based on tree-like fuzzy knowledge bases with a mixed inference scheme
3. Construction and configuration of the bottom of the tree-like fuzzy knowledge base
  3.1. Approach to developing and configuring the bottom of the knowledge base
  3.2. Presentation of the bottom of the knowledge base as a fuzzy neural network
  3.3. The structure of forming the bottom of the fuzzy knowledge base
    3.3.1. Simplifying the fuzzy knowledge base by similarity
    3.3.2. Construction of the initial fuzzy knowledge base of the lower level
    3.3.3. Formation of the bottom of the fuzzy knowledge base using genetic programming methods
  3.4. Configuring the membership functions of the terms of the lower-level linguistic variables
4. Development of flexible software components for information processing
  4.1. The main issues for development of
    4.1.1. Tools of flexible software components
    4.1.2. Binding to the subject area
    4.1.3. Automatic construction of a database query
  4.2. Experience of practical usage of the flexible software components for information processing
    4.2.1. Formalization of the subject area "State traffic police of Ukraine"
    4.2.2. Binding the data processing software module to the subject area
    4.2.3. Analysis of accidents and crime on the roads of Ukraine
5. Conclusions
6.
References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 6. Metagraphs and their applications
The main issues:
1. Graphs, Hypergraphs, and Metagraphs
  1.1. Graphs and Data Visualization
  1.2. Graph Structures
2. Metagraph Theory
  2.1. The Algebraic Structure of Metagraphs
    2.1.1. Formal Representation of a Metagraph
    2.1.2. The Incidence and Adjacency Matrices
    2.1.3. Identifying Metapaths
  2.2. Connectivity Properties of Metagraphs
    2.2.1. Dominant Metapaths
    2.2.2. Cutsets and Bridges
  2.3. Metagraph Transformations
    2.3.1. Hierarchical Abstraction Using Projection
    2.3.2. The Inverse Metagraph
    2.3.3. The Element Flow Metagraph
  2.4. Attributed Metagraphs
    2.4.1. Qualitative Attributes
    2.4.2. Quantitative Attributes
    2.4.3. Conditional Metagraphs. Projections in Conditional Metagraphs. Connectivity and Redundancy
  2.5. Independent Sub-Metagraphs
3. Applications of Metagraphs
  3.1. Presentation of a fuzzy knowledge base in the form of Metagraphs
  3.2. Visualization and analysis of fuzzy knowledge base correctness using Metagraphs
  3.3. Inference methods based on Metagraphs
4. Conclusions
5. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 7. Semantic Web and metadescriptions of computing and information resources
The main issues:
1. Semantic Web as a new model of internet information space
  1.1. Motivation for developing a semantic approach to describing computational (Web-services) and information resources
  1.2. Semantic Web concept
    1.2.1. Semantic Web architecture
    1.2.2. URI - Uniform Resource Identifier
    1.2.3. Documents: Extensible Markup Language (XML)
    1.2.4. Statements: the Resource Description Framework (RDF) and RDF Schema
    1.2.5. Metadata
  1.3.
Ontologies
  1.4. Query languages for RDF repositories
  1.5. Logical inference
  1.6. Ensuring integrity and consistency
  1.7. Agents and Services
  1.8. Practical implementation of the Semantic Web
2. Web-services metadescription. UDDI specification
3. Advanced approach to Web-services composition
4. Advanced approach to Web-services discovery and selection
5. Increasing Web-services discovery relevancy in the multi-ontological environment
  5.1. Analysis of the existing approaches
  5.2. Web-service similarity evaluation method
  5.3. Results based on modelling
6. Formation of the tree structure for computational (Web-services) and information resources based on metadescriptions
7. Conclusions
8. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 8. Methods of knowledge-based portal development
The main issues:
1. Analysis of approaches to data and knowledge systematization on a knowledge portal
  1.1. The issue of systematizing and structuring information for complex information systems in fields of science and engineering
  1.2. Description of features, development issues, approaches and models
  1.3. Formal algebraic systems for a knowledge portal, basic definitions and concepts
2. Formal algebraic system of scientific and engineering knowledge representation
  2.1. Algebraic system model for knowledge portals in fields of science and engineering
  2.2. Operations of the calculations algebra
  2.3. Mediums of the data
  2.4. Properties of simple operations of the calculations algebra
  2.5. Properties of complex operations of the calculations algebra
3. Method of forming the tree of common functional elements
  3.1. The approach to developing complex functional elements
  3.2. Stages of the method of developing common functional elements
  3.3. The method of dynamic development of complex workflows in fields of science and engineering
4.
Software tools for systematizing and structuring information and computing resources on a knowledge portal in fields of science and engineering
  4.1. The basic concept for software tools development
    4.1.1. Software tool complex for systematizing and structuring information and computing resources on a knowledge portal
    4.1.2. The way of ensuring interoperability of the information and functional elements on the knowledge portal
    4.1.3. Structuring and systematization of portal knowledge
    4.1.4. Software tools for dynamic workflow development
  4.2. Experience of practical usage of the software tools
    4.2.1. Portal "Strength of materials"
    4.2.2. Portal "National Antarctic Scientific Centre of Ukraine"
5. Conclusions
6. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

Chapter 9. Data Warehouses. Data cubes. OLAP-systems. Data marts. Data streams. Data mining
The main issues:
1. Data Warehouses
  1.1. The concept of a data warehouse
  1.2. Comparison of on-line transaction processing systems and data warehouses. Problems in the development and maintenance of data warehouses
  1.3. The architecture of the data warehouse
  1.4. Information flows in a data warehouse
  1.5. Tools and technologies of data warehousing
  1.6. Data stores
2. Data cubes
  2.1. Denormalized dimensional database
  2.2. "Dimensional" methodology
  2.3. Star and snowflake schemas
  2.4. Data cubes, hierarchies, aggregates
  2.5. Variants of data warehouse implementation
  2.6. Data storage formats in OLAP-cubes
3. OLAP-systems
  3.1. OLAP tools benchmarking
  3.2. OLAP applications and advantages of OLAP
  3.3. Presentation of multidimensional data
  3.4. Codd's rules for selecting OLAP tools
  3.5. Technology of data processing
  3.6. Methods of data processing
4. The data marts. The data stream
  4.1. Developing the data mart structure using SQL Server Management Studio
  4.2.
Setting dimensions and links
  4.3. Filling the empty data marts using Integration Services
  4.4. Creating the Integration Services project
  4.5. Creating the first-level task flow and data flow. Second-level data flow. Third-level data flow
  4.6. Data flows to the fact table
  4.7. Advanced cube settings
  4.8. Creating the perspective
  4.9. Design and usage of key performance indicators
  4.10. Creating the relational schema from the multidimensional cube
5. Data processing
  5.1. The purpose of using analytical services
  5.2. Data mining models
  5.3. Data mining algorithm
  5.4. Data mining model development
  5.5. Analysis of the "naive Bayes" model
6. Conclusions
7. References
The didactic tools: slides on the subject and online tutorials.
The tasks for out-of-class activities: to review the lecture materials and analyse them in depth using the recommended literature.

4. Practical exercises and labs
The purpose of the practical exercises and labs cycle is for students to acquire basic practical skills in using data warehouses and analytical data processing tools such as ROLAP and MOLAP, to learn how to develop simple expert systems, and to gain skills in using a language for statistical data processing.

No. | Title of the practical exercises and labs | Hours
1 | Data processing using Microsoft Visual Studio 2008 | 6
2 | Applying MS Excel as a remote client of MS SQL Server 2008 and MS Analysis Server 2008 via Internet | 6
3 | Development of a semantic network | 3
4 | Development of a frame model | 2
5 | Development of an expert system based on a production model | 3
6 | Processing statistical data | 6

7. Homework

No. | Title of the topic for students' homework | Hours
1 | Methods based on ant colony optimization | 18
2 | Fuzzy neural network | 18
3 | Ontologies and Semantic Web | 18

8. References
1. Korneev V., Gareyev A.F., Vasjutin S.V., Reich V. Databases. Intelligent processing of information. Moscow: Nolidg, 2000. 260 p.
2. Baydyk T.N. Neural networks and artificial intelligence tasks. Kiev: Naukova Dumka, 2001. 263 p.
3. Bratko I. Prolog Programming for Artificial Intelligence. Trans. from English. Moscow: Mir, 1990. 560 p.
4. Gavrilova T.A., Khoroshevsky V.F. Knowledge Bases of Intelligent Systems. St. Petersburg: Piter, 2000. 384 p.
5. Korotkii S. Neural networks: basic concepts // http://www.neuropower.de/rus/books/index.html
6. Connolly T., Begg K., Strachan A. Databases: Design, implementation and maintenance. Theory and Practice. 2nd ed. Trans. from English: textbook. Moscow: Williams Publishing House, 2000. 1120 p.
7. Rothstein A.P. Intelligent identification technology: fuzzy sets, genetic algorithms, neural networks. Vinnica: UNIVERSUM, 1999. 320 p.
8. Subbotin S.A. Knowledge presentation and processing in artificial intelligence and decision support systems: Handbook. Zaporozhye: Zaporozhye National Technical University, 2008. 341 p.
9. The slides for the lectures contain additional references after every topic.