Intelligent data and knowledge processing

Study program
prepared by prof. Larysa Globa
1. Subject description
The subject "Intelligent data and knowledge processing" is a special training course for
master's and PhD students. It focuses on methods and algorithms for solving
intellectual tasks in a global environment using loosely coupled and weakly structured resources.
The training provides practical skills in using software for the intelligent processing of data,
knowledge and services in a global environment.
The material has been gathered to present the concepts and to demonstrate the most important features
of information processing using models of knowledge representation: logical, production (rule-oriented),
semantic, frame-based, and models based on fuzzy logic and neural networks. The materials also
cover methods and tools for analysing information and computational resources using OLAP systems
and give students the skills to use modern OLAP systems and neural network software.
2. Purpose and objectives
2.1. The purpose of the study course is:
• to give students knowledge of current mathematical and algorithmic tools for
processing huge amounts of weakly structured, unconnected data in a global network;
• to give students the skills to use modern information technologies such as data warehouses,
analytical data processing systems (such as OLAP, ROLAP, data stores, data marts and
cabinets) and other analytical processing tools that are part of up-to-date database systems,
as well as other data and knowledge mining tools in the global environment, and to apply these
skills to the practical realization of future projects.
2.2. The main tasks:
• acquiring knowledge of methods for modelling abstract objects and processes in a
variety of subject areas, and of designing workflows for modern web-based software applications
that provide differentiated services in a distributed environment;
• teaching students to apply the acquired knowledge to design service-oriented
software applications and to understand in depth how they operate;
• developing students' ability to acquire knowledge independently for designing,
implementing and supporting service-oriented software applications and data and knowledge
mining tools, and to apply this knowledge in practice.
According to the program requirements, students will have to demonstrate the following learning
outcomes:
• to know:
  - the algorithms and mechanisms of analytical data and knowledge processing and how to use
    them;
  - the basic models of knowledge representation:
    - logical,
    - production (rule-oriented),
    - semantic,
    - frames,
    - models based on fuzzy logic,
    - models based on graph and metagraph theory,
    - models based on neural networks;
  - the theoretical basics of creating and using data warehouses in a distributed global
    environment;
• to be able:
  - to develop and operate software that implements distributed data and knowledge
    processing based on advanced Intranet-based technologies and intelligent algorithms;
  - to work with the analytical data processing tools that are part of corporate
    databases, data warehouses and distributed databases in an Intranet environment;
  - to develop and work with data warehouses and the analytical tools of modern databases;
• to have experience in:
  - using analytical data and knowledge processing algorithms and mechanisms.
3. Subject structure
All values are hours. Theor. = theoretical; Pract. = practices and workshops; Comp. = computer workshop.

Chapters and topics                                           Total Theor. Pract.  Comp.  Homework
Chapter 1. The main research directions in artificial             8      4      2      -         2
  intelligence. Discrete mathematics and mathematical
  logic particulars.
  Topic 1.1.                                                      4      2      1      -         1
    1. Items of discrete mathematics
    2. Introduction to mathematical logic
  Topic 1.2.                                                      4      2      1      -         1
    1. Introduction to predicate logic and logical inference
    2. Expert systems
    3. Examples of solving production process tasks
Chapter 2. Introduction to the theory of fuzzy sets and           8      4      2      -         2
  fuzzy logic
  Topic 2.1.                                                      4      2      1      -         1
    1. Fuzzy sets. Key terms and definitions
    2. Fuzzy arithmetic
    3. Fuzzy relations and their properties
    4. Operations with fuzzy relations
    5. Fuzzy logic. Linguistic variables. Fuzzy truth.
       Fuzzy logic operations
  Topic 2.2.                                                      4      2      1      -         1
    1. The fuzzy knowledge base
    2. The fuzzy logic inference
Chapter 3. Neural networks                                        5      2      1      -         2
  Topic 3.1.                                                      5      2      1      -         2
    1. Neural networks
    2. The fuzzy inference adaptive network
    3. Linguistic rules in decision making
Chapter 4. Genetic algorithms and examples of their              10      4      2      -         4
  application
  Topic 4.1.                                                      5      2      1      -         2
    1. Genetic algorithms
  Topic 4.2.                                                      5      2      1      -         2
    1. Decision of the schedule designing tasks for digital
       flow processing by controller of the wideband radio
       network systems
Chapter 5. The data processing based on tree-like fuzzy           5      2      1      -         2
  knowledge base with combined inference scheme
  Topic 5.1.                                                      5      2      1      -         2
    1. Characteristics of data processing in complex
       organizational systems
    2. Data and knowledge processing in complex
       administrative systems based on tree-like fuzzy
       knowledge base
    3. Construction and configuring the bottom of the
       tree-fuzzy knowledge base
    4. Approach of flexible software designing for data and
       knowledge processing
Chapter 6. Metagraphs and their applications                     10      4      2      -         4
  Topic 6.1.                                                      5      2      1      -         2
    1. Graphs, Hypergraphs and Metagraphs
    2. Metagraph theory
  Topic 6.2.                                                      5      2      1      -         2
    1. Applications of metagraphs. Presentation fuzzy
       knowledge base in the form of metagraph and operation
       with it
Chapter 7. Semantic Web and computing and information            10      4      2      -         4
  resources meta descriptions
  Topic 7.1.                                                      5      2      1      -         2
    1. Semantic Web as a new model of internet information
       space. Metadata. Ontologies
    2. Languages of queries to RDF repositories
    3. Logical inference
    4. Provision integrity and consistency
    5. Agents and Services
    6. Semantic Web practical realization
  Topic 7.2.                                                      5      2      1      -         2
    1. Web-services metadescription and UDDI specification
    2. Advanced approach to WEB-services composition
    3. Advanced approach to WEB-service discovery and
       selection
    4. Increasing web-services discovery relevancy in the
       multi-ontological environment
    5. Modelling results
    6. The tree structure of computing (Web-services) and
       information resources based on metadescriptions
       designing
Chapter 8. Methods of knowledge-based portals development        10      4      2      -         4
  Topic 8.1.                                                      5      2      1      -         2
    1. Approaches analysis to information systematization
       for knowledge portal in fields of science and
       engineering
    2. Formal algebraic system of knowledge representation
       for knowledge portal in fields of science and
       engineering
  Topic 8.2.                                                      5      2      1      -         2
    1. Method of tree for complex functional elements
       (sequence of computations) forming
    2. Tools of information and computational resources
       systematization and structuring for knowledge portal
       in fields of engineering
Chapter 9. Data Warehouses. Data cubes. OLAP-systems.            36      4      2     26         4
  Data marts. Data streams. Data mining
  Topic 9.1.                                                     18      2      1     13         2
    1. Data Warehouses
    2. Data cubes
  Topic 9.2.                                                     18      2      1     13         2
    1. OLAP-systems
    2. Data marts. The data streams
    3. Data mining
Overall                                                         102     32     16     26        28
Test                                                              6      -      2      -         4
Overall (hours)                                                 108     32     18     26        32
4. Lectures
Topics and a list of the main issues
Chapter 1. The main research directions in artificial intelligence. Discrete mathematics
and mathematical logic particulars
The main issues:
1. The purpose and objectives
2. Items of discrete mathematics
2.1. Sets. Algebra of sets
2.2. The theory of Boolean functions. Boolean algebra
2.3. Definition of Boolean functions and methods of specifying them
2.4. Disjunctive and conjunctive normal forms (DNF, CNF)
2.5. Dealing with disjunctive normal forms
2.6. The Quine-McCluskey method for finding a minimal DNF
3. Introduction to mathematical logic
3.1. The formal models
3.2. The propositional logic
3.3. The statement and proof of theorems
3.4. Checking demonstrative reasoning
3.5. Syllogisms
3.6. The logical consequence
3.7. The main theorem of inference
3.8. Reduction to normal form
3.9. Method of resolution
3.10. The other methods
3.11. Adequacy of propositional logic
4. Overview of predicate logic and logical inference
4.1. Predicates
4.2. Free and bound variables
4.3. Interpretation
4.4. Equivalence of predicate logic
4.5. The logical inference in predicate logic
4.6. The unification algorithm
4.7. Logic programming
5. Logical inference in Prolog
6. Application of logical inference to circuit analysis
7. Expert Systems
8. Examples of solving production process tasks
8.1. Principles of linguistic modelling
8.2. The general structure of the expert rules
9. Conclusions
10. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
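As an illustration of issue 3.6 (the logical consequence), the following minimal Python sketch checks whether a conclusion follows from a set of premises in propositional logic by enumerating all interpretations. The function name `is_consequence` and the encoding of formulas as Python functions over a dictionary of truth values are assumptions made for this example, not part of the course materials.

```python
from itertools import product


def is_consequence(premises, conclusion, variables):
    """Return True if `conclusion` is true in every interpretation
    that makes all `premises` true (truth-table check)."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample interpretation found
    return True


def implies(a, b):
    """Material implication a -> b."""
    return (not a) or b


# Modus ponens: from (p -> q) and p, infer q.
premises = [lambda e: implies(e["p"], e["q"]), lambda e: e["p"]]
print(is_consequence(premises, lambda e: e["q"], ["p", "q"]))  # True

# Affirming the consequent: from (p -> q) and q, p does not follow.
bad = [lambda e: implies(e["p"], e["q"]), lambda e: e["q"]]
print(is_consequence(bad, lambda e: e["p"], ["p", "q"]))  # False
```

The check is exponential in the number of variables, which is one reason the lecture also covers the resolution method.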
Chapter 2. Introduction to the theory of fuzzy sets and fuzzy logic
The main issues:
1. Fuzzy sets. Basic terms and definitions
1.1. Properties of fuzzy sets
1.2. Operations on fuzzy sets
2. Fuzzy arithmetic
3. Fuzzy relations and their properties
4. Operations on fuzzy relations
5. Fuzzy logic
5.1. Linguistic variables
5.2. Fuzzy truth
5.3. Fuzzy logic operations
6. The fuzzy knowledge base
7. The fuzzy logical inference
7.1. Zadeh's compositional rule of fuzzy inference
7.2. Mamdani fuzzy logical inference
7.3. Sugeno fuzzy logical inference
7.4. Singleton fuzzy logical inference model
7.5. Fuzzy logical inference for classification tasks
7.6. Hierarchical fuzzy logical inference system
8. Conclusions
9. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
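Issues 1.2 and 5.3 (operations on fuzzy sets and fuzzy logic operations) can be illustrated with a short Python sketch. It assumes triangular membership functions and the standard min/max/complement operators; the linguistic terms "warm" and "hot" and their parameters are hypothetical values chosen only for the example.

```python
def triangular(a, b, c):
    """Return a triangular membership function with feet at a and c
    and peak at b (assumes a < b < c)."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def fuzzy_and(m1, m2):
    """Intersection of two fuzzy sets using the minimum operator."""
    return lambda x: min(m1(x), m2(x))


def fuzzy_or(m1, m2):
    """Union of two fuzzy sets using the maximum operator."""
    return lambda x: max(m1(x), m2(x))


def fuzzy_not(m):
    """Complement of a fuzzy set."""
    return lambda x: 1.0 - m(x)


# Hypothetical terms of a linguistic variable "temperature" (degrees Celsius).
warm = triangular(15.0, 22.0, 29.0)
hot = triangular(25.0, 35.0, 45.0)

x = 27.0
print("warm:", warm(x))                       # (29 - 27) / (29 - 22)
print("hot:", hot(x))                         # (27 - 25) / (35 - 25)
print("warm AND hot:", fuzzy_and(warm, hot)(x))
print("warm OR hot:", fuzzy_or(warm, hot)(x))
print("NOT warm:", fuzzy_not(warm)(x))
```

Other operator families (for example product and probabilistic sum) are equally valid choices; min and max are used here only because they are the simplest.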
Chapter 3. Neural networks
The main issues:
1. Neural networks
1.1. The basic concepts
1.2. Simulation of nerve cells
1.3. The mathematical model of a neuron
1.4. Training of neural networks
1.5. The error backpropagation method
2. The adaptive network of fuzzy inference
3. Linguistic rules in decision making
3.1. Automatic control
3.2. Situational control
3.3. Medical diagnostics
3.4. Multi-criteria evaluation
3.5. Multivariate analysis
4. Conclusions
5. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
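Issues 1.3 to 1.5 (the mathematical model of a neuron and its training) can be sketched in a few lines of Python. The sketch assumes a single neuron with a sigmoid activation and a squared-error loss; the function names, the learning rate and the sample values are assumptions for illustration. It shows only the gradient step for one neuron, which is the building block that backpropagation applies layer by layer.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def train_step(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent step on the error E = 0.5 * (y - target)^2."""
    y = neuron(inputs, weights, bias)
    delta = (y - target) * y * (1.0 - y)  # dE/dz via the chain rule
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


# Hypothetical training sample: inputs (1, 1) should produce output 1.
weights, bias = [0.0, 0.0], 0.0
before = neuron([1.0, 1.0], weights, bias)
weights, bias = train_step([1.0, 1.0], 1.0, weights, bias)
after = neuron([1.0, 1.0], weights, bias)
print(before, after)  # the output moves towards the target after one step
```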
Chapter 4. Genetic algorithms and examples of their application
The main issues:
1. Genetic algorithms
1.1. Introduction
1.2. Genetic operators
1.3. Data presentation in the genes
1.4. Strategies for the selection and formation of a new generation
1.5. Schemata and the schema theorem
1.6. Models of genetic algorithms
1.7. Test functions
2. Solving schedule design tasks for digital flow processing by the controller of
wideband radio network systems
2.1. Evaluation of the execution efficiency of the operation sequence for
digital stream formation
2.2. Parallel data processing for digital stream frame formation
2.3. Approaches to designing a sequence of operations for digital
stream formation
2.4. Methods for analysing and optimizing the execution sequence of operations for
digital flow frame formation by the radio network controller
2.4.1. The formal model of the digital stream frame formation process
2.4.2. A method of improving the efficiency of operations for digital
stream formation
2.4.3. Analysis of the requirements for digital flow frame formation
by the radio network controller
2.4.4. Optimization of hardware resource usage time during digital stream
formation
2.5. The method of scheduling the execution of operations for digital stream
frame formation by the radio network controller
2.5.1. Solving the problem of scheduling the execution of operations in the
software block for digital stream frame formation by the radio network
controller
2.5.2. Designing mathematical models of the computation process for
digital flow frame formation
2.5.3. Analysis of the computational process for digital flow frame
formation
2.5.4. Algorithms for designing the schedule of operation execution for
digital flow frame formation
2.5.5. On-line optimization of atomic operation scheduling
2.6. Experimental results of flowchart and diagram analysis of the operation
execution sequence for digital flow formation by the radio network controller,
using LTE frame formation as an example
2.7. Implementation of the method of increasing the efficiency of digital flow
formation by the radio network controller
3. Conclusions
4. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
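Issues 1.2 to 1.4 of this chapter (genetic operators, data representation in genes, and strategies for selection and formation of a new generation) can be illustrated with a minimal Python sketch. It assumes a bit-string encoding, tournament selection, one-point crossover, bit-flip mutation and elitism, applied to the OneMax test function (maximize the number of ones). All names and parameter values are assumptions for the example; the scheduling application from issue 2 is not modelled here.

```python
import random


def one_max(bits):
    """Fitness: the number of ones in the chromosome."""
    return sum(bits)


def tournament(pop, rng, k=2):
    """Pick k random individuals and return the fittest of them."""
    return max(rng.sample(pop, k), key=one_max)


def crossover(a, b, rng):
    """One-point crossover of two parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    """Run the GA and return the best individual and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = max(pop, key=one_max)
        history.append(one_max(elite))
        children = [elite]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, rate))
        pop = children
    best = max(pop, key=one_max)
    history.append(one_max(best))
    return best, history


best, history = run_ga()
print(one_max(best), history[0], history[-1])
```

Because of elitism the best fitness never decreases from one generation to the next, which makes the run easy to inspect through `history`.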
Chapter 5. The data processing based on tree-like fuzzy knowledge base with
combined inference scheme
The main issues:
1. Description of information processing in complex organizational systems
1.1. The management issue of complex administrative systems
1.2. The structure of information flow in complex administrative systems
1.3. Systematization and analysis methods of information processing in complex
administrative systems
1.4. Data processing based on fuzzy inference subsystem
1.4.1. The fuzzy inference subsystem of the general form
1.4.2. Problems of using a disordered knowledge base
1.5. Methods and ways of the membership function definition
1.6. Analysis of approaches to designing software modules for information
processing
2. An approach to information processing in complex administrative systems
2.1. Reduction of fuzzy knowledge base to the tree structure
2.2. Fuzzy inference subsystem modification
2.2.1. Modification of the fuzzy inference scheme
2.2.2. The structure of fuzzy inference subsystem modification
2.3. Presentation of the fuzzy knowledge base top as the classical logical
knowledge base
2.4. Method of data processing based on tree-fuzzy knowledge bases with a
mixed inference scheme
3. Construction and configuring the bottom of the tree-fuzzy knowledge base
3.1. Approach to development and configuring the bottom of the knowledge
base
3.2. The bottom of the knowledge base presentation as a fuzzy neural network
3.3. The structure of the fuzzy knowledge base bottom formation
3.3.1. Simplifying the fuzzy knowledge base by similarity
3.3.2. Construction of the initial fuzzy knowledge base of the lower level
3.3.3. Formation of the fuzzy knowledge base bottom using genetic
programming methods
3.4. Configuring the membership functions terms of the lower level linguistic
variables
4. Development of flexible software components for information processing
4.1. The main issues for development of
4.1.1. Tools of flexible software components
4.1.2. Binding to the subject area
4.1.3. Automatic construction of a database query
4.2. The experience of the practical usage of the flexible software components
of information processing
4.2.1. Formalization of the subject area “State traffic police of Ukraine”
4.2.2. Binding software module of data processing to the subject area
4.2.3. Analysis of accidents and crime on the roads of Ukraine
5. Conclusions
6. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
Chapter 6. Metagraphs and their applications
The main issues:
1. Graphs, Hypergraphs, and Metagraphs
1.1. Graphs and Data Visualization
1.2. Graph Structures
2. Metagraph Theory.
2.1. The Algebraic Structure of Metagraphs
2.1.1. Formal Representation of a Metagraph
2.1.2. The Incidence and Adjacency Matrices
2.1.3. Identifying Metapaths
2.2. Connectivity Properties of Metagraphs
2.2.1. Dominant Metapaths
2.2.2. Cutsets and Bridges
2.3. Metagraph Transformations
2.3.1. Hierarchical Abstraction Using Projection
2.3.2. The Inverse Metagraph
2.3.3. The Element Flow Metagraph
2.4. Attributed Metagraphs
2.4.1. Qualitative Attributes
2.4.2. Quantitative Attributes
2.4.3. Conditional Metagraphs. Projections in Conditional Metagraphs.
Connectivity and Redundancy
2.5. Independent Sub-Metagraphs
3. Applications of Metagraphs
3.1. Representation of a fuzzy knowledge base in the form of Metagraphs
3.2. Visualization and analysis of fuzzy knowledge base correctness using
Metagraphs
3.3. Inference methods based on Metagraphs
4. Conclusions
5. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
Chapter 7. Semantic Web and computing and information resources metadescriptions
The main issues:
1. Semantic Web as a new model of internet information space
1.1. Motivation for developing a semantic approach to describing computational
(Web-services) and information resources
1.2. Semantic Web concept
1.2.1. Semantic Web architecture
1.2.2. URI - Uniform Resource Identifier
1.2.3. Documents: Extensible Markup Language (XML)
1.2.4. Statements: the Resource Description Framework (RDF) and RDF Schema
1.2.5. Metadata
1.3. Ontologies
1.4. Query languages for RDF repositories
1.5. Logical inference
1.6. Ensuring integrity and consistency
1.7. Agents and Services
1.8. Practical implementation of Semantic Web
2. Web-services meta-description. UDDI specification
3. Advanced approach to Web-services composition
4. Advanced approach to Web-services discovery and selection
5. Increasing web services discovery relevancy in the multi-ontological environment
5.1. Analysis of the existing approaches
5.2. Web service similarity evaluation method
5.3. Results based on modelling
6. Formation of the tree structure for computational (Web-services) and information
resources based on meta descriptions
7. Conclusions
8. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
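Issues 1.2.4 and 1.4 (RDF statements and query languages for RDF repositories) rest on one idea: a repository is a set of (subject, predicate, object) triples, and a query is a triple pattern with variables. The Python sketch below shows this idea only. The `ex:` resource names and the `match` function are hypothetical, and a real query language such as SPARQL offers far more than single-pattern matching.

```python
def match(triples, pattern):
    """Yield variable bindings for one (subject, predicate, object) pattern.

    Pattern terms that start with '?' are variables; all other terms
    must be equal to the corresponding term of the triple.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Hypothetical metadata about computational and information resources.
triples = [
    ("ex:weather", "rdf:type", "ex:WebService"),
    ("ex:weather", "ex:describedBy", "ex:weatherWsdl"),
    ("ex:portal", "rdf:type", "ex:KnowledgePortal"),
]

# Which resources are Web services?
print(list(match(triples, ("?s", "rdf:type", "ex:WebService"))))
# [{'?s': 'ex:weather'}]
```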
Chapter 8. Methods of knowledge-based portals development
The main issues:
1. Analysis of approaches to data and knowledge systematization on a knowledge portal
1.1. The issue of systematization and structuring information for complex
information systems in fields of science and engineering
1.2. Description of features, the issue of development, approaches and models
1.3. Formal algebraic systems for knowledge portal, basic definitions and concepts
2. Formal algebraic system of scientific and engineering knowledge representation
2.1. Algebraic system model for knowledge portals in fields of science and
engineering
2.2. Operations of calculations algebra
2.3. Mediums of the data
2.4. Simple operations properties of calculations algebra
2.5. Complex operations properties of calculations algebra
3. Method of common functional elements tree forming
3.1. The approach to the complex functional elements developing
3.2. Stages of the method of common functional elements developing
3.3. The method of complex workflow dynamic developing in fields of science
and engineering
4. Software tools for information and computing resources systematization and structuring
on knowledge portal in fields of science and engineering
4.1. The basic concept for software tools development
4.1.1. Software tools complex for information and computing resources
systematization and structuring on knowledge portal
4.1.2. The way of interoperability of the information and functional
elements on the knowledge portal
4.1.3. Structuring and systematization of portal knowledge
4.1.4. Software tools for dynamic workflow development
4.2. Experience of software tools practical usage
4.2.1. Portal "Strength of materials"
4.2.2. Portal "National Antarctic Scientific Centre of Ukraine"
5. Conclusions
6. References
The didactic tools: Slides on the subject and online tutorials
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
Chapter 9. Data Warehouses. Data cubes. OLAP-systems. Data marts. Data streams.
Data mining
The main issues:
1. Data Warehouses
1.1. The concept of data warehouse
1.2. Comparison of on-line transaction processing systems and data warehouses.
Problems in the development and maintenance of data warehouses
1.3. The architecture of the data warehouse
1.4. Information flows in a data warehouse
1.5. Tools and technologies of data warehousing
1.6. Data stores
2. Data cubes
2.1. Denormalized dimensional database
2.2. “Dimensional” methodology
2.3. Star and snowflake schemas
2.4. Data cubes, hierarchies, aggregates
2.5. Variants of data warehouse implementation
2.6. Data storage formats in OLAP cubes
3. OLAP-systems
3.1. OLAP tools benchmarking
3.2. OLAP application and advantages of OLAP
3.3. Presentation of multidimensional data
3.4. Codd's rules for selecting OLAP tools
3.5. Technology of data processing
3.6. Methods of data processing
4. The data marts. The data stream
4.1. Developing the data mart structure using SQL Server Management Studio
4.2. Setting dimensions and links
4.3. Populating the empty data marts using Integration Services
4.4. Creating the Integration Services project
4.5. Creating the first level task flow and data flow. Second level data flow. Third
level data flow.
4.6. Data flows to the fact table
4.7. Advanced cube settings
4.8. Creating the perspective
4.9. Design and usage of key performance indicators
4.10. Creating the relational schema from the multidimensional cube
5. Data processing
5.1. The purpose of analytical services usage
5.2. Data mining models
5.3. Data mining algorithm
5.4. Data mining model development
5.5. Analysis of the "naive Bayes" model
6. Conclusions
7. References
The didactic tools: Slides on the subject and online tutorials.
Tasks for out-of-class activities: review the lecture materials and analyse them in depth
using the recommended literature.
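Issue 2.4 (data cubes, hierarchies, aggregates) can be illustrated with a minimal Python sketch of a roll-up: facts are aggregated over a chosen subset of dimensions, and dropping a dimension moves to a coarser level of the cube. The fact rows, the dimension names and the `roll_up` function are hypothetical. An OLAP server would precompute or index such aggregates rather than scan the fact table each time.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over the fact rows, grouped by the given dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Hypothetical fact table with three dimensions and one measure.
facts = [
    {"year": 2023, "region": "North", "product": "A", "sales": 10.0},
    {"year": 2023, "region": "North", "product": "B", "sales": 5.0},
    {"year": 2023, "region": "South", "product": "A", "sales": 7.0},
    {"year": 2024, "region": "North", "product": "A", "sales": 12.0},
]

by_year_region = roll_up(facts, ["year", "region"], "sales")
by_year = roll_up(facts, ["year"], "sales")  # roll up over region as well

print(by_year_region)
# {(2023, 'North'): 15.0, (2023, 'South'): 7.0, (2024, 'North'): 12.0}
print(by_year)
# {(2023,): 22.0, (2024,): 12.0}
```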
5. Practical exercises and labs
The purpose of the cycle of practical exercises and labs is for students to gain basic
practical skills in using data warehouses and analytical tools for data processing, such as ROLAP
and MOLAP, to learn how to develop simple expert systems, and to acquire skills in using a language
for statistical data processing.
No.  Title of the practical exercises and labs                   Hours
1    Data processing using Microsoft Visual Studio 2008              6
2    Applying MS Excel as a remote client of MS SQL Server 2008      6
     and MS Analysis Server 2008 via the Internet
3    Development of a semantic network                               3
4    Development of a frame model                                    2
5    Development of an expert system based on a production model     3
6    Processing statistical data                                     6
7. Homework
No.  Title of the topic for students' homework   Hours
1    Methods based on ant colony optimization       18
2    Fuzzy neural network                           18
3    Ontologies and Semantic Web                    18
8. References
1. Korneev V., Gareyev A.F., Vasjutin S.V., Reich V. Databases. Intelligent processing of information.
Moscow: Publishing House "Nolidg", 2000. 260 p.
2. Baydyk T.N. Neural networks and artificial intelligence tasks. Kiev: "Naukova Dumka", 2001. 263 p.
3. Bratko I. Prolog Programming for Artificial Intelligence: Translated from English. Moscow: Mir, 1990. 560 p.
4. Gavrilova T.A., Khoroshevsky V.F. Knowledge Base of Intelligent Systems. St. Petersburg: Piter, 2000. 384 p.
5. Korotkii S. Neural network: basic concepts // http://www.neuropower.de/rus/books/index.html
6. Connolly T., Begg K., Strachan A. Databases: Design, implementation and maintenance. Theory and
Practice, 2nd ed.: Translated from English: Textbook. Moscow: Publishing House "Williams", 2000. 1120 p.
7. Rothstein A.P. Intelligent identification technology: fuzzy sets, genetic algorithms, neural networks.
Vinnica: UNIVERSUM, 1999. 320 p.
8. Subbotin S.A. Knowledge presentation and processing in the artificial intelligence and decision support
systems: Handbook. Zaporozhye: Zaporozhye National Technical University, 2008. 341 p.
9. The lecture slides contain additional references after every topic.