Slide 1: Connectionist Knowledge Representation and Reasoning (SCREECH) – Part I: Neural Networks and Structured Knowledge
Hammer & Hitzler ● AIFB ● KI2005 Tutorial ● Koblenz, Germany ● September 2005

Fodor, Pylyshyn: "What's deeply wrong with Connectionist architecture is this: Because it acknowledges neither syntactic nor semantic structure in mental representations, it perforce treats them not as a generated set but as a list." [Connectionism and Cognitive Architecture, 1988]
Our claim: state-of-the-art connectionist architectures do adequately deal with structures!

Slide 2: Tutorial Outline (Part I) – Neural networks and structured knowledge
• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures

Slide 4: The good old days – KBANN and co.
A feedforward neural network (FNN) computes a function f_w: ℝ^n → ℝ^o. A single neuron with inputs x_1, …, x_n, weights w_1, …, w_n and threshold θ computes σ(w^T x − θ), where σ(t) = sgd(t) = 1/(1 + e^(−t)).
Three views on such a network:
1. black box
2. distributed representation
3. connection to rules for symbolic input/output?

Slide 5: The good old days – KBANN and co.
Knowledge Based Artificial Neural Networks (KBANN) [Towell/Shavlik, AI 94]:
• start with a network which represents known rules
• train using additional data
• extract a set of symbolic rules after training
Pipeline: (partial) rules → initial network → train with data → (complete) rules

Slide 6: The good old days – KBANN and co.
(partial) rules: propositional acyclic Horn clauses such as
  A :- B, C, D
  E :- A
These are translated into an FNN with one neuron per Boolean variable; in the example, the neuron for A receives weight 1 from each of B, C and D and threshold 2.5, and the neuron for E receives weight 1 from A and threshold 0.5.

Slide 7: The good old days – KBANN and co.
Training: use some form of backpropagation on the data, e.g. with an additional penalty term in the error for changing the weights.
1. The initial network biases the training result, but
2. there is no guarantee that the initial rules are preserved, and
3. there is no guarantee that the hidden neurons maintain their semantics.
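To make the rules-to-network step of slides 5–7 concrete, here is a minimal Python sketch of KBANN-style rule insertion for the clauses above. The weight value 1 and the thresholds k − 0.5 (k = number of antecedents) follow the scheme on slide 6; the function names and the restriction to positive, acyclic clauses with one clause per head are my own simplifications, not part of the full KBANN algorithm.

```python
# Sketch: insert propositional Horn clauses (positive antecedents, acyclic,
# one clause per head) into a network of sigmoidal units.
import math

def sigmoid(t, gain=10.0):          # steep sigmoid approximating a threshold unit
    return 1.0 / (1.0 + math.exp(-gain * t))

def clauses_to_network(clauses, weight=1.0):
    """clauses: list of (head, [antecedents]) in dependency order.
    Returns {head: (weights per antecedent, threshold)}."""
    net = {}
    for head, body in clauses:
        # the unit fires only if all antecedents are active:
        # the threshold lies between (k-1)*weight and k*weight
        net[head] = ({b: weight for b in body}, (len(body) - 0.5) * weight)
    return net

def forward(net, inputs):
    """inputs: {variable: 0/1} for the input variables; returns all activations."""
    act = dict(inputs)
    for head, (w, theta) in net.items():     # clauses are in dependency order
        act[head] = sigmoid(sum(w[b] * act[b] for b in w) - theta)
    return act

net = clauses_to_network([("A", ["B", "C", "D"]), ("E", ["A"])])
print(forward(net, {"B": 1, "C": 1, "D": 1}))   # A and E close to 1
print(forward(net, {"B": 1, "C": 0, "D": 1}))   # A and E close to 0
```

After this initialization, all weights are trained further with backpropagation, possibly with a penalty for deviating from the inserted weights, as noted on slide 7.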
Slide 8: The good old days – KBANN and co.
(complete) rules:
1. There is no exact direct correspondence between a neuron and a single rule, although each neuron (and the overall mapping) can be approximated by a set of rules arbitrarily well.
2. It is NP-complete to find a minimum logical description for a trained network [Golea, AISB'96].
3. Therefore, a couple of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research.

Slide 9: The good old days – KBANN and co.
Two families of extraction methods for the (complete) rules:
• decompositional approach: extract rules from the network internals, neuron by neuron
• pedagogical approach: treat the network as a black box mapping x to y and extract rules from its input/output behavior

Slide 10: The good old days – KBANN and co.
Decompositional approaches:
• subset algorithm, MofN algorithm: describe single neurons by sets of active predecessors [Craven/Shavlik, 94]
• local activation functions (RBF-like) allow an approximate direct description of single neurons [Andrews/Geva, 96]
• MLP2LN biases the weights towards 0/-1/1 during training and can then extract exact rules [Duch et al., 01]
• prototype-based networks can be decomposed along relevant input dimensions by decision tree nodes [Hammer et al., 02]
Observations:
• usually some variation of if-then rules is obtained
• small rule sets are only achieved if further constraints guarantee that single weights/neurons have a meaning
• tradeoff between accuracy and size of the description

Slide 11: The good old days – KBANN and co.
Pedagogical approaches:
• extraction of conjunctive rules by extensive search [Saito/Nakano, 88]
• interval propagation [Gallant, 93; Thrun, 95]
• extraction by minimum separation [Tickle/Diederich, 94]
• extraction of decision trees [Craven/Shavlik, 94]
• evolutionary approaches [Markovska, 05]
Observations:
• usually some variation of if-then rules is obtained
• essentially symbolic rule induction with a little (or a bit more) help from a neural network (see the sketch after slide 12)

Slide 12: The good old days – KBANN and co.
What is this good for?
Contra:
• nobody uses FNNs these days
• insertion of prior knowledge might be valuable, but efficient training algorithms allow one to substitute it by additional training data (generated via the rules)
• validation of the network output might be valuable, but there exist alternative (good) guarantees from statistical learning theory
• if-then rules are not very interesting, since there exist good symbolic learners for propositional classification rules
Pro:
• propositional rule insertion/extraction is often an essential part of more complex rule insertion/extraction mechanisms
• it demonstrates a key problem (different modes of representation) in a very nice way
• some people, e.g. in the medical domain, also want an explanation for a classification
• there are at least two application domains where if-then rules are very interesting and not so easy to learn: fuzzy control and unsupervised data mining
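As a concrete illustration of the pedagogical (black-box) view from slides 9 and 11, the following sketch queries a trained network on all Boolean inputs and reads off a DNF description. This brute-force variant corresponds to extraction by exhaustive search and is only feasible for a handful of input variables; practical methods such as decision-tree extraction or minimum separation compress this step. The sketch reuses `forward` and `net` from the KBANN example above; the helper names are mine.

```python
# Sketch: pedagogical rule extraction by exhaustive search over Boolean inputs.
# Assumes forward() and net from the KBANN sketch above.
from itertools import product

def extract_dnf(net, input_vars, target, threshold=0.5):
    """Query the network on every Boolean input and collect one conjunction
    per input pattern for which the target unit is active."""
    rules = []
    for bits in product([0, 1], repeat=len(input_vars)):
        assignment = dict(zip(input_vars, bits))
        if forward(net, assignment)[target] > threshold:
            conj = " & ".join(v if b else "~" + v for v, b in assignment.items())
            rules.append(conj)
    return " | ".join(rules) if rules else "false"

print("E <-", extract_dnf(net, ["B", "C", "D"], "E"))   # E <- B & C & D
```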
Slide 14: Useful – neurofuzzy systems
Setting: a controller maps observations of a process to control inputs.
Fuzzy control: if (observation ∈ FM_I) then (control ∈ FM_O), with fuzzy sets FM_I over the observations and FM_O over the controls.

Slide 15: Useful – neurofuzzy systems
Neurofuzzy control: a neural network implements the fuzzy if-then rules, mapping observations to controls.
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!

Slide 16: Useful – neurofuzzy systems
• NEFCON implements Mamdani control [Nauck/Klawonn/Kruse, 94]
• ANFIS implements Takagi-Sugeno control [Jang, 93]
• and many others …
• Learning
  – of rules: evolutionary or clustering methods
  – of fuzzy set parameters: reinforcement learning or some form of Hebbian learning

Slide 17: Useful – data mining pipeline
Task: describe given inputs (no class information) by if-then rules.
• Data mining with emergent SOM, clustering, and rule extraction [Ultsch, 91]: the pipeline maps the data via the SOM and clustering to symbolic rules.

Slide 19: State of the art – structure kernels
Data: sets, sequences, tree structures, graph structures. Compute a kernel k(x, x') for this complex data, i.e. just compute pairwise distances using the structure information, and feed the kernel into an SVM.

Slide 20: State of the art – structure kernels
• Closure properties of kernels [Haussler, Watkins]
• Principled problems for complex structures: computing informative graph kernels is at least as hard as graph isomorphism [Gärtner]
• Several promising proposals; taxonomy [Gärtner]:
  – count common substructures (syntactic or semantic)
  – kernels derived from local transformations
  – kernels derived from a probabilistic model

Slide 21: State of the art – structure kernels
Count common substructures, e.g. the substrings GA, AG, AT in two sequences:
            GA  AG  AT
  GAGAGA     3   2   0
  GAT        1   0   1
kernel value: 3·1 + 2·0 + 0·1 = 3
Efficient computation: dynamic programming, suffix trees.
Examples:
• locality improved kernel [Sonnenburg et al.], bag of words [Joachims]
• string kernel [Lodhi et al.], spectrum kernel [Leslie et al.]
• word-sequence kernel [Cancedda et al.]
• convolution kernels for language [Collins/Duffy, Kashima/Koyanagi, Suzuki et al.]
• kernels for relational learning [Zelenko et al., Cumby/Roth, Gärtner et al.]
• graph kernels based on paths or subtrees [Gärtner et al., Kashima et al.]
• kernels for prolog trees based on similar symbols [Passerini/Frasconi/De Raedt]
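In its simplest form, the substructure-counting idea of slide 21 reduces to comparing k-mer count vectors (the spectrum kernel of Leslie et al. is essentially this). A rough sketch, reproducing the GAGAGA/GAT example; the function names are mine:

```python
# Sketch: a simple substring-counting (spectrum-style) kernel for sequences.
from collections import Counter

def kmer_counts(s, k=2):
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s, t, k=2):
    """k(s, t) = sum over all k-mers of count_s(kmer) * count_t(kmer)."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(cs[m] * ct[m] for m in cs if m in ct)

print(kmer_counts("GAGAGA"))               # GA: 3, AG: 2
print(kmer_counts("GAT"))                  # GA: 1, AT: 1
print(spectrum_kernel("GAGAGA", "GAT"))    # 3, as in the table on slide 21
```

The resulting Gram matrix of pairwise kernel values can be handed directly to any kernel method such as an SVM; the naive loop above is where dynamic programming and suffix trees come in for long sequences and gapped substrings.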
Slide 22: State of the art – structure kernels
Derived from a probabilistic model: describe the data by a probabilistic model P(x) and compare characteristics of P(x):
• Fisher kernel [Jaakkola et al., Karchin et al., Pavlidis et al., Smith/Gales, Sonnenburg et al., Siolas et al.]
• tangent vector of log odds [Tsuda et al.]
• marginalized kernels [Tsuda et al., Kashima et al.]
• kernels of Gaussian models [Moreno et al., Kondor/Jebara]

Slide 23: State of the art – structure kernels
Derived from local transformations: start from a local neighborhood relation ("x is similar to x'", generator H) and expand it to a global kernel:
• diffusion kernel [Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa]

Slide 24: State of the art – structure kernels
• Intelligent preprocessing (kernel extraction) allows an adequate integration of semantic/syntactic structure information.
• This can be combined with state-of-the-art neural methods such as the SVM.
• Very promising results for
  – classification of documents and text [Duffy, Leslie, Lodhi, …]
  – detecting remote homologies of genomic sequences and further problems in genome analysis [Haussler, Sonnenburg, Vert, …]
  – quantitative structure-activity relationships in chemistry [Baldi et al.]

Slide 25: Conclusions – feedforward networks
• propositional rule insertion and extraction are possible (to some extent)
• useful for neurofuzzy systems and data mining
• structure-based kernel extraction followed by learning with an SVM yields state-of-the-art results, but: this is a sequential rather than a fully integrated neurosymbolic approach
• FNNs themselves are restricted to flat data which can be processed in one shot; no recurrence

Slide 27: The basics – partially recurrent networks
[Elman, Finding structure in time, CogSci 90]
A very natural architecture for processing speech, temporal signals, control, robotics: the network state is updated as x_{t+1} = f(x_t, I_t), where I_t is the input at time t.
1. can process time series of arbitrary length
2. interesting for speech processing [see e.g. Kremer, 02]
3. training uses a variation of backpropagation [see e.g. Pearlmutter, 95]
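A minimal sketch of the partially recurrent (Elman-style) state update x_{t+1} = f(x_t, I_t) from slide 27; the dimensions, the tanh nonlinearity and the linear readout are illustrative choices, not Elman's exact setup.

```python
# Sketch: an Elman-style partially recurrent network processing a sequence.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_state, n_out = 3, 5, 1
V = rng.normal(scale=0.3, size=(n_state, n_in))      # input -> state
W = rng.normal(scale=0.3, size=(n_state, n_state))   # state -> state (context units)
U = rng.normal(scale=0.3, size=(n_out, n_state))     # state -> output

def run(sequence):
    x = np.zeros(n_state)                 # initial context
    outputs = []
    for i_t in sequence:                  # arbitrary length, processed step by step
        x = np.tanh(W @ x + V @ i_t)      # x_{t+1} = f(x_t, I_t)
        outputs.append(U @ x)
    return outputs

seq = [rng.normal(size=n_in) for _ in range(7)]
print(run(seq)[-1])
```

Training fits V, W and U with a variant of backpropagation unfolded over time, cf. the Pearlmutter survey cited above.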
Slide 29: Lots of theory – principled capacity and limitations
RNNs and finite automata [Omlin/Giles, 96]: the input-state-output dynamics of an RNN corresponds to the transition function of a DFA.

Slide 30: Lots of theory – principled capacity and limitations
DFA → RNN: use a unary (one-hot) input and state representation and implement (approximately) the Boolean formula corresponding to the state transition within a two-layer network. RNNs can exactly simulate finite automata. (A code sketch of this construction follows slide 34 below.)

Slide 31: Lots of theory – principled capacity and limitations
RNN → DFA: the input is unary, but in general the state representation is distributed; cluster the states into disjoint subsets corresponding to automaton states and observe their behavior. This yields an approximate description: approximate extraction of automata rules is possible.

Slide 32: Lots of theory – principled capacity and limitations
The principled capacity of RNNs can be characterized exactly:
• RNNs with arbitrary weights = non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]
• RNNs with rational weights = Turing machines [Siegelmann/Sontag]
• RNNs with limited noise = finite state automata [Omlin/Giles, Maass/Orponen]
• RNNs with small weights or Gaussian noise = finite memory models [Hammer/Tino, Maass/Sontag]

Slide 33: Lots of theory – principled capacity and limitations
However, learning might be difficult:
• gradient-based learning schemes face the problem of long-term dependencies [Bengio/Frasconi]
• RNNs are not PAC-learnable (infinite VC dimension); only distribution-dependent bounds can be derived [Hammer]
• there exist only few general guarantees for the long-term behavior of RNNs, e.g. stability [Suykens, Steil, …]

Slide 34: Lots of theory – principled capacity and limitations
RNNs
• naturally process time series
• incorporate plausible regularization such as a bias towards finite memory models
• have sufficient power for interesting dynamics (context free, context sensitive; arbitrary attractors and chaotic behavior)
but:
• training is difficult
• there are only limited guarantees for the long-term behavior and the generalization ability
Symbolic description/knowledge can provide solutions.
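The DFA-to-RNN construction of slide 30 can be sketched directly: states and symbols are coded in unary (one-hot), a first layer of threshold units computes the conjunctions "state q and symbol a", and a second layer ORs them into the successor state. With hard threshold units the simulation is exact; steep sigmoids approximate it arbitrarily well. The parity automaton at the end is just an illustrative example; all names are mine.

```python
# Sketch: simulating a DFA with a two-layer threshold network over one-hot codes.
import numpy as np

def step(v, theta):
    return (v > theta).astype(float)      # hard threshold unit

def make_transition_weights(delta, n_states, n_symbols):
    # second layer: OR of all (state, symbol) pairs mapping into each successor
    W = np.zeros((n_states, n_states * n_symbols))
    for (q, a), q_next in delta.items():
        W[q_next, q * n_symbols + a] = 1.0
    return W

def accepts(delta, q0, accepting, word, n_states, n_symbols):
    W = make_transition_weights(delta, n_states, n_symbols)
    state = np.zeros(n_states); state[q0] = 1.0
    for a in word:
        inp = np.zeros(n_symbols); inp[a] = 1.0
        # first layer: unit (q, a) fires iff state unit q AND input unit a are on
        pairs = step(state[:, None] + inp[None, :], 1.5).ravel()
        state = step(W @ pairs, 0.5)
    return bool(state[list(accepting)].sum() > 0.5)

# parity automaton: accept words over {0, 1} with an even number of 1s
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
print(accepts(delta, 0, {0}, [1, 0, 1, 1], n_states=2, n_symbols=2))   # False (three 1s)
print(accepts(delta, 0, {0}, [1, 0, 1], n_states=2, n_symbols=2))      # True (two 1s)
```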
Slide 35: Lots of theory – principled capacity and limitations
What is the correspondence between recurrent symbolic systems and RNNs?
• RNN: real numbers; iterated function systems give rise to fractals/attractors/chaos; implicit memory; e.g. an attractor/repellor pair for counting when recognizing a^n b^n c^n
• recurrent symbolic system: discrete states; crisp Boolean functions on the states plus explicit memory

Slide 37: To do – challenges
Training RNNs:
• search for appropriate regularizations inspired by a focus on specific functionalities
  – architecture (e.g. local), weights (e.g. bounded), activation function (e.g. linear), cost term (e.g. additional penalties) [Hochreiter, Boden, Steil, Kremer, …]
• insertion of prior knowledge
  – finite automata and beyond (e.g. context free/sensitive languages, specific dynamical patterns/attractors) [Omlin, Croog, …]
Long-term behavior:
• enforce appropriate constraints while training
• investigate the dynamics of RNNs
  – rule extraction, investigation of attractors, relating dynamics and symbolic processing [Omlin, Pasemann, Haschke, Rodriguez, Tino, …]

Slide 38: To do – challenges
Some further issues:
• processing spatial data [bicausal networks, Pollastri et al.; contextual RCC, Micheli et al.]
• unsupervised processing of sequences x_1, x_2, x_3, x_4, … [TKM, Chappell/Taylor; RecSOM, Voegtlin; SOMSD, Sperduti et al.; MSOM, Hammer et al.; general formulation, Hammer et al.]

Slide 39: Conclusions – recurrent networks
• the capacity of RNNs is well understood and promising, e.g. for natural language processing and control
• the recurrence of symbolic systems has a natural counterpart in the recurrence of RNNs
• training and generalization face problems which could be solved by hybrid systems
• discrete dynamics with explicit memory versus real-valued iterated function systems
• sequences are nice, but not enough

Slide 41: The general idea – recursive distributed representations
How can tree structures and acyclic graphs be turned into a connectionist representation, i.e. a fixed-dimensional vector?

Slide 42: The general idea – recursive distributed representations
Recursion! An encoding network f: ℝ^i × ℝ^c × ℝ^c → ℝ^c (input label, left context, right context → context) yields an encoding f_enc of binary trees:
  f_enc(ξ) = 0 (empty tree)
  f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))
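A minimal sketch of the recursive encoding f_enc of slide 42 for binary trees: a single-layer network f with random weights maps (label, left code, right code) to a code. The dimensions and the tanh nonlinearity are arbitrary illustrative choices; without training (see slide 47 below for that) this is just a fixed recursive distributed representation.

```python
# Sketch: recursive encoding of binary trees into a fixed-dimensional vector.
import numpy as np

rng = np.random.default_rng(0)
n, c = 4, 8                                          # label dim n, context dim c
Wf = rng.normal(scale=0.5, size=(c, n + 2 * c))      # weights of f: R^(n+2c) -> R^c

def f(label, left_code, right_code):
    return np.tanh(Wf @ np.concatenate([label, left_code, right_code]))

def f_enc(tree):
    """tree is None (the empty tree xi) or a triple (label, left, right)."""
    if tree is None:
        return np.zeros(c)                           # f_enc(xi) = 0
    label, left, right = tree
    return f(label, f_enc(left), f_enc(right))       # f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))

a, b, d = np.eye(n)[0], np.eye(n)[1], np.eye(n)[2]   # three discrete labels, one-hot coded
tree = (a, (b, None, None), (d, (b, None, None), None))
print(f_enc(tree).shape)                             # (8,): one vector per tree
```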
Slide 43: The general idea – recursive distributed representations
Formally, with label dimension n and context dimension c:
• encoding network f: ℝ^(n+2c) → ℝ^c, output network g: ℝ^c → ℝ^o, decoding network h = (h_0, h_1, h_2): ℝ^o → ℝ^(n+2o)
• encoding: f_enc: (ℝ^n)^(2*) → ℝ^c with f_enc(ξ) = 0 and f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))
• decoding: h_dec: ℝ^o → (ℝ^n)^(2*) with h_dec(0) = ξ and h_dec(x) = h_0(x)(h_dec(h_1(x)), h_dec(h_2(x)))
Here (ℝ^n)^(2*) denotes the set of binary trees with labels in ℝ^n.

Slide 44: The general idea – recursive distributed representations
• recursive distributed descriptions [Hinton, 90]: the general idea, without a concrete implementation
• tensor construction [Smolensky, 90]: encoding/decoding given by (a, b, c) ↦ a ⊗ b ⊗ c; increasing dimensionality
• holographic reduced representations [Plate, 95]: circular correlation/convolution; fixed encoding/decoding with fixed dimensionality (but potential loss of information); decoding requires chunking or clean-up
• binary spatter codes [Kanerva, 96]: binary operations, fixed dimensionality, potential loss of information; decoding requires chunking or clean-up
• RAAM [Pollack, 90], LRAAM [Sperduti, 94]: trainable networks, trained for the identity, fixed dimensionality; the encoding is optimized for the given training set

Slide 45: The general idea – recursive distributed representations
Nevertheless, the results were not promising. Theorem [Hammer]:
• There exists a fixed-size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels.
• For every code, decoding of all trees up to height T requires Ω(2^T) neurons for sigmoidal networks.
Hence encoding seems possible, but no fixed-size architecture exists for decoding.

Slide 47: One breakthrough – recursive networks
Recursive networks [Goller/Küchler, 96]:
• do not use decoding
• combine encoding and mapping: the tree is encoded by f_enc and the code is mapped to the output y
• train this combination directly for the given task with backpropagation through structure
⇒ an efficient, data- and problem-adapted encoding is learned
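Written with a modern automatic-differentiation library (an anachronism relative to 2005, but it makes the point succinctly), backpropagation through structure amounts to nothing more than writing f_enc recursively and calling backward(): the computation graph is unfolded along the tree, so the encoding f and the output mapping g of slide 47 are trained jointly for the task. Everything below (dimensions, loss, the single gradient step) is an illustrative sketch, not the original formulation of Goller/Küchler.

```python
# Sketch: a recursive network y = g(f_enc(tree)) trained end to end.
import torch

n, c = 4, 8
f = torch.nn.Linear(n + 2 * c, c)        # encoding network f: R^(n+2c) -> R^c
g = torch.nn.Linear(c, 1)                # output network g: R^c -> R^o (here o = 1)

def f_enc(tree):
    if tree is None:
        return torch.zeros(c)
    label, left, right = tree
    return torch.tanh(f(torch.cat([label, f_enc(left), f_enc(right)])))

def predict(tree):
    return g(f_enc(tree))

leaf = (torch.eye(n)[0], None, None)
tree = (torch.eye(n)[1], leaf, (torch.eye(n)[2], None, leaf))
target = torch.tensor([1.0])

opt = torch.optim.SGD(list(f.parameters()) + list(g.parameters()), lr=0.1)
loss = ((predict(tree) - target) ** 2).sum()
loss.backward()                          # backpropagation through structure
opt.step()
print(float(loss))
```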
Slide 48: One breakthrough – recursive networks
Applications:
• term classification [Goller, Küchler, 1996]
• automated theorem proving [Goller, 1997]
• learning tree automata [Küchler, 1998]
• QSAR/QSPR problems [Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003]
• logo recognition, image processing [Costa, Frasconi, Soda, 1999; Bianchini et al., 2005]
• natural language parsing [Costa, Frasconi, Sturt, Lombardo, Soda, 2000, 2005]
• document classification [Diligenti, Frasconi, Gori, 2001]
• fingerprint classification [Yao, Marcialis, Roli, Frasconi, Pontil, 2001]
• prediction of contact maps [Baldi, Frasconi, Pollastri, Vullo, 2002]
• protein secondary structure prediction [Frasconi et al., 2005]
• …

Slide 49: One breakthrough – recursive networks
Desired: approximation completeness, i.e. for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with respect to an appropriate distance measure).
Approximation properties can be measured in several ways: given f, ε, a probability P and data points x_i, find f_w such that
• P(x : |f(x) − f_w(x)| > ε) is small (L1 norm), or
• |f(x) − f_w(x)| < ε for all x (maximum norm), or
• f(x_i) = f_w(x_i) for all x_i (interpolation of the points).

Slide 50: One breakthrough – recursive networks
Approximation properties of RecNNs for tree-structured data:
• they can approximate every continuous function in the maximum norm for inputs of restricted height, and every measurable function in the L1 norm (σ a squashing function) [Hammer]
• they can interpolate every set {f(x_1), …, f(x_m)} with O(m^2) neurons (σ a squashing function, C^2 in a neighborhood of some t with σ''(t) ≠ 0) [Hammer]
• they can approximate every tree automaton for arbitrarily large inputs [Küchler]
• they cannot approximate every f: {1}^(2*) → {0,1} (for realistic σ) [Hammer]
Overall fairly good results: 3:1 (three positive results, one negative).

Slide 52: Going on – towards more complex structures
More general trees, with an arbitrary number of unordered (not positioned) children: encode a vertex via its label and the average of its children's codes, weighted according to the edge labels,
  f_enc(v) = f(label(v), (1/|ch(v)|) · Σ_{ch ∈ ch(v)} w · label_edge(ch) · f_enc(ch))
This is approximation complete for appropriate edge labels [Bianchini et al., 2005].
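My reading of the encoding on slide 52, as a rough sketch: the children are unordered, so their codes are averaged, here with one mixing matrix per edge label. The concrete parametrization (tanh units, a matrix W_edge[e] per edge label) is my assumption, not necessarily the construction analysed by Bianchini et al.

```python
# Sketch: recursive encoding for trees with an arbitrary number of unordered children.
import numpy as np

rng = np.random.default_rng(1)
n, c, n_edge_labels = 4, 8, 3
W = rng.normal(scale=0.5, size=(c, n + c))
W_edge = rng.normal(scale=0.5, size=(n_edge_labels, c, c))   # one matrix per edge label

def f_enc(node):
    """node = (label, children), children = list of (edge_label_index, child_node)."""
    label, children = node
    if children:
        ctx = sum(W_edge[e] @ f_enc(ch) for e, ch in children) / len(children)
    else:
        ctx = np.zeros(c)
    return np.tanh(W @ np.concatenate([label, ctx]))

leaf = (np.eye(n)[0], [])
t1 = (np.eye(n)[1], [(0, leaf), (2, leaf)])
t2 = (np.eye(n)[1], [(2, leaf), (0, leaf)])    # same children, different order
print(np.allclose(f_enc(t1), f_enc(t2)))       # True: the code ignores child order
```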
Slide 53: Going on – towards more complex structures
Planar graphs: … [Baldi, Frasconi, …, 2002]

Slide 54: Going on – towards more complex structures
Acyclic graphs: contextual cascade correlation [Micheli, Sperduti, 03]; approximation complete (under a mild structural restriction) even for structural transduction [Hammer, Micheli, Sperduti, 05].

Slide 55: Going on – towards more complex structures
Cyclic graphs: processing based on the neighborhood of each vertex [Micheli, 05].

Slide 56: Conclusions – recursive networks
• very promising neural architectures for the direct processing of tree structures
• successful applications and a mathematical background
• connections to symbolic mechanisms (tree automata)
• extensions to more complex structures (graphs) are under development
• only few approaches achieve structured outputs

Slide 58: Conclusions (Part I)
Overview literature:
• FNNs and rules: Duch, Setiono, Zurada, Computational intelligence methods for understanding of data, Proceedings of the IEEE 92(5):771-805, 2004
• Structure kernels: Gärtner, Lloyd, Flach, Kernels and distances for structured data, Machine Learning 57, 2004 (a new overview is forthcoming)
• RNNs: Hammer, Steil, Perspectives on learning with recurrent networks, in: Verleysen (ed.), ESANN'2002, D-side publications, 357-368, 2002
• RNNs and rules: Jacobsson, Rule extraction from recurrent neural networks: a taxonomy and review, Neural Computation 17:1223-1263, 2005
• Recursive representations: Hammer, Perspectives on learning symbolic data with connectionistic systems, in: Kühn, Menzel, Menzel, Ratsch, Richter, Stamatescu (eds.), Adaptivity and Learning, 141-160, Springer, 2003
• Recursive networks: Frasconi, Gori, Sperduti, A general framework for adaptive processing of data structures, IEEE Transactions on Neural Networks 9(5):768-786, 1998
• Neural networks and structures: Hammer, Jain, Neural methods for non-standard data, in: Verleysen (ed.), ESANN'2004, D-side publications, 281-292, 2004

Slide 59: Conclusions (Part I)
• There exist networks which can directly deal with structures (sequences, trees, graphs) with good success: kernel machines, recurrent and recursive networks.
• Efficient training algorithms and theoretical foundations exist.
• (Loose) connections to symbolic processing have been established and indicate benefits.
Now: towards strong connections. PART II: Logic and neural networks.