Grammatical inference: techniques and algorithms
Colin de la Higuera
Erice 2005, the Analysis of Patterns. Grammatical Inference
Acknowledgements
• Laurent Miclet, Tim Oates, Jose Oncina, Rafael Carrasco, Paco Casacuberta, Pedro Cruz, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal, ...
• … and a lot of other people to whom I am grateful
Outline
1 An introductory example
2 About grammatical inference
3 Some specificities of the task
4 Some techniques and algorithms
5 Open issues and questions
1 How do we learn languages?
A very simple example
The problem:
• You are in an unknown city and have to eat.
• You therefore go to some selected restaurants.
• Your goal is therefore to build a model of the city (a map).
The data
• Up Down Right Left Left → Restaurant
• Down Down Right → Not a restaurant
• Left Down → Restaurant
Hopefully something like this:
[Figures: automata whose states are labelled R (restaurant) or N (not a restaurant) and whose transitions are labelled with the moves u, d, l, r]
Further arguments (1)
• How did we get hold of the data?
– Random walks
– Following someone
• Someone knowledgeable
• Someone trying to lose us
• Someone on a diet
– Exploring
Further arguments (2)
• Can we not have better information (for example the names of the restaurants)?
• But then we may only have the information about the routes to restaurants (not to the “non restaurants”)…
Further arguments (3)
What if instead of getting the information “Elimo” or “restaurant”, I get the information “good meal” or “7/10”?
Reinforcement learning: POMDP
Further arguments (4)
• Where is my algorithm to learn these things?
• Perhaps I should consider several algorithms for the different types of data?
Further arguments (5)
• What can I say about the result?
• What can I say about the algorithm?
Further arguments (6)
• What if I want something
richer than an automaton?
– A context-free grammar
– A transducer
– A tree automaton…
Further arguments (7)
• Why do I want something as rich as an automaton?
• What about
– A simple pattern?
– Some SVM obtained from features over the strings?
– A neural network that would allow me to know, with high probability, whether or not some path will bring me to a restaurant?
Our goal/idea
• Old Greeks:
A whole is more than the sum of
all parts
• Gestalt theory
A whole is different than the
sum of all parts
Better said
• There are cases where the data cannot be analyzed by considering it in bits
• There are cases where intelligibility of the pattern is important
What do people know about formal language theory?
[Scale running from “Nothing” to “Lots”]
A small reminder on formal language theory
• Chomsky hierarchy
• + and – of grammars
A crash course in Formal language theory
• Symbols
• Strings
• Languages
• Chomsky hierarchy
• Stochastic languages
Symbols are taken from some alphabet Σ
Strings are sequences of symbols from Σ
Languages are sets of strings over Σ
Languages are subsets of Σ*
Special languages
• Are recognised by finite state automata
• Are generated by grammars
[Figure: a DFA over the alphabet {a, b}]
DFA: Deterministic Finite State Automaton
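How a DFA decides membership can be sketched in a few lines. The automaton on the slide is not recoverable from this transcript, so the transition table below is a hypothetical stand-in (it accepts strings with an even number of a's); only the run procedure is the point.

```python
# Hypothetical DFA over {a, b}: accepts strings containing an even number of a's.
# (This is NOT the automaton drawn on the slide, which is not recoverable here.)
DELTA = {
    ("even", "a"): "odd",
    ("even", "b"): "even",
    ("odd", "a"): "even",
    ("odd", "b"): "odd",
}


def accepts(word, delta=DELTA, start="even", finals=frozenset({"even"})):
    """Run the DFA on word; a missing transition means rejection."""
    state = start
    for symbol in word:
        state = delta.get((state, symbol))
        if state is None:
            return False
    return state in finals


print(accepts("abab"))  # two a's
print(accepts("ab"))    # one a
```

Reading a string takes one table lookup per symbol, which is why membership is linear for regular languages represented this way.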
[Figure: the same DFA reading the string abab]
abab ∈ L
What is a context free grammar?
A 4-tuple (Σ, S, V, P) such that:
– Σ is the alphabet;
– V is a finite set of non-terminals;
– S is the start symbol;
– P ⊆ V × (V ∪ Σ)* is a finite set of rules.
Example of a grammar
The Dyck1 grammar
– (Σ, S, V, P)
– Σ = {a, b}
– V = {S}
– P = {S → aSbS, S → λ}
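As an illustration, the two rules S → aSbS and S → λ can be turned directly into a recursive-descent recognizer. This is a minimal sketch; the function names are chosen here and do not come from the slides.

```python
def parse_S(w, i=0):
    """Consume one S starting at position i; return the position reached, or None."""
    # Rule S -> a S b S, chosen whenever the next symbol is 'a'.
    if i < len(w) and w[i] == "a":
        j = parse_S(w, i + 1)
        if j is not None and j < len(w) and w[j] == "b":
            return parse_S(w, j + 1)
        return None
    # Rule S -> λ (the empty string): consume nothing.
    return i


def in_dyck1(w):
    """True iff w is generated by the Dyck1 grammar above."""
    return parse_S(w) == len(w)


print(in_dyck1("aabb"), in_dyck1("aba"))
```

The recursion mirrors the derivation tree: each call to parse_S corresponds to one node labelled S.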
Derivations and derivation trees
S ⇒ aSbS
⇒ aaSbSbS
⇒ aabSbS
⇒ aabbS
⇒ aabb
[Figure: the corresponding derivation tree, with root S and the three λ leaves]
Chomsky Hierarchy
• Level 0: no restriction
• Level 1: context-sensitive
• Level 2: context-free
• Level 3: regular
Chomsky Hierarchy
• Level 0: Whatever Turing machines can do
• Level 1:
– {a^n b^n c^n : n ∈ ℕ}
– {a^n b^m c^n d^m : n, m ∈ ℕ}
– {uu : u ∈ Σ*}
• Level 2: context-free
– {a^n b^n : n ∈ ℕ}
– brackets
• Level 3: regular
– Regular expressions (GREP)
The membership problem
• Level 0: undecidable
• Level 1: decidable
• Level 2: polynomial
• Level 3: linear
The equivalence problem
• Level 0: undecidable
• Level 1: undecidable
• Level 2: undecidable
• Level 3: Polynomial only when the representation is DFA.
[Figure: a PFA over {a, b} whose transitions carry fractional probabilities]
PFA: Probabilistic Finite (state) Automaton
[Figure: a DPFA over {a, b} whose transitions carry decimal probabilities]
DPFA: Deterministic Probabilistic Finite (state) Automaton
What is nice with grammars?
• Compact representation
• Recursivity
• Says how a string belongs, not just if it belongs
• Graphical representations (automata, parse trees)
What is not so nice with grammars?
• Even the easiest class (level 3) contains SAT, Boolean functions, parity functions…
• Noise is very harmful:
– Think about putting edit noise to language {w : |w|_a ≡ 0 [2] ∧ |w|_b ≡ 0 [2]}
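The point about noise can be checked on the parity language itself. The sketch below (helper names are illustrative) enumerates every string at edit distance one from a member of the language and tests whether any of them is still in the language.

```python
def in_language(w):
    """Membership in {w : |w|_a and |w|_b are both even}."""
    return w.count("a") % 2 == 0 and w.count("b") % 2 == 0


def one_edit_neighbours(w, alphabet="ab"):
    """All strings obtained from w by one insertion, deletion or substitution."""
    out = set()
    for i in range(len(w) + 1):
        for c in alphabet:
            out.add(w[:i] + c + w[i:])            # insertion
    for i in range(len(w)):
        out.add(w[:i] + w[i + 1:])                # deletion
        for c in alphabet:
            if c != w[i]:
                out.add(w[:i] + c + w[i + 1:])    # substitution
    return out


w = "aabb"
print(in_language(w), any(in_language(v) for v in one_edit_neighbours(w)))
```

Every single edit changes the parity of at least one of the two counts, so one noisy symbol always flips the label of a positive example.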
2 Specificities of grammatical inference
Grammatical inference consists (roughly) in finding the (a) grammar or automaton that has produced a given set of strings (sequences, trees, terms, graphs).
The field
[Diagram: Grammatical Inference, surrounded by the related fields]
• Inductive Inference
• Pattern Recognition
• Machine Learning
• Computational linguistics
• Computational biology
• Web technologies
The data
• Strings, trees, terms, graphs
• Structural objects
• Basically the same gap of information as in programming between tables/arrays and data structures
Alternatives to grammatical inference
• 2 steps:
– Extract features from the strings
– Use a very good method over ℝ^n.
Examples of strings
A string in Gaelic and its translation to English:
• Tha thu cho duaichnidh ri èarr àirde de a’ coisich deas damh
• You are as ugly as the north end of a southward traveling ox
>A BAC=41M14 LIBRARY=CITB_978_SKB
AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA
GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT
GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA
GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA
CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT
GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG
GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA
GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA
CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA
CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA
TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA
TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA
AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA
TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG
TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC
TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA
GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT
GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA
AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC
TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC
TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA
TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT
<book>
<part>
<chapter>
<sect1/>
<sect1>
<orderedlist numeration="arabic">
<listitem/>
<f:fragbody/>
</orderedlist>
</sect1>
</chapter>
</part>
</book>
<?xml version="1.0"?>
<?xml-stylesheet href="carmen.xsl" type="text/xsl"?>
<?cocoon-process type="xslt"?>
<!DOCTYPE pagina [
<!ELEMENT pagina (titulus?, poema)>
<!ELEMENT titulus (#PCDATA)>
<!ELEMENT auctor (praenomen, cognomen, nomen)>
<!ELEMENT praenomen (#PCDATA)>
<!ELEMENT nomen (#PCDATA)>
<!ELEMENT cognomen (#PCDATA)>
<!ELEMENT poema (versus+)>
<!ELEMENT versus (#PCDATA)>
]>
<pagina>
<titulus>Catullus II</titulus>
<auctor>
<praenomen>Gaius</praenomen>
<nomen>Valerius</nomen>
<cognomen>Catullus</cognomen>
</auctor>
A logic program learned by GIFT
color_blind(Arg1) :- start(Arg1,X), p11(Arg1,X).
start(X,X).
p11(Arg1,P) :- mother(M,P), p4(Arg1,M).
p4(Arg1,X) :- woman(X), father(F,X), p3(Arg1,F).
p4(Arg1,X) :- woman(X), mother(M,X), p4(Arg1,M).
p3(Arg1,X) :- man(X), color_blind(X).
3 Hardness of the task
– One thing is to build algorithms, another is to be able to state that they work.
– Some questions:
– Does this algorithm work?
– Do I have enough learning data?
– Do I need some extra bias?
– Is this algorithm better than the other?
– Is this problem easier than the other?
Alternatives to answer these questions:
– Use well admitted benchmarks
– Build your own benchmarks
– Solve a real problem
– Prove things
Use well admitted benchmarks
• yes: allows comparison
• no: many parameters
• problem: difficult to do better (also, in GI, there are not that many around!)
Build your own benchmarks
• yes: allows progress
• no: only against oneself
• problem: one invents the benchmark where one is best!
Solve a real problem
• yes: it is the final goal
• no: we don’t always know why things work
• problem: how much pre-processing?
Theory
• Because you may want to be able to say something more than “seems to work in practice”.
Identification in the limit
[Diagram relating the following objects]
• ℒ: a class of languages
• G: a class of grammars
• L: G → ℒ, the naming function
• Pres ⊆ ℕ → X: the presentations, with yields: Pres → ℒ
• φ: a learner, from presentations to grammars
L(φ(f)) = yields(f)
f(ℕ) = g(ℕ) ⇒ yields(f) = yields(g)
ℒ is identifiable in the limit in terms of G from Pres iff
∀L ∈ ℒ, ∀f ∈ Pres(L):
the learner reads f1, f2, …, fn, …, fi, … and outputs the hypotheses h1, h2, …, hn, …, hi, …
and there is some n such that ∀i ≥ n: hi ≡ hn and L(hi) = L
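A toy instance may help fix the definition. The sketch below is not taken from the slides: it uses the class of finite languages, presented by text, and a learner whose hypothesis after each example is simply the set of strings seen so far. The target language and the (finite prefix of a) presentation are made up for illustration.

```python
def learn_finite_language(presentation):
    """Yield the hypothesis h_i produced after reading each example f_i."""
    seen = set()
    for example in presentation:
        seen.add(example)
        yield frozenset(seen)


# Hypothetical target and the beginning of a text presentation of it
# (a real presentation is infinite and lists every string of the target).
target = frozenset({"a", "ab", "abb"})
text = ["a", "ab", "a", "abb", "ab", "a", "abb"]

hypotheses = list(learn_finite_language(text))
for i, h in enumerate(hypotheses, start=1):
    print(i, sorted(h), h == target)
```

Once every string of the target has appeared, the hypothesis equals the target and never changes again, which is exactly the condition hi ≡ hn and L(hi) = L.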
He did not want to compose another Quixote (which is easy) but the Quixote itself. Needless to say, he never contemplated a mechanical transcription of the original; he did not propose to copy it. His admirable ambition was to produce a few pages that would coincide, word for word and line for line, with those of Miguel de Cervantes.
[…]
“My undertaking is not difficult, essentially,” I read in another part of the letter. “I would only have to be immortal to carry it out.”
Jorge Luis Borges (1899–1986)
Pierre Menard, autor del Quijote (El jardín de senderos que se bifurcan), Ficciones
4 Algorithmic ideas
The space of GI problems
• Type of input (strings)
• Presentation of input (batch)
• Hypothesis space (subset of the regular grammars)
• Success criteria (identification in the limit)
Types of input
Structural examples:
Strings:
the cat hates the dog (+)
cat dog the the hates (-)
Graphs:
[Figure: an example graph]
Types of input - oracles
• Membership queries
– Is string S in the target language?
• Equivalence queries
– Is my hypothesis correct?
– If not, provide counter example
• Subset queries
– Is the language of my hypothesis a subset of the target language?
Presentation of input
• Arbitrary order
• Shortest to longest
• All positive and negative examples up to some length
• Sampled according to some probability distribution
Presentation of input
• Text presentation
– A presentation of all strings in the target language
• Complete (informant) presentation
– A presentation of all strings over the alphabet of the target language labeled as + or -
Hypothesis space
• Regular grammars
– A welter of subclasses
• Context free grammars
– Fewer subclasses
• Hyper-edge replacement graph grammars
Success criteria
• Identification in the limit
– Text or informant presentation
– After each example, learner guesses language
– At some point, guess is correct and never changes
• PAC learning
Theorems due to Gold
• The good news
– Any recursively enumerable class of languages can be learned in the limit from an informant (Gold, 1967)
• The bad news
– A language class is superfinite if it includes all finite languages and at least one infinite language
– No superfinite class of languages can be learned in the limit from a text (Gold, 1967)
– That includes regular and context-free
A picture
[Figure: learning settings placed on two axes, from “Poor languages” to “Rich Languages” and from “Little information” to “A lot of information”]
• DFA, from queries
• Mildly context sensitive, from queries
• DFA, from pos+neg
• Sub-classes of reg, from pos
• Context-free, from pos
Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
4.1 RPNI
• Regular Positive and Negative Grammatical Inference
Identifying regular languages in polynomial time
Jose Oncina & Pedro García 1992
• It is a state merging algorithm;
• It identifies any regular language in the limit;
• It works in polynomial time;
• It admits polynomial characteristic sets.
The algorithm
function rmerge(A,p,q)
A = merge(A,p,q)
while ∃a ∈ Σ, ∃p,q ∈ δA(r,a), p ≠ q
do
rmerge(A,p,q)
A = PTA(X+); Fr = {δ(q0,a): a ∈ Σ};
K = {q0};
While Fr ≠ ∅ do
choose q from Fr
if ∃p ∈ K: L(rmerge(A,p,q)) ∩ X- = ∅
then A = rmerge(A,p,q)
else K = K ∪ {q}
Fr = {δ(q,a): q ∈ K} – K
X+ = {λ, aaa, aaba, ababa, bb, bbaaa}
X- = {aa, ab, aaaa, ba}
[Figure: the prefix tree acceptor of X+, with states 1 to 15]
Try to merge 2 and 1
[Figure: the same prefix tree acceptor]
Needs more merging for determinization
[Figure: states {1,2} and 3 to 15]
But now string aaaa is accepted, so the merge must be rejected
[Figure: states {1,2,4,7}, {3,5,8}, {9,11}, 6, 10, 12, 13, 14, 15]
Try to merge 3 and 1
[Figure: the prefix tree acceptor, with states 1 to 15]
This requires merging 6 with {1,3}
[Figure: states {1,3}, 2 and 4 to 15]
And now merging 2 with 10
[Figure: states {1,3,6}, 2, 4, 5 and 7 to 15]
And now merging 4 with 13
[Figure: states {1,3,6}, {2,10}, 4, 5, 7, 8, 9 and 11 to 15]
And finally merging 7 with 15
[Figure: states {1,3,6}, {2,10}, {4,13}, 5, 7, 8, 9, 11, 12, 14, 15]
No counter example is accepted so the merges are kept
[Figure: states {1,3,6}, {2,10}, {4,13}, {7,15}, 5, 8, 9, 11, 12, 14]
Next possible merge to be checked is {4,13} with {1,3,6}
[Figure: states {1,3,6}, {2,10}, {4,13}, {7,15}, 5, 8, 9, 11, 12, 14]
More merging for determinization is needed
[Figure: states {1,3,4,6,13}, {2,10}, {7,15}, 5, 8, 9, 11, 12, 14]
But now aa is accepted
[Figure: states {1,3,4,6,8,13}, {2,7,10,11,15}, 5, 9, 12, 14]
So we try {4,13} with {2,10}
[Figure: states {1,3,6}, {2,10}, {4,13}, {7,15}, 5, 8, 9, 11, 12, 14]
After determinizing, negative string aa is again accepted
[Figure: states {1,3,6}, {2,4,7,10,13,15}, {5,8}, {9,11}, 12, 14]
So we try 5 with {1,3,6}
[Figure: states {1,3,6}, {2,10}, {4,13}, {7,15}, 5, 8, 9, 11, 12, 14]
But again we accept ab
[Figure: states {1,3,5,6,12}, {2,9,10,14}, {4,13}, {7,15}, 8, 11]
So we try 5 with {2,10}
[Figure: states {1,3,6}, {2,10}, {4,13}, {7,15}, 5, 8, 9, 11, 12, 14]
Which is OK. So next possible merge is {7,15} with {1,3,6}
[Figure: states {1,3,6}, {2,5,10}, {4,9,13}, {7,15}, {8,12}, {11,14}]
Which is OK. Now try to merge {8,12} with {1,3,6,7,15}
[Figure: states {1,3,6,7,15}, {2,5,10}, {4,9,13}, {8,12}, {11,14}]
And ab is accepted
[Figure: states {1,3,6,7,8,12,15}, {2,5,10,11,14}, {4,9,13}]
Now try to merge {8,12} with {4,9,13}
[Figure: states {1,3,6,7,15}, {2,5,10}, {4,9,13}, {8,12}, {11,14}]
This is OK and no more merge is possible so the algorithm halts.
[Figure: the final automaton, with states {1,3,6,7,11,14,15}, {2,5,10}, {4,8,9,12,13}]
X- = {aa, ab, aaaa, ba}
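The run above can be replayed with a small program. What follows is a minimal sketch of a state-merging learner in the spirit of RPNI, not the authors' implementation: states of the prefix tree acceptor are numbered from 0 (so state i here is state i+1 on the slides), a partition of those states is kept as a list of representatives, and all function names are chosen for this example.

```python
def build_pta(positives):
    """Prefix tree acceptor; states are numbered in length-lex order of the prefixes."""
    prefixes = sorted({w[:i] for w in positives for i in range(len(w) + 1)} | {""},
                      key=lambda u: (len(u), u))
    ident = {u: i for i, u in enumerate(prefixes)}
    trans = {(ident[u[:-1]], u[-1]): ident[u] for u in prefixes if u}
    finals = {ident[w] for w in positives}
    return len(prefixes), trans, finals


def find(rep, x):
    """Representative (smallest member) of the block containing PTA state x."""
    while rep[x] != x:
        x = rep[x]
    return x


def rmerge(rep, trans, p, q):
    """Merge the blocks of p and q, then keep merging until the quotient is deterministic."""
    rep = list(rep)
    lo, hi = sorted((find(rep, p), find(rep, q)))
    rep[hi] = lo
    changed = True
    while changed:
        changed = False
        seen = {}
        for (s, a), t in trans.items():
            key, target = (find(rep, s), a), find(rep, t)
            if seen.setdefault(key, target) != target:
                lo, hi = sorted((seen[key], target))
                rep[hi] = lo          # two different a-successors: merge them
                changed = True
                break
    return rep


def accepts(rep, trans, finals, w):
    """Run w on the quotient automaton defined by the partition rep."""
    delta = {(find(rep, s), a): find(rep, t) for (s, a), t in trans.items()}
    state = find(rep, 0)
    for a in w:
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    return state in {find(rep, f) for f in finals}


def rpni(positives, negatives):
    n, trans, finals = build_pta(positives)
    rep, red = list(range(n)), [0]     # red: states kept so far (the set K of the slides)
    for q in range(1, n):
        if find(rep, q) != q:
            continue                   # q was already merged into a smaller state
        for p in red:
            cand = rmerge(rep, trans, p, q)
            if not any(accepts(cand, trans, finals, w) for w in negatives):
                rep = cand             # no negative string accepted: keep the merge
                break
        else:
            red.append(q)              # every merge was rejected: keep q as a state
    return rep, trans, finals


pos = ["", "aaa", "aaba", "ababa", "bb", "bbaaa"]
neg = ["aa", "ab", "aaaa", "ba"]
rep, trans, finals = rpni(pos, neg)
print(sorted({find(rep, s) for s in range(len(rep))}))
```

On the sample of the slides this sketch ends with three blocks, matching the final automaton of the run; the result accepts every string of X+ and rejects every string of X-.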
Definitions
• Let ≤ be the length-lex ordering over Σ*
• Let Pref(L) be the set of all prefixes of strings in some language L.
Short prefixes
Sp(L) = {u ∈ Pref(L): δ(q0,u) = δ(q0,v) ⇒ u ≤ v}
• There is one short prefix per useful state
[Figure: a DFA with states 0, 1, 2 over {a, b}]
Sp(L) = {λ, a}
Kernel-sets
• N(L) = {ua ∈ Pref(L): u ∈ Sp(L)} ∪ {λ}
• There is an element in the Kernel-set for each useful transition
[Figure: the same DFA with states 0, 1, 2 over {a, b}]
N(L) = {λ, a, b, ab}
A characteristic sample
• A sample is characteristic (for RPNI) if
– ∀x ∈ Sp(L), ∃xu ∈ X+
– ∀x ∈ Sp(L), ∀y ∈ N(L), δ(q0,x) ≠ δ(q0,y) ⇒ ∃z ∈ Σ*: (xz ∈ X+ ∧ yz ∈ X-) ∨ (xz ∈ X- ∧ yz ∈ X+)
About characteristic samples
• If you add more strings to a characteristic sample it still is characteristic;
• There can be many different characteristic samples;
• Change the ordering (or the exploring function in RPNI) and the characteristic sample will change.
Conclusion
• RPNI identifies any regular language in the limit;
• RPNI works in polynomial time. Complexity is in O(║X+║^3 · ║X-║);
• There are many significant variants of RPNI;
• RPNI can be extended to other classes of grammars.
Open problems
• RPNI’s complexity is not a tight upper bound. Find the correct complexity.
• The definition of the characteristic set is not tight either. Find a better definition.
Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
4.2 The k-reversible languages
• The class was proposed by Angluin (1982).
• The class is identifiable in the limit from text.
• The class is composed of the regular languages that can be accepted by a DFA whose reverse is deterministic with a look-ahead of k.
Let A = (Σ, Q, δ, I, F) be a NFA, we denote by A^T = (Σ, Q, δ^T, F, I) the reversal automaton with:
δ^T(q,a) = {q’ ∈ Q: q ∈ δ(q’,a)}
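The definition of the reversal automaton translates almost literally into code. The sketch below assumes an NFA stored as a dictionary from (state, symbol) to a set of states; this representation, the function names and the small example automaton are choices made here, not part of the slides.

```python
def reverse_nfa(delta, initials, finals):
    """Return (delta_T, I_T, F_T) with delta_T(q, a) = {q' : q in delta(q', a)}."""
    delta_t = {}
    for (q_prime, a), targets in delta.items():
        for q in targets:
            delta_t.setdefault((q, a), set()).add(q_prime)
    # Initial and final states swap roles.
    return delta_t, set(finals), set(initials)


def nfa_accepts(delta, initials, finals, word):
    """Standard subset simulation of an NFA."""
    current = set(initials)
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & set(finals))


# Hypothetical automaton for a b b*: 0 -a-> 1 -b-> 2, with a b-loop on 2.
delta = {(0, "a"): {1}, (1, "b"): {2}, (2, "b"): {2}}
delta_t, init_t, fin_t = reverse_nfa(delta, {0}, {2})
print(nfa_accepts(delta, {0}, {2}, "abb"), nfa_accepts(delta_t, init_t, fin_t, "bba"))
```

A string is accepted by A exactly when its mirror image is accepted by the reversal, which the usage line illustrates on one string.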
[Figure: an automaton A with states 0 to 4 over {a, b}, and its reversal A^T]
Some definitions
• u is a k-successor of q if │u│ = k and δ(q,u) ≠ ∅.
• u is a k-predecessor of q if │u│ = k and δ^T(q,u^T) ≠ ∅.
• λ is 0-successor and 0-predecessor of any state.
[Figure: the automaton A, with states 0 to 4 over {a, b}]
• aa is a 2-successor of 0 and 1 but not of 3.
• a is a 1-successor of 3.
• aa is a 2-predecessor of 3 but not of 1.
A NFA is deterministic with look-ahead k iff ∀q, q’ ∈ Q, q ≠ q’:
(q, q’ ∈ I) ∨ (q, q’ ∈ δ(q”,a))
⇒
(u is a k-successor of q) ∧ (v is a k-successor of q’) ⇒ u ≠ v
Prohibited:
[Figure: two distinct states 1 and 2, both reached by a transition labelled a, that share the same k-successor u, with │u│ = k]
Example
[Figure: an automaton with states 0 to 4 over {a, b}]
This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2.
K-reversible automata
• A is k-reversible if A is deterministic and A^T is deterministic with look-ahead k.
• Example
[Figure: an automaton with states 0, 1, 2 that is deterministic, and its reversal, which is deterministic with look-ahead 1]
Violation of k-reversibility
• Two states q, q’ violate the k-reversibility condition iff
– they violate the deterministic condition: q, q’ ∈ δ(q”,a);
or
– they violate the look-ahead condition:
• q, q’ ∈ F, ∃u ∈ Σ^k: u is k-predecessor of both;
• ∃u ∈ Σ^k, δ(q,a) = δ(q’,a) and u is k-predecessor of both q and q’.
Learning k-reversible automata
• Key idea: the order in which the merges are performed does not matter!
• Just merge states that do not comply with the conditions for k-reversibility.
K-RL Algorithm (k-RL)
Data: k, X sample of a k-RL L
A=PTA(X)
While
q,q’
k-reversibility
violators do
A=merge(A,q,q’)
Erice 2005, the Analysis of Patterns. Grammatical Inference
118
11
Let X = {a, aa, abba, abbbba}, k = 2
[Figure: PTA(X), with states λ, a, aa, ab, abb, abba, abbb, abbbb, abbbba]
Violators, for u = ba
[Figure: the automaton after merging abba and abbbba]
Violators, for u = bb
[Figure: the automaton after merging abb and abbbb, with states λ, a, aa, ab, abb, abba, abbb]
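The loop "merge violators until none is left" can be sketched as follows. This is an unoptimised illustration under assumptions made here (states of the PTA are numbered from 0 in length-lex order, the partition is a list of representatives, and the names k_rl and accepts are hypothetical); it makes no attempt to reach the complexity bounds quoted in the slides.

```python
from itertools import combinations


def k_rl(sample, k):
    """Merge k-reversibility violators of PTA(sample); return (delta, start, finals)."""
    prefixes = sorted({w[:i] for w in sample for i in range(len(w) + 1)} | {""},
                      key=lambda u: (len(u), u))
    ident = {u: i for i, u in enumerate(prefixes)}
    trans = [(ident[u[:-1]], u[-1], ident[u]) for u in prefixes if u]
    finals = {ident[w] for w in sample}
    rep = list(range(len(prefixes)))

    def find(x):
        while rep[x] != x:
            x = rep[x]
        return x

    def violators():
        edges = {(find(s), a, find(t)) for s, a, t in trans}
        fin = {find(q) for q in finals}
        blocks = sorted({find(q) for q in range(len(rep))})
        # preds[q]: strings of length k labelling a path that ends in block q
        preds = {q: {""} for q in blocks}
        for _ in range(k):
            step = {q: set() for q in blocks}
            for s, a, t in edges:
                step[t] |= {u + a for u in preds[s]}
            preds = step
        for p, q in combinations(blocks, 2):
            # deterministic condition: p and q are a-successors of the same block
            if any((r, a, q) in edges for r, a, t in edges if t == p):
                return p, q
            if preds[p] & preds[q]:          # p and q share a k-predecessor
                if p in fin and q in fin:
                    return p, q
                if any((q, a, t) in edges for s, a, t in edges if s == p):
                    return p, q
        return None

    while (pair := violators()) is not None:
        rep[pair[1]] = pair[0]               # merge the larger block into the smaller
    delta = {(find(s), a): find(t) for s, a, t in trans}
    return delta, find(0), {find(q) for q in finals}


def accepts(dfa, w):
    delta, state, finals = dfa
    for a in w:
        state = delta.get((state, a))
        if state is None:
            return False
    return state in finals


sample = ["a", "aa", "abba", "abbbba"]
dfa = k_rl(sample, 2)
print(len({dfa[1]} | set(dfa[0].values())), accepts(dfa, "abbbbbba"))
```

On the sample of the slides with k = 2 the sketch performs the two merges shown above and stops with seven states; the b-loop created by the second merge is what lets it accept longer strings such as abbbbbba.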
Properties (1)
• ∀k ≥ 0, ∀X, k-RL(X) is a k-reversible language.
• L(k-RL(X)) is the smallest k-reversible language that contains X.
• The class L_{k-RL} is identifiable in the limit from text.
Properties (2)
• A regular language L is k-reversible iff
(u1v)^-1 L ∩ (u2v)^-1 L ≠ ∅ and │v│ = k
⇒
(u1v)^-1 L = (u2v)^-1 L
(if two strings share a suffix of length k and some common continuation leads both of them into L, then the strings are Nerode-equivalent)
Properties (3)
• L_{k-RL}(X) ⊇ L_{(k+1)-RL}(X)
• L_{k-TSS}(X) ⊇ L_{(k-1)-RL}(X)
Properties (4)
The time complexity is O(k·║X║^3).
The space complexity is O(║X║).
The algorithm is not incremental.
Properties (4)
Polynomial aspects
• Polynomial characteristic sets
• Polynomial update time
• But not necessarily a polynomial number of mind changes
Extensions
• Sakakibara built an extension for context-free grammars whose tree language is k-reversible
• Marion & Besombes propose an extension to tree languages.
• Different authors propose to learn these automata and then estimate the probabilities as an alternative to learning stochastic automata.
Exercises
• Construct a language L that is not k-reversible, ∀k ≥ 0.
• Prove that the class of k-reversible languages is not in TxtEx.
• Run k-RL on X = {aa, aba, abb, abaaba, baaba} for k = 0, 1, 2, 3
Solution (idea)
• L_k = {a^i : i ≤ k}
• Then for each k: L_k is k-reversible but not (k-1)-reversible.
• And ∪ L_k = a*
• So there is an accumulation point…
Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
4.4 Active Learning:
learning DFA from membership and
equivalence queries: the L* algorithm
The classes C and H
• sets of examples
• representations of these sets
• the computation of L(x) (and h(x)) must take place in time polynomial in │x│.
Correct learning
A class C is identifiable with a polynomial number of queries of type T if there exists an algorithm φ that:
1) ∀L ∈ C, identifies L with a polynomial number of queries of type T;
2) does each update in time polynomial in │f│ and in Σ│xi│, where {xi} are the counter-examples seen so far.
Algorithm L*
• Angluin’s papers
• Some talks by Rivest
• Kearns and Vazirani
• Balcazar, Diaz, Gavaldà & Watanabe
Some references
• Learning regular sets from queries and counter-examples, D. Angluin, Information and Computation, 75, 87-106, 1987.
• Queries and Concept learning, D. Angluin, Machine Learning, 2, 319-342, 1988.
• Negative results for Equivalence Queries, D. Angluin, Machine Learning, 5, 121-150, 1990.
The Minimal Adequate Teacher
• You are allowed:
– strong equivalence queries;
– membership queries.
General idea of L*
• find a consistent table (representing a DFA);
• submit it as an equivalence query;
• use counterexample to update the table;
• submit membership queries to make the table complete;
• Iterate.
An observation table
      λ   a
λ     1   0
a     0   0
-----------
b     1   0
aa    0   0
ab    1   0
The experiments (E), or test set: the columns (λ and a)
The states (S): the upper rows (λ and a)
The transitions (T): the lower rows (b, aa and ab)
Meaning
• The entry 1 in row λ, column λ: δ(q0, λ.λ) ∈ F ⇔ λ ∈ L
• The entry 0 in row ab, column a: δ(q0, ab.a) ∉ F ⇔ aba ∉ L
Equivalent prefixes
The rows of λ and ab are equal, hence δ(q0,λ) = δ(q0,ab)
Building a DFA from a table
[Figure: the DFA is built step by step from the observation table: one state per distinct row of S (here λ and a), then the transitions read from the rows: λ –a→ a, λ –b→ λ, a –a→ a, a –b→ λ]
Some rules
• The set of experiments (the columns) is suffix-closed
• The set S (the upper rows) is prefix-closed
• SΣ \ S = T
[Figure: the observation table and the DFA built from it]
An incomplete table
[Figure: an observation table with rows λ, a, b, aa, ab in which some entries are still missing, next to the DFA]
Good idea
We can complete the table by making membership queries...
[Figure: the missing entry in row u, column v]
Membership query: uv ∈ L ?
A table is closed if any row of T corresponds to some row in S
      λ   a
λ     1   0
a     0   0
-----------
b     1   0
aa    0   1
ab    1   0
Not closed: the row of aa, (0, 1), does not appear in S
And a table that is not closed
[Figure: the same table as on the previous slide, next to the DFA; the target of the transition labelled a leaving state a is unknown (?)]
What do we do when we have a table that
is not closed?
• Let s be the row (of T) that does not appear in S.
• Add s to S, and ∀a ∈ Σ, sa to T.
An inconsistent table
[Table: an observation table with columns λ and a, and rows λ, a, b, aa, ab, ba, bb]
Are a and b equivalent?
A table is consistent if
every pair of equal rows in S remains equal in S ∪ T after appending any symbol:
row(s1) = row(s2) ⇒ ∀a∈Σ, row(s1a) = row(s2a)
What do we do when we have an inconsistent table?
Let a∈Σ be such that row(s1) = row(s2) but row(s1a) ≠ row(s2a)
• If row(s1a) ≠ row(s2a), it is so for some experiment e
• Then add experiment ae to the table
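The consistency test and the choice of the new experiment ae can be sketched as follows. The small table used here is an assumption made for illustration, and `find_inconsistency` is a hypothetical helper name.

```python
from itertools import combinations

ALPHABET = ["a", "b"]
E = ["", "a"]
S = ["", "a", "b"]
# Illustrative table (an assumption): rows a and b are equal,
# but row(ab) and row(bb) differ on the experiment lambda.
ROWS = {
    "": (1, 0), "a": (0, 0), "b": (0, 0),
    "aa": (1, 0), "ab": (1, 0), "ba": (1, 0), "bb": (0, 0),
}

def find_inconsistency(S, E, rows, alphabet):
    """Return (s1, s2, a, e) witnessing an inconsistency, or None."""
    for s1, s2 in combinations(S, 2):
        if rows[s1] != rows[s2]:
            continue
        for a in alphabet:
            # Compare row(s1 a) and row(s2 a) experiment by experiment.
            for e, x, y in zip(E, rows[s1 + a], rows[s2 + a]):
                if x != y:
                    return s1, s2, a, e
    return None

witness = find_inconsistency(S, E, ROWS, ALPHABET)
if witness is not None:
    s1, s2, a, e = witness
    new_experiment = a + e   # the experiment ae that separates s1 from s2
    print(witness, repr(new_experiment))
```

After ae is added as a column, the rows of s1 and s2 differ, so the two prefixes are no longer treated as the same state.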
What do we do when we have a closed and consistent table?
• We build the corresponding DFA
• We make an equivalence query!!!
What do we do if we get a counterexample?
• Let u be this counter-example
• ∀w∈Pref(u) do
– add w to S
– ∀a∈Σ such that wa∉Pref(u), add wa to T
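Putting the steps together, the learning loop can be sketched as below. Everything concrete here is an assumption for illustration: the teacher is simulated by a small hypothetical target DFA (`T_DELTA`, `T_FINALS`), the equivalence query returns a shortest counterexample found by breadth-first search, and membership answers are recomputed rather than cached to keep the code short.

```python
from collections import deque

ALPHABET = "ab"

# Hypothetical target DFA playing the role of the teacher (state 0 is initial).
T_DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 2,
           (1, "b"): 0, (2, "a"): 2, (2, "b"): 1}
T_FINALS = {0, 1}

def member(word):
    """Membership query: is word in the target language?"""
    q = 0
    for c in word:
        q = T_DELTA[(q, c)]
    return q in T_FINALS

def equivalent(initial, finals, delta):
    """Equivalence query: None if correct, else a shortest counterexample."""
    start = (0, initial)
    seen, queue = {start}, deque([(start, "")])
    while queue:
        (p, q), word = queue.popleft()
        if (p in T_FINALS) != (q in finals):
            return word
        for c in ALPHABET:
            nxt = (T_DELTA[(p, c)], delta[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + c))
    return None

def lstar():
    S, E = [""], [""]
    while True:
        row = lambda u: tuple(member(u + e) for e in E)
        T = [s + a for s in S for a in ALPHABET if s + a not in S]
        # Closure: promote a row of T that is equal to no row of S.
        s_rows = {row(s) for s in S}
        unclosed = [t for t in T if row(t) not in s_rows]
        if unclosed:
            S.append(unclosed[0])
            continue
        # Consistency: equal rows must stay equal after appending any symbol.
        fix = next((a + e for s1 in S for s2 in S if row(s1) == row(s2)
                    for a in ALPHABET for e in E
                    if member(s1 + a + e) != member(s2 + a + e)), None)
        if fix is not None:
            E.append(fix)
            continue
        # Closed and consistent: build the hypothesis and ask the teacher.
        name = {}
        for s in S:
            name.setdefault(row(s), s)
        states = set(name.values())
        delta = {(q, a): name[row(q + a)] for q in states for a in ALPHABET}
        finals = {q for q in states if member(q)}
        cex = equivalent(name[row("")], finals, delta)
        if cex is None:
            return states, finals, delta
        # Counterexample: add all of its prefixes to S (T is recomputed above).
        for i in range(len(cex) + 1):
            if cex[:i] not in S:
                S.append(cex[:i])

states, finals, delta = lstar()
print(len(states), sorted(finals))
```

Recomputing T as SΣ \ S at the start of every round keeps the rule T = SΣ \ S true without any extra bookkeeping.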
Run of the algorithm

        λ
λ       1
-----------
a       1
b       1

Table is now closed and consistent
[Diagram: the corresponding DFA, a single accepting state λ with loops on a and b.]
An equivalence query is made!
[Diagram: the hypothesis, a single accepting state λ with loops on a and b.]
Counter example baa is returned

        λ
λ       1
b       1
ba      1
baa     0
-----------
a       1
bb      1
bab     1
baaa    0
baab    1

Not consistent: for example row(b) = row(ba), but row(ba) ≠ row(baa).

        λ    a
λ       1    1
b       1    1
ba      1    0
baa     0    0
----------------
a       1    0
bb      1    1
bab     1    1
baaa    0    0
baab    1    0

Table is now closed and consistent
[Diagram: the corresponding DFA, with states λ, ba and baa (λ and ba accepting) and transitions δ(λ, a) = ba, δ(λ, b) = λ, δ(ba, a) = baa, δ(ba, b) = λ, δ(baa, a) = baa, δ(baa, b) = ba.]
Proof of the algorithm
Sketch only
Understanding the proof is important
for further algorithms
Balcazar et al. is a good place for that.
Termination / Correctness
• For every regular language there is
a unique minimal DFA that recognizes
it.
• Given a closed and consistent table,
one can generate a consistent DFA.
• A DFA consistent with a table has at
least as many states as different
rows in S.
• If the algorithm has built a table with n different rows in S, where n is the number of states of the minimal DFA of the target, then the DFA built from it is the target.
Finiteness
• Each closure failure adds one
different row to S.
• Each inconsistency failure adds one experiment, which also creates a new row in S.
• Each counterexample adds one different row to S.
Polynomial
• |E| ≤ n
• at most n-1 equivalence queries
• |membership queries| ≤ n(n-1)m, where m is the length of the longest counter-example returned by the oracle
Conclusion
• With an MAT you can learn DFA
– but also a variety of other classes of
grammars;
– it is difficult to see how powerful an MAT really is;
– probably as much as PAC learning.
– Easy to find a class, a set of queries and provide an algorithm that learns with them;
– more difficult for it to be meaningful.
• Discussion: why are these queries meaningful?
Algorithms
RPNI
K-Reversible
L*
SEQUITUR
GRIDS
4.5 SEQUITUR
(http://sequence.rutgers.edu/sequitur/)
(Nevill-Manning & Witten, 97)
Idea: construct a CF grammar
from a very long string w,
such that L(G)={w}
– No generalization
– Linear time (+/-)
– Good compression rates
Principle
The grammar with respect to the
string:
– Each rule has to be used at
least twice;
– There can be no sub-string of
length 2 that appears twice.
Examples
S → abcdbc
becomes
S → aAdA
A → bc

S → aabaaab
becomes either
S → AaA
A → aab
or
S → AbAab
A → aa
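The two constraints can be checked mechanically on the example grammars. The sketch below is not the SEQUITUR algorithm itself, only a verifier written for illustration: grammars are assumed to be dictionaries from a non-terminal (an upper-case letter) to its right-hand side, and the function names are hypothetical.

```python
def expand(grammar, symbol="S"):
    """Expand a symbol into the terminal string it derives."""
    return "".join(expand(grammar, x) if x in grammar else x
                   for x in grammar[symbol])

def rule_utility(grammar):
    """Every rule other than S has to be used at least twice."""
    uses = {nt: 0 for nt in grammar if nt != "S"}
    for body in grammar.values():
        for x in body:
            if x in uses:
                uses[x] += 1
    return all(n >= 2 for n in uses.values())

def digram_uniqueness(grammar):
    """No pair of adjacent symbols may appear twice in the rule bodies."""
    seen = {}
    for name, body in grammar.items():
        for i in range(len(body) - 1):
            pair = body[i:i + 2]
            if pair in seen:
                if seen[pair] == (name, i - 1):
                    continue          # overlapping occurrence, as in "aaa"
                return False
            seen[pair] = (name, i)
    return True

flat = {"S": "abcdbc"}                # "bc" appears twice
g1 = {"S": "aAdA", "A": "bc"}
g2 = {"S": "AaA", "A": "aab"}
g3 = {"S": "AbAab", "A": "aa"}

for g in (flat, g1, g2, g3):
    print(expand(g), digram_uniqueness(g), rule_utility(g))
```

Both g2 and g3 derive aabaaab and satisfy the two constraints, which shows that the constraints alone do not determine a unique grammar.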
abcabdabcabd
In the beginning, God created the heavens and the
earth.
And the earth was without form, and void; and
darkness was upon the face of the deep. And the
Spirit of God moved upon the face of the waters.
And God said, Let there be light: and there was
light.
And God saw the light, that it was good: and God
divided the light from the darkness.
And God called the light Day, and the darkness he
called Night. And the evening and the morning
were the first day.
And God said, Let there be a firmament in the
midst of the waters, and let it divide the
waters from the waters.
And God made the firmament, and divided the waters
which were under the firmament from the waters
which were above the firmament: and it was so.
And God called the firmament Heaven. And the
evening and the morning were the second day.
Sequitur options
• appending a symbol to rule S;
• using an existing rule;
• creating a new rule;
• and deleting a rule.
Results
On text (bpc = bits per character):
– SEQUITUR 2.82 bpc
– compress 3.46 bpc
– gzip 3.25 bpc
– PPMC 2.52 bpc
4.6 Using a simplicity bias
(Langley & Stromsten, 00)
Algorithm GRIDS, a reconstruction of SNPR (Wolff, 82)
Main characteristics:
– MDL principle;
– Not characterizable;
– Not tested on large benchmarks.
Two learning operators
Creation of non terminals and rules
NP → ART ADJ NOUN
NP → ART ADJ ADJ NOUN
becomes
NP → ART AP1
NP → ART ADJ AP1
AP1 → ADJ NOUN
Merging two non terminals
NP → ART AP1
NP → ART AP2
AP1 → ADJ NOUN
AP2 → ADJ AP1
becomes
NP → ART AP1
AP1 → ADJ NOUN
AP1 → ADJ AP1
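The two operators can be sketched on the example rules as follows. The representation (a list of pairs, each a left-hand side and a tuple of right-hand-side symbols) and the function names `create` and `merge` are assumptions made for illustration. Choosing which pair to create or which non-terminals to merge is the job of the scoring function and is not modelled here.

```python
def create(rules, pair, new_nt):
    """Creation: add new_nt -> pair and replace occurrences of pair by new_nt."""
    def rewrite(rhs):
        out, i = [], 0
        while i < len(rhs):
            if tuple(rhs[i:i + 2]) == pair:
                out.append(new_nt)
                i += 2
            else:
                out.append(rhs[i])
                i += 1
        return tuple(out)
    return [(lhs, rewrite(rhs)) for lhs, rhs in rules] + [(new_nt, pair)]

def merge(rules, keep, drop):
    """Merging: rename drop to keep everywhere, then remove duplicate rules."""
    merged = []
    for lhs, rhs in rules:
        rule = (keep if lhs == drop else lhs,
                tuple(keep if x == drop else x for x in rhs))
        if rule not in merged:
            merged.append(rule)
    return merged

before_creation = [
    ("NP", ("ART", "ADJ", "NOUN")),
    ("NP", ("ART", "ADJ", "ADJ", "NOUN")),
]
after_creation = create(before_creation, ("ADJ", "NOUN"), "AP1")

before_merge = [
    ("NP", ("ART", "AP1")),
    ("NP", ("ART", "AP2")),
    ("AP1", ("ADJ", "NOUN")),
    ("AP2", ("ADJ", "AP1")),
]
after_merge = merge(before_merge, keep="AP1", drop="AP2")

for lhs, rhs in after_creation + after_merge:
    print(lhs, "->", " ".join(rhs))
```

Merging is what introduces recursion: after AP2 is renamed to AP1, the rule AP1 → ADJ AP1 lets the grammar generate arbitrarily long runs of adjectives.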
• Scoring function: MDL principle: |G| + Σ_{w∈T} |d(w)|
• Algorithm:
– find best merge that improves current grammar
– if no such merge exists, find best creation
– halt when no improvement
Results
• On subsets of English grammars (15 rules, 8 non terminals, 9 terminals): 120 sentences to converge
• on (ab)*: all (15) strings of length ≤ 30
• on Dyck1: all (65) strings of length ≤ 12
5 Open questions and conclusions
• dealing with noise
• classes of languages that adequately mix Chomsky’s hierarchy with edit distance compactness
• stochastic context-free grammars
• polynomial learning from text
• learning POMDPs
• fast algorithms
I sensed that I had fallen into a trap and tried to flee. I made an enormous effort,
but it was too late: my body no longer obeyed me. I resigned myself to witnessing what
was about to happen, as if it were an event that had nothing to do with me. That man
began to transform me into a bird, a bird of human size.
He started with my feet: I saw how they gradually turned into something like a
rooster's claws. Then the transformation of the whole body followed, moving upward,
the way water rises in a pond. My only hope now lay in my
friends, who inexplicably had not arrived. When they finally arrived,
something happened that horrified me: they did not notice my transformation. They treated me as
always, which proved that they saw me as always. Thinking that the magician
was deluding them so that they saw me as a normal person, I decided to tell
what he had done to me. Although my intention was to recount the phenomenon
calmly, so as not to aggravate the situation by irritating the magician with too
violent a reaction (which might induce him to do something even worse), I began
to tell everything at the top of my voice. Then I observed two astonishing facts: the sentence
I wanted to utter came out as a harsh bird's screech, a screech that was
desperate and strange, perhaps because of what was human in it; and, what was
infinitely worse, my friends did not hear that screech, just as they had not seen my
great bird's body; on the contrary, they seemed to hear my usual voice saying
usual things, because at no moment did they show the slightest astonishment. I
fell silent, terrified. The owner of the house then looked at me with a sarcastic gleam in
his eyes, almost imperceptible and in any case noticed only by me. Then
I understood that no one, ever, would know that I had been transformed into a bird.
I was lost forever, and the secret would go with me to the grave.
ERNESTO SÁBATO, EL TÚNEL