FSM Toolkit

Introduction to FSM Toolkit
Examples: Part I
NLP Course 07
Example 1
Acceptor for “sheeptalk”: /baa+!/
Text representation (sheep.txt):
0 1 b
1 2 a
2 3 a
3 3 a
3 4 !
4

Symbols file (S.syms):
eps 0
a 1
b 2
! 3
w 4
o 5
u 6
f 7

- The symbols w, o, u and f are needed for Example 2.
- The eps symbol is reserved for possible future epsilon transitions.
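
Before compiling, the text representation can be sanity-checked by walking the transition table directly. The following is a minimal shell sketch and not part of the FSM Toolkit: the `accepts` helper is hypothetical, and it assumes a deterministic acceptor with single-character symbols whose start state is the source state of the first line.

```shell
# Recreate the text representation from Example 1.
cat > sheep.txt <<'EOF'
0 1 b
1 2 a
2 3 a
3 3 a
3 4 !
4
EOF

# accepts FILE STRING (hypothetical helper, not a toolkit command):
# exit status 0 if STRING is accepted by the acceptor described in FILE.
accepts() {
  awk -v s="$2" '
    NF == 3 { if (NR == 1) start = $1; delta[$1, $3] = $2 }  # arc: src dst symbol
    NF == 1 { final[$1] = 1 }                                # final state
    END {
      q = start
      for (i = 1; i <= length(s); i++) {
        ch = substr(s, i, 1)
        if (!((q, ch) in delta)) exit 1   # no arc for this symbol: reject
        q = delta[q, ch]
      }
      rc = (q in final) ? 0 : 1
      exit rc
    }' "$1"
}

accepts sheep.txt 'baaa!' && echo 'accepted: baaa!'
accepts sheep.txt 'ba!'   || echo 'rejected: ba!'
```

The string "ba!" is rejected because /baa+!/ requires at least two a's.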

fsmcompile -i S.syms sheep.txt > sheep.fsa
fsmdraw -i S.syms sheep.fsa | dot -Tps > sheep.ps
Image format: PostScript. For JPEG, write:
fsmdraw -i S.syms sheep.fsa | dot -Tjpg > sheep.jpg
Write an acceptor for “dogtalk”: /wouf!/
Example 2
Acceptor for “dogtalk”: /wouf!/
Text representation (dog.txt):
0 1 w
1 2 o
2 3 u
3 4 f
4 5 !
5

Symbols file (S.syms), same as in Example 1 (sheep and dog share the same symbols file):
eps 0
a 1
b 2
! 3
w 4
o 5
u 6
f 7

fsmcompile -i S.syms dog.txt > dog.fsa
fsmdraw -i S.syms dog.fsa | dot -Tps > dog.ps
Having the 2 fsa for “sheeptalk” and
“dogtalk”, use the appropriate function
to generate an acceptor that accepts a
“sheeptalk” OR a “dogtalk”.
Example 3


fsmunion sheep.fsa dog.fsa > shORdg.fsa
fsmdraw -iS.syms < shORdg.fsa | dot -Tps > shORdg.ps
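
As a quick cross-check of what the union should accept, the same language can be written as a regular expression with alternation. This is only a regex analogue run through `grep`, not a toolkit call, and the `in_union` helper name is hypothetical.

```shell
# in_union STRING (hypothetical helper): succeeds if STRING is
# a "sheeptalk" OR a "dogtalk", i.e. matches /baa+!/ or /wouf!/.
in_union() {
  printf '%s\n' "$1" | grep -Eq '^(baa+!|wouf!)$'
}

in_union 'baaa!'     && echo 'accepted: baaa!'
in_union 'wouf!'     && echo 'accepted: wouf!'
in_union 'baa!wouf!' || echo 'rejected: baa!wouf!'
```

A string made of both talks in a row is rejected, since the union accepts one talk or the other, not a sequence of them.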
Having the 2 fsa for “sheeptalk” and
“dogtalk”, use the appropriate function
to generate an acceptor that accepts a
“sheeptalk” AND a “dogtalk”, using the
constraint that sheep talks first!
Example 4


fsmconcat sheep.fsa dog.fsa > shANDdg.fsa
fsmdraw -iS.syms < shANDdg.fsa | dot -Tps > shANDdg.ps
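
Concatenation is order-sensitive, which a regex analogue makes easy to see. The sketch below uses `grep` rather than the toolkit, and the `in_concat` helper name is hypothetical.

```shell
# in_concat STRING (hypothetical helper): succeeds if STRING is
# a "sheeptalk" immediately followed by a "dogtalk".
in_concat() {
  printf '%s\n' "$1" | grep -Eq '^baa+!wouf!$'
}

in_concat 'baaa!wouf!' && echo 'accepted: baaa!wouf!'
in_concat 'wouf!baa!'  || echo 'rejected: wouf!baa!'
in_concat 'baa!'       || echo 'rejected: baa!'
```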
But the Society of Animals is always fair!
This time, let the dog speak first!
?
Example 5

Generate the following weighted FSM:
Text representation (A.txt):
0 1 red 0.3
1 3 blue 0.7
0 2 green 0.4
2 3 yellow 0.8
3 0.3
4 0.4

Symbols file (S2.syms):
eps 0
red 1
blue 2
green 3
yellow 4

As before: fsmcompile, fsmdraw
Which is the path with the lowest cost?
fsmbestpath A.fsa > B.fsa
fsmdraw -iS2.syms < B.fsa | dot -Tps > B.ps
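
The expected result can be checked by hand. The sketch below assumes the tropical semiring, where the cost of a path is the sum of its arc weights plus the final weight of the state it ends in. The numbers are copied from A.txt, and both paths from state 0 end in state 3, whose final weight is 0.3.

```shell
# Path 0 -> 1 -> 3 (red, blue) and path 0 -> 2 -> 3 (green, yellow),
# each cost = arc weights + final weight of state 3.
red_blue=$(awk 'BEGIN { printf "%.1f", 0.3 + 0.7 + 0.3 }')
green_yellow=$(awk 'BEGIN { printf "%.1f", 0.4 + 0.8 + 0.3 }')

# Keep the line with the smallest cost.
best=$(printf '%s red blue\n%s green yellow\n' "$red_blue" "$green_yellow" \
         | LC_ALL=C sort -n | head -n 1)
echo "$best"
```

Under that assumption the red, blue path is cheaper (1.3 against 1.5), so B.fsa should contain only that path.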
Integrating the power of Perl with the
FSM Toolkit
Perl & FSM Toolkit
Problem Definition:
We have as input a file containing a single sentence of lower-case words:
“hi nlp world”

Goal: transform the above words into upper case using FSMs:
“HI NLP WORLD”

A Perl script (composition.pl) that:
1. Extracts the lower-case words from the input file
2. Generates the corresponding transducer
3. Generates a second transducer that transforms each word to its upper-case form
4. Composes the two transducers
5. Projects the output of the resulting transducer
6. Extracts the output of the above transducer by reading the appropriate file and prints the upper-case sentence to the screen

#!/usr/bin/perl
use strict;
use warnings;

# read the single input sentence and split it into words
open (IN, "<", $ARGV[0]) || die "error";
my $rdln = <IN>;
close(IN);
my @in_wrds = split(' ', $rdln);   # ' ' also skips leading whitespace

# write the files for the transducers
open (OUT_T11, ">", "T11") || die "error";
open (OUT_T12, ">", "T12") || die "error";
my @low_up_words = @in_wrds;
my $c = 0;
foreach my $tmp (@in_wrds)
{
    print OUT_T11 ($c,"\t",$c+1,"\t",$tmp,"\t",$tmp,"\n");
    print OUT_T12 ($c,"\t",$c+1,"\t",$tmp,"\t",uc($tmp),"\n");
    push (@low_up_words,uc($tmp));   # gather lower and upper case words
    $c++;
}
print OUT_T11 ($c,"\n");   # final state
print OUT_T12 ($c,"\n");
# close the files so that their contents are flushed before fsmcompile reads them
close(OUT_T11);
close(OUT_T12);

# write symbols file
my $i = 1;
open (OUT_S12, ">", "S12") || die "error";
foreach my $tmp (@low_up_words)
{
    print OUT_S12 ($tmp,"\t",$i,"\n");
    $i++;
}
close(OUT_S12);

# call the FSM Library
system ("fsmcompile -iS12 -oS12 -t < T11 > T11.fst");
system ("fsmdraw -iS12 -oS12 < T11.fst | dot -Tps > T11.ps");
system ("fsmcompile -iS12 -oS12 -t < T12 > T12.fst");
system ("fsmdraw -iS12 -oS12 < T12.fst | dot -Tps > T12.ps");
system ("fsmcompose T11.fst T12.fst > T12comp.fst");
system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");
system ("fsmproject -2 T12comp.fst > final_out.fsa");
system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");
system ("fsmprint -iS12 < final_out.fsa > final_out");

# Finally, read the resulting file and extract the field of interest
open (IN2, "<", "final_out") || die "can not open the input file...\n";
my @up_wrds;
while (my $rdln2 = <IN2>)
{
    my @out_wrds = split(' ', $rdln2);
    # arc lines read "src dst symbol"; the final-state line has no symbol
    push (@up_wrds,$out_wrds[2]) if defined $out_wrds[2];
}
close(IN2);

# print the upper case content of the initial input file
print join(" ",@up_wrds), "\n";
First fst (T11.fst):
0 1 hi hi
1 2 nlp nlp
2 3 world world
3

Symbols file (S12):
hi 1
nlp 2
world 3
HI 4
NLP 5
WORLD 6

Second fst (T12.fst):
0 1 hi HI
1 2 nlp NLP
2 3 world WORLD
3
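
For comparison, the three text files above can also be produced without Perl. The following is a rough shell sketch of steps 1 to 3 of the script, assuming a POSIX `sh` or `bash` (where an unquoted variable is split on whitespace). It writes the same file names T11, T12 and S12 that the script uses.

```shell
sentence='hi nlp world'
upper=$(printf '%s' "$sentence" | tr '[:lower:]' '[:upper:]')

# Transducer text files: one arc "src dst input output" per word.
: > T11
: > T12
c=0
for w in $sentence; do
  up=$(printf '%s' "$w" | tr '[:lower:]' '[:upper:]')
  printf '%s\t%s\t%s\t%s\n' "$c" "$((c + 1))" "$w" "$w"  >> T11
  printf '%s\t%s\t%s\t%s\n' "$c" "$((c + 1))" "$w" "$up" >> T12
  c=$((c + 1))
done
printf '%s\n' "$c" >> T11   # final state
printf '%s\n' "$c" >> T12

# Symbols file: lower-case words first, then their upper-case forms.
: > S12
i=1
for w in $sentence $upper; do
  printf '%s\t%s\n' "$w" "$i" >> S12
  i=$((i + 1))
done

cat T12
```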
Compose T11.fst and T12.fst
system ("fsmcompose T11.fst T12.fst > T12comp.fst");
system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");

Project the output of the resulting transducer:
system ("fsmproject -2 T12comp.fst > final_out.fsa ");
Draw the final_out.fsa:
system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");
Print a textual description of the above fsa:
system ("fsmprint -iS12 < final_out.fsa > final_out");
Read this textual description using Perl:
open (IN2, "final_out") || die "can not open the input file...\n";
$rdln2 = <IN2>;
...
Textual description of final_out.fsa:
0 1 HI
1 2 NLP
2 3 WORLD
3
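
The extraction step takes the third field of every arc line and skips the final-state line, which has a single field. As a minimal sketch of the same idea outside Perl, the description above can be processed with `awk`. The file is recreated here so the example is self-contained.

```shell
# Textual description of final_out.fsa, as printed by fsmprint.
cat > final_out <<'EOF'
0 1 HI
1 2 NLP
2 3 WORLD
3
EOF

# Join the symbol field of each arc line with single spaces.
upper=$(awk 'NF >= 3 { printf "%s%s", sep, $3; sep = " " }' final_out)
echo "$upper"
```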
Simple extra exercises
Extras 1

Generate the following acceptor, determinize and minimize it
Extras 2

Generate the following transducers and find their composition