Introduction to FSM Toolkit
Examples: Part I
NLP Course 07

Example 1
Acceptor for “sheeptalk”: /baa+!/

Text Representation (sheep.txt):

    0 1 b
    1 2 a
    2 3 a
    3 3 a
    3 4 !
    4

Symbols File (S.syms):

    eps 0
    a 1
    b 2
    ! 3
    w 4
    o 5
    u 6
    f 7

- Symbols w, o, u and f are needed for the 2nd example.
- The eps symbol is reserved for possible future epsilon transitions.

Compile and draw the acceptor:

    fsmcompile -i S.syms sheep.txt > sheep.fsa
    fsmdraw -i S.syms sheep.fsa | dot -Tps > sheep.ps

Image format: PostScript. For jpg write:

    fsmdraw -i S.syms sheep.fsa | dot -Tjpg > sheep.jpg

Write an acceptor for “dogtalk”: /wouf!/

Example 2
Acceptor for “dogtalk”: /wouf!/

Text Representation (dog.txt):

    0 1 w
    1 2 o
    2 3 u
    3 4 f
    4 5 !
    5

Symbols File (S.syms): same as Ex. 1 (sheep and dog share the same symbols file).

    fsmcompile -i S.syms dog.txt > dog.fsa
    fsmdraw -i S.syms dog.fsa | dot -Tps > dog.ps

Having the 2 fsa for “sheeptalk” and “dogtalk”, use the appropriate function to generate an acceptor that accepts a “sheeptalk” OR a “dogtalk”.

Example 3

    fsmunion sheep.fsa dog.fsa > shORdg.fsa
    fsmdraw -iS.syms < shORdg.fsa | dot -Tps > shORdg.ps

Having the 2 fsa for “sheeptalk” and “dogtalk”, use the appropriate function to generate an acceptor that accepts a “sheeptalk” AND a “dogtalk”, with the constraint that the sheep talks first!

Example 4

    fsmconcat sheep.fsa dog.fsa > shANDdg.fsa
    fsmdraw -iS.syms < shANDdg.fsa | dot -Tps > shANDdg.ps

But the Society of Animals is always fair! This time, let the dog speak first!

    ?

Example 5
Generate the following weighted FSM:

Text Representation (A.txt):

    0 1 red 0.3
    1 3 blue 0.7
    0 2 green 0.4
    2 3 yellow 0.8
    3 0.3
    4 0.4

Symbols File (S2.syms):

    eps 0
    red 1
    blue 2
    green 3
    yellow 4

As before: fsmcompile, fsmdraw

Which is the path with the lowest cost?
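The "as before" steps for Example 5 can be sketched as a short shell script. This is a minimal sketch, not the course's own script: the rows of A.txt and S2.syms are copied from the slide as shown, the output names A.fsa and A.ps are assumptions chosen to match the later fsmbestpath call, and the toolkit commands are only run if fsmcompile is found on the PATH.

```shell
#!/bin/sh
# Text representation of the weighted acceptor from Example 5.
# Arc rows: source state, destination state, label, weight.
# Rows with only a state and a weight mark a final state and its final weight.
cat > A.txt <<'EOF'
0 1 red 0.3
1 3 blue 0.7
0 2 green 0.4
2 3 yellow 0.8
3 0.3
4 0.4
EOF

# Symbols file: one "symbol id" pair per line, with eps mapped to 0.
cat > S2.syms <<'EOF'
eps 0
red 1
blue 2
green 3
yellow 4
EOF

# Compile and draw as in Examples 1 and 2 (A.fsa and A.ps are assumed names).
# The guard lets the script finish cleanly when the toolkit is not installed.
if command -v fsmcompile >/dev/null 2>&1; then
    fsmcompile -iS2.syms < A.txt > A.fsa
    fsmdraw -iS2.syms < A.fsa | dot -Tps > A.ps
else
    echo "fsmcompile not found; wrote A.txt and S2.syms only"
fi
```

Keeping the text files in here-documents makes the whole example reproducible from a single script, which is convenient when the symbols file and the arc list have to stay in sync.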
Example 5

    fsmbestpath A.fsa > B.fsa
    fsmdraw -iS2.syms < B.fsa | dot -Tps > B.ps

Integrating the power of Perl with the FSM Toolkit

Perl & FSM Toolkit
Problem Definition: We have as input a file containing a single sentence of lower-case words.

    “hi nlp world”

Goal: transform the above words into upper case using FSMs.

    “HI NLP WORLD”

A Perl script (composition.pl) that:
1. Extracts the lower-case words from the input file
2. Generates the corresponding transducer
3. Generates a second transducer that transforms each word to its upper-case form
4. Composes the two transducers
5. Projects the output of the resulting transducer
6. Extracts the output of the above transducer by reading the appropriate file and prints the upper-case sentence to the screen

composition.pl:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # read the single input sentence
    open (IN, $ARGV[0]) || die "error";
    my $rdln = <IN>;
    close(IN);
    # split(' ', ...) skips leading whitespace, so no empty first word is produced
    my @in_wrds = split(' ', $rdln);

    # write the files for the transducers
    open (OUT_T11, ">T11") || die "error";
    open (OUT_T12, ">T12") || die "error";
    my @low_up_words = @in_wrds;
    my $c = 0;
    foreach my $tmp (@in_wrds) {
        print OUT_T11 ($c,"\t",$c+1,"\t",$tmp,"\t",$tmp,"\n");
        print OUT_T12 ($c,"\t",$c+1,"\t",$tmp,"\t",uc($tmp),"\n");
        push (@low_up_words, uc($tmp));   # gather lower and upper case words
        $c++;
    }
    print OUT_T11 ($c,"\n");              # final state
    print OUT_T12 ($c,"\n");
    close(OUT_T11);
    close(OUT_T12);

    # write symbols file
    my $i = 1;
    open (OUT_S12, ">S12") || die "error";
    foreach my $tmp (@low_up_words) {
        print OUT_S12 ($tmp,"\t",$i,"\n");
        $i++;
    }
    close(OUT_S12);

    # call the FSM Library
    system ("fsmcompile -iS12 -oS12 -t < T11 > T11.fst");
    system ("fsmdraw -iS12 -oS12 < T11.fst | dot -Tps > T11.ps");
    system ("fsmcompile -iS12 -oS12 -t < T12 > T12.fst");
    system ("fsmdraw -iS12 -oS12 < T12.fst | dot -Tps > T12.ps");
    system ("fsmcompose T11.fst T12.fst > T12comp.fst");
    system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");
    system ("fsmproject -2 T12comp.fst > final_out.fsa");
    system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");
    system ("fsmprint -iS12 < final_out.fsa > final_out");

    # Finally, read the resulting file and extract the field of interest
    open (IN2, "final_out") || die "can not open the input file...\n";
    my @up_wrds;
    while (my $rdln2 = <IN2>) {
        my @out_wrds = split(' ', $rdln2);
        # arc lines carry the label in the third field; the final-state line has none
        push (@up_wrds, $out_wrds[2]) if defined $out_wrds[2];
    }
    close(IN2);

    # print the upper case content of the initial input file
    print join(" ", @up_wrds), "\n";

First fst (T11.fst):

    0 1 hi hi
    1 2 nlp nlp
    2 3 world world
    3

Symbols File (S12):

    hi 1
    nlp 2
    world 3
    HI 4
    NLP 5
    WORLD 6

Second fst (T12.fst):

    0 1 hi HI
    1 2 nlp NLP
    2 3 world WORLD
    3

Compose T11.fst and T12.fst:

    system ("fsmcompose T11.fst T12.fst > T12comp.fst");
    system ("fsmdraw -iS12 -oS12 < T12comp.fst | dot -Tps > T12comp.ps");

Project the output of the resulting transducer:

    system ("fsmproject -2 T12comp.fst > final_out.fsa");

Draw final_out.fsa:

    system ("fsmdraw -iS12 < final_out.fsa | dot -Tps > final_out.ps");

Print a textual description of the above fsa:

    system ("fsmprint -iS12 < final_out.fsa > final_out");

Read this textual description using Perl:

    open (IN2, "final_out") || die "can not open the input file...\n";
    $rdln2 = <IN2>;
    ...

Textual description of final_out.fsa:

    0 1 HI
    1 2 NLP
    2 3 WORLD
    3

Simple extra exercises

Extras 1
Generate the following acceptor, determinize and minimize it

Extras 2
Generate the following transducers and find their composition
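For Extras 1, the command sequence might look like the sketch below. The acceptor from the exercise is not reproduced in this text, so the script uses a small hypothetical stand-in: two nondeterministic paths that both accept "a b". The file names extra1.txt, extra1.syms, extra1.fsa, extra1_det.fsa and extra1_min.fsa are likewise assumptions made for illustration. The toolkit calls (fsmcompile, fsmdeterminize, fsmminimize, fsmprint) are only run if the FSM Toolkit is installed.

```shell
#!/bin/sh
# Hypothetical stand-in acceptor (NOT the one from the exercise):
# state 0 has two arcs labelled "a", so the machine is nondeterministic.
cat > extra1.txt <<'EOF'
0 1 a
0 2 a
1 3 b
2 3 b
3
EOF

# Same contents as S.syms from Example 1, written under a hypothetical
# name so that an existing S.syms is not overwritten.
cat > extra1.syms <<'EOF'
eps 0
a 1
b 2
! 3
w 4
o 5
u 6
f 7
EOF

# Compile, determinize, minimize, then print the result as text.
# The guard lets the script finish cleanly when the toolkit is not installed.
if command -v fsmcompile >/dev/null 2>&1; then
    fsmcompile -iextra1.syms < extra1.txt > extra1.fsa
    fsmdeterminize extra1.fsa > extra1_det.fsa
    fsmminimize extra1_det.fsa > extra1_min.fsa
    fsmprint -iextra1.syms < extra1_min.fsa
else
    echo "FSM Toolkit not found; wrote extra1.txt and extra1.syms only"
fi
```

For this stand-in, determinization should merge states 1 and 2, since both are reached from state 0 on the same symbol. Substituting the exercise's own acceptor only requires changing the here-document for extra1.txt.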