Transformational grammars

A shortened version from:
Anastasia Berdnikova
&
Denis Miretskiy




‘Colourless green ideas sleep furiously’.
Chomsky constructed finite formal machines
– ‘grammars’.
‘Does the language contain this sentence?’
(intractable) → ‘Can the grammar create this
sentence?’ (can be answered).
Transformational grammars (TGs) are sometimes called
generative grammars.







TG = ( {symbols}, {rewriting rules α → β, called productions} )
{symbols} = {nonterminals} ∪ {terminals}
α contains at least one nonterminal; β consists of terminals and/or
nonterminals.
S → aS, S → bS, S → ε (S → aS | bS | ε), where ε is the empty string.
Derivation: S => aS => abS => abbS => abb.
Parse tree: the root is the start nonterminal S, the leaves are the
terminal symbols in the sequence, and the internal nodes are
nonterminals.
The children of an internal node are the symbols on the right-hand
side of the production applied to it.
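The derivation shown for abb can be reproduced mechanically. The sketch below is only an illustration for the example grammar S → aS | bS plus the rule that rewrites S to the empty string; the function name `derive` is hypothetical and not part of the original slides.

```python
def derive(target):
    """Return the derivation steps of `target` in the example grammar
    S -> aS | bS | (empty string), or None if a symbol is not a or b."""
    if any(ch not in "ab" for ch in target):
        return None
    steps = ["S"]
    prefix = ""
    for ch in target:
        prefix += ch                  # apply S -> aS or S -> bS
        steps.append(prefix + "S")
    steps.append(prefix)              # apply S -> empty string
    return steps


print(" => ".join(derive("abb")))
```

Each step rewrites the single nonterminal S, so the sequence grows strictly from left to right.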





W – nonterminal, a – terminal, α and γ –
strings of nonterminals and/or terminals
including the null string, β – the same but not
including the null string.
regular grammars: W → aW or W → a
context-free grammars: W → β
context-sensitive grammars: α₁Wα₂ →
α₁βα₂. Reordering rules such as AB → BA are also allowed.
unrestricted (phrase structure) grammars:
α₁Wα₂ → γ
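As a rough illustration of the four classes, a single production can be sorted into the most restrictive class whose rule form it fits. This is a sketch under stated assumptions: upper-case letters are nonterminals and lower-case letters are terminals, the function name `classify_production` is hypothetical, and the context-sensitive test uses the noncontracting criterion (right side no shorter than the left side), which is an assumption chosen so that AB → BA qualifies.

```python
def classify_production(lhs, rhs):
    """Classify one production lhs -> rhs.
    Assumed convention: upper-case = nonterminal, lower-case = terminal."""
    single_nonterminal = len(lhs) == 1 and lhs.isupper()
    # Regular: W -> a  or  W -> aW
    if single_nonterminal and (
        (len(rhs) == 1 and rhs.islower())
        or (len(rhs) == 2 and rhs[0].islower() and rhs[1].isupper())
    ):
        return "regular"
    # Context-free: W -> beta, beta not empty
    if single_nonterminal and rhs:
        return "context-free"
    # Context-sensitive (assumed noncontracting test): the left side holds
    # a nonterminal and the right side is at least as long as the left.
    if rhs and len(rhs) >= len(lhs) and any(ch.isupper() for ch in lhs):
        return "context-sensitive"
    return "unrestricted"


for rule in [("W", "aW"), ("S", "aSa"), ("AB", "BA"), ("AB", "a")]:
    print(rule, classify_production(*rule))
```

A whole grammar belongs to the least restrictive class needed by any of its productions.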



Each grammar has a corresponding abstract
computational device: an automaton.
Grammars are generative models; automata are
parsers that accept or reject a given
sequence.
- automata are often easier to describe and
understand than their equivalent grammars.
- automata give a more concrete idea of how we
might recognise a sequence using a formal
grammar.
--------------------------------------------------
Grammar                       Parsing automaton
--------------------------------------------------
regular grammars              finite state automaton
context-free grammars         push-down automaton
context-sensitive grammars    linear bounded automaton
unrestricted grammars         Turing machine
--------------------------------------------------




W → aW or W → a
sometimes allowed: W → ε (the empty string)
RGs generate a sequence from left to right (or
right to left: W → Wa or W → a).
RGs cannot describe long-range correlations
between the terminal symbols (‘primary
sequence’).

An example of a regular grammar that
generates only strings of a's and b's that have
an odd number of a's:
start from S,
S → aT | bS,
T → aS | bT | ε.
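To see the grammar act as a generative model, one can sample strings from it by picking a production at random at each step. This is a minimal sketch: the uniform choice between productions and the names `GRAMMAR` and `generate` are assumptions made for illustration, and the empty right-hand side stands for the terminating rule of T.

```python
import random

# Right-hand sides per nonterminal; "" is the terminating (empty) production.
GRAMMAR = {
    "S": ["aT", "bS"],
    "T": ["aS", "bT", ""],
}


def generate(rng, start="S"):
    """Emit one string, left to right, by random choice of productions."""
    out = []
    state = start
    while state is not None:
        rhs = rng.choice(GRAMMAR[state])
        if rhs == "":
            state = None          # derivation ends
        else:
            out.append(rhs[0])    # emit the terminal
            state = rhs[1]        # continue from the new nonterminal
    return "".join(out)


rng = random.Random(0)
samples = [generate(rng) for _ in range(5)]
print(samples)
```

Every sampled string has an odd number of a's, because the derivation can only stop in T, and T is reached only after an odd number of a's.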





The automaton reads one symbol at a time from an input string.
The symbol may be accepted => the
automaton enters a new state.
The symbol may not be accepted => the
automaton halts and rejects the string.
If the automaton reaches a final ‘accepting’
state, the input string has been successfully
recognised and parsed by the automaton.
{states, state transitions of the FSA} ↔
{nonterminals, productions of the
corresponding grammar}
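The correspondence between states and nonterminals can be sketched for the odd-number-of-a's grammar (S → aT | bS, T → aS | bT, with T allowed to terminate). The transition table below is written by hand from those productions; the names `TRANSITIONS`, `ACCEPTING` and `accepts` are illustrative assumptions.

```python
# One state per nonterminal, one transition per production W -> aW'.
TRANSITIONS = {
    ("S", "a"): "T",
    ("S", "b"): "S",
    ("T", "a"): "S",
    ("T", "b"): "T",
}
ACCEPTING = {"T"}   # T has the terminating production


def accepts(seq, start="S"):
    """Read one symbol at a time; reject on a symbol with no transition."""
    state = start
    for symbol in seq:
        nxt = TRANSITIONS.get((state, symbol))
        if nxt is None:
            return False      # symbol not accepted: halt and reject
        state = nxt
    return state in ACCEPTING


print(accepts("abb"), accepts("abab"))
```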
RGs cannot describe a language L when:
L contains all the strings of the form aa, bb,
abba, baab, abaaba, etc. (a palindrome
language).
L contains all the strings of the form aa,
abab, aabaab, etc. (a copy language).

Regular language: abaaab
Palindrome language: aabbaa
Copy language: aabaab


Palindrome and copy languages have
correlations between distant positions.



The reason: RNA secondary structure is a
kind of palindrome language.
Context-free grammars (CFGs) permit
additional rules that allow the grammar to
create nested, long-distance pairwise
correlations between terminal symbols.
S → aSa | bSb | aa | bb
S => aSa => aaSaa => aabSbaa => aabaabaa
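The nested pairing is easy to see if the derivation is rebuilt from the outside in: each production fixes the first and last symbol at once. The sketch below does this for the palindrome grammar S → aSa | bSb | aa | bb; the function name `derive_palindrome` is hypothetical.

```python
def derive_palindrome(target):
    """Return the derivation steps of `target` in
    S -> aSa | bSb | aa | bb, or None if it is not in the language."""
    steps = ["S"]
    left, right, core = "", "", target
    while True:
        # The outermost pair must match and be a or b.
        if len(core) < 2 or core[0] != core[-1] or core[0] not in "ab":
            return None
        if len(core) == 2:
            steps.append(left + core + right)   # S -> aa or S -> bb
            return steps
        left += core[0]                         # S -> aSa or S -> bSb
        right = core[-1] + right
        core = core[1:-1]
        steps.append(left + "S" + right)


print(" => ".join(derive_palindrome("aabaabaa")))
```

Because the grammar has no single-symbol production, odd-length strings such as aba are rejected.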






The parsing automaton for CFGs is called a push-down automaton.
A limited number of symbols are kept in a push-down stack.
A push-down automaton parses a sequence from
left to right according to the following algorithm.
The stack is initialised by pushing the start
nonterminal onto it.
The steps are iterated until no input symbols
remain.
If the stack is empty at the end then the sequence
has been successfully parsed.



Pop a symbol off the stack.
If the popped symbol is a nonterminal:
- Peek ahead in the input from the current position
and choose a valid production for the nonterminal.
If there is no valid production, terminate and reject
the sequence.
- Push the right side of the chosen production rule
onto the stack, rightmost symbols first.
If the popped symbol is a terminal:
- Compare it to the current symbol of the input. If
it matches, move the automaton to the right on the
input (the input symbol is accepted). If it does not
match, terminate and reject the sequence.
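The steps above can be sketched in a few lines. To keep the "peek ahead and choose a production" step to a single-symbol lookup, the example assumes a hypothetical centre-marked palindrome grammar S → aSa | bSb | c, which is not from the slides; the names `TABLE` and `pda_accepts` are likewise illustrative.

```python
# Hypothetical grammar S -> aSa | bSb | c, indexed by the lookahead symbol.
TABLE = {"S": {"a": "aSa", "b": "bSb", "c": "c"}}


def pda_accepts(seq, table=TABLE, start="S"):
    """Parse `seq` left to right with an explicit push-down stack."""
    stack = [start]                    # initialise with the start nonterminal
    pos = 0
    while pos < len(seq):              # iterate until no input symbols remain
        if not stack:
            return False               # input left over, nothing expected
        symbol = stack.pop()
        if symbol in table:            # nonterminal: choose a production
            rhs = table[symbol].get(seq[pos])
            if rhs is None:
                return False           # no valid production
            stack.extend(reversed(rhs))  # rightmost symbol pushed first
        elif symbol == seq[pos]:       # terminal matches: accept the symbol
            pos += 1
        else:
            return False               # terminal mismatch
    return not stack                   # success only if the stack is empty


print(pda_accepts("abcba"), pda_accepts("abcb"))
```

A grammar such as S → aSa | bSb | aa | bb would need more lookahead or backtracking to choose a production, which is why the centre marker is assumed here.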


Copy language: cc, acca, agaccaga, etc.
initialisation:
S → CW
nonterminal generation:
W → AÂW | GĜW | C
nonterminal reordering:
ÂG → GÂ
ÂA → AÂ
ĜA → AĜ
ĜG → GĜ
terminal generation:
CA → aC
CG → gC
ÂC → Ca
ĜC → Cg
termination:
CC → cc
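The grammar can be tried out by treating it as a string rewriting system and enumerating every sentential form up to a length bound. This is a sketch, not an efficient parser: the ASCII letters X and Y stand in for Â and Ĝ (an assumption made to keep the code plain), and the names `RULES`, `successors` and `terminal_strings` are hypothetical.

```python
from collections import deque

# X stands for A-hat, Y for G-hat (assumed ASCII substitutes).
RULES = [
    ("S", "CW"),                                             # initialisation
    ("W", "AXW"), ("W", "GYW"), ("W", "C"),                  # generation
    ("XG", "GX"), ("XA", "AX"), ("YA", "AY"), ("YG", "GY"),  # reordering
    ("CA", "aC"), ("CG", "gC"), ("XC", "Ca"), ("YC", "Cg"),  # terminals
    ("CC", "cc"),                                            # termination
]


def successors(form):
    """All forms reachable from `form` by applying one rule once."""
    for lhs, rhs in RULES:
        pos = form.find(lhs)
        while pos != -1:
            yield form[:pos] + rhs + form[pos + len(lhs):]
            pos = form.find(lhs, pos + 1)


def terminal_strings(max_len):
    """Breadth-first search over forms no longer than max_len."""
    seen, queue, found = {"S"}, deque(["S"]), set()
    while queue:
        form = queue.popleft()
        if form.islower():             # only terminals left
            found.add(form)
            continue
        for nxt in successors(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


print(sorted(terminal_strings(4)))
```

The length bound is safe because no rule shortens the string, so a form longer than the bound can never lead back to a shorter terminal string.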

A mechanism for working backwards through
all possible derivations until
either the start nonterminal is reached or no valid
derivation is found.
Abstractly: a ‘tape’ of linear memory and a
read/write head.
There is a finite number of possible derivations to
examine, but that number is
exponentially large.
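Working backwards can be sketched as a search that applies productions in reverse, replacing a right-hand side by its left-hand side, until the start nonterminal appears or nothing new can be reached. The example reuses the copy-language grammar with the ASCII letters X and Y standing in for Â and Ĝ; those substitutes and the names `COPY_RULES` and `has_derivation` are assumptions, and the search is exhaustive rather than efficient.

```python
# Copy-language grammar; X stands for A-hat, Y for G-hat (assumed).
COPY_RULES = [
    ("S", "CW"),
    ("W", "AXW"), ("W", "GYW"), ("W", "C"),
    ("XG", "GX"), ("XA", "AX"), ("YA", "AY"), ("YG", "GY"),
    ("CA", "aC"), ("CG", "gC"), ("XC", "Ca"), ("YC", "Cg"),
    ("CC", "cc"),
]


def has_derivation(target, rules=COPY_RULES, start="S"):
    """Search backwards from `target` towards `start`.
    Assumes noncontracting rules, so reversed steps never lengthen
    the string and only finitely many forms can be visited."""
    seen = {target}
    todo = [target]
    while todo:
        form = todo.pop()
        if form == start:
            return True
        for lhs, rhs in rules:
            pos = form.find(rhs)
            while pos != -1:
                prev = form[:pos] + lhs + form[pos + len(rhs):]
                if prev not in seen:
                    seen.add(prev)
                    todo.append(prev)
                pos = form.find(rhs, pos + 1)
    return False                       # every candidate exhausted


print(has_derivation("acca"), has_derivation("acc"))
```

The set of visited forms is what grows exponentially with the length of the input in the worst case.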


Nondeterministic polynomial (NP) problems:
a proposed solution can be checked for
correctness in polynomial time, but for the
hardest of them no polynomial-time algorithm
for finding a solution is known.
[Context-sensitive grammar parsing is at least this hard.]
A subclass of NP problems: NP-complete
problems. A polynomial-time algorithm that
solves one NP-complete problem will solve
all of them. [Context-free grammar parsing is
not in this subclass: it can be done in
polynomial time. General context-sensitive
grammar parsing is PSPACE-complete.]



Left and right sides of the production rules
can be any combinations of symbols.
The parsing automaton is a Turing machine.
There is no general algorithm guaranteed to
determine in finite time whether a string has
a valid derivation.