Algebraic Representation of Syntagmatic Structures

V. S. YAKOVISHIN
v_yk@tut.by
Abstract. The proposed method of language description is based on the assumption of the algebraic essence of natural language sentences. Under this assumption, natural language sentences are considered as a surface embodiment of elements of a ring-like algebra. As a result, the traditional formal-grammar tools are supplemented with a set of special symbols denoting the algebraic operations on a set of sentences. In the generation process, the special symbols can be replaced by designations of semantic values that represent the functional meanings of the parts of sentences and serve as the features of the marked dependent members of the syntagmatic oppositions. The formal (semantic) language generated by such an augmented grammar is a set of syntagmatic structures presented in the form of mathematical expressions with explicit functional meanings of sentence parts.
Key words: Formal language, parts of speech, parts of the sentence, semantics, syntagmatic structure, syntax
Our experience hitherto justifies us in believing that nature is
the realization of the simplest conceivable mathematical ideas.
A. Einstein. On the Method of Theoretical Physics
1. Introduction
The experience of the use of classical formal grammars for natural language descriptions has
revealed their syntagmatic insufficiency: the formal grammars allow one to represent the
paradigmatic structure of the sentences (a system of grammatical categories), while the
syntagmatic structures (a system of functional notions) remain ulterior1. In the formal
description, the natural language sentence is usually expressed as a linear sequence where one
can only fix the contiguity of words and their belonging to corresponding grammatical
categories, namely, word classes known as parts of speech2.
The method proposed below is based on the assumption of the algebraic essence of natural language sentences. Under this assumption, the tools of traditional generative grammars are supplemented with a set of special symbols denoting algebraic operations on a
set of sentences. The initial symbols of the algebraic operations are used at the first (syntactic)
stage of generation; at the second (semantic) stage, they are replaced by designations for
semantic values that represent the grammatical meanings of sentence parts and serve as
features of the dependence in syntagmatic oppositions between marked dependent and
unmarked independent members.
Thus sentence syntagmatic structures can be expressed in the explicit form using standard
mathematical notation. The language generated by the formal grammar is a set of artificial
expressions, each of which represents an intrinsic cognizable essence of all its surface natural
(textual or verbal) manifestations.
1 It is known that the use of functional notions (like “subject”, “predicate”) in formal-grammar descriptions
is connected with confusion of categorical and functional notions “by assigning categorical status to both, and
thus fails to express the relational character of the functional notions” (Chomsky, 1965: 69).
2 Interestingly, such a language description (without direct indication of the functional values of words in the sentence) is in essence identical with the syntax of the Alexandrian grammarians' time (2nd century AD), when
concepts of sentence parts were still absent, and the sentence syntagmatic structure was described (in Apollonius
Dyscolus' works) only in paradigmatic terms as a compatibility of parts of speech.
2. Formal Language of Sentences
The necessary extension of the formal-grammar tools can be obtained on the basis of the
following assumption, the hypothesis of the algebraic essence of natural language sentences (Yakovishin, 1999: 77).
Ring-like algebra. Every natural language sentence is a surface manifestation of a
certain element of the ring-like dibasic algebra with a set of words (free monoid defined
over the alphabet) and a set of sentences on which one unary operation and a pair of
binary operations, coordination (“addition”) and determination (“multiplication”), are
defined. The operations satisfy the following conditions: each of them can be denoted by
different symbols that represent its semantic values; coordination is commutative and
associative; determination is non-commutative, non-associative, and one-sided
distributive over coordination.
All the properties of the binary operations manifest themselves in the presence or absence of
semantic identity in pairs of phrases such as:
books and journals ≡ journals and books (commutativity of coordination);
books and journals as well as newspapers ≡ books as well as journals and newspapers (associativity of coordination);
the author of the book ≢ the book of the author (non-commutativity of determination);
new books of the author ≢ books of the new author (non-associativity of determination);
new and old books ≡ new books and old books (one-sided distributivity of determination over coordination).
Note that the distributivity of determination becomes apparent in the case of coordinated dependent members. (In the formal representation, this one-sided property is expressed as left-hand distributivity.) As regards distributivity over coordinated independent (head) members (or right-hand distributivity), we suppose that it is a weak (“non-algebraic”) property. It is known that right distributivity cannot be fulfilled in the case of “plural predication”3. One can suppose that right distributivity is also not fulfilled in the case of the attribute and all other “non-distributive” parts of the sentence, e.g.:
new books and journals (new journals?) ≢ new books and new journals.
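These identity tests can be modeled mechanically. The following sketch (ours, not part of the original formalism) encodes phrases as nested tuples and normalizes them: coordination is flattened and sorted (associativity, commutativity), and determination is expanded over a coordinated dependent row (left distributivity); two phrases are then semantically identical exactly when their normal forms coincide. The tuple encoding and all names are illustrative assumptions.

```python
# Minimal sketch: phrases as nested tuples; ("det", head, dep) puts the head first,
# ("coord", m1, m2, ...) joins coordinated members. Hypothetical encoding.

def normalize(p):
    """Reduce a phrase tree to a normal form under the stated algebraic laws."""
    if isinstance(p, str):
        return p
    op, *args = p
    args = [normalize(a) for a in args]
    if op == "coord":
        flat = []                       # associativity: flatten nested coordinations
        for a in args:
            flat.extend(a[1:] if isinstance(a, tuple) and a[0] == "coord" else [a])
        return ("coord", *sorted(flat, key=repr))   # commutativity: member order is irrelevant
    head, dep = args
    if isinstance(dep, tuple) and dep[0] == "coord":
        # left distributivity: head (d1 + d2) == (head d1) + (head d2)
        return normalize(("coord", *[("det", head, d) for d in dep[1:]]))
    return ("det", head, dep)

# "new and old books" is identical with "new books and old books"
assert normalize(("det", "books", ("coord", "new", "old"))) == \
       normalize(("coord", ("det", "books", "new"), ("det", "books", "old")))
# "the author of the book" differs from "the book of the author" (non-commutativity)
assert normalize(("det", "author", "book")) != normalize(("det", "book", "author"))
```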
The mentioned operations make it possible to express all the grammatical (functional) meanings of words in the sentence. The unary operation is used to express the common (sentential) grammatical meanings such as modality, negation, question, exclamation, etc. The binary operations are used to express the meanings of coordinate connection (conjunctive, disjunctive, adversative, etc.) and the meanings of subordinate connection
(attributive, predicative, etc.). At the same time, the algebraic operations serve as markers to
separate adjacent words in the sentence. Thus the natural language can be considered as a
surface material embodiment of some formal language that represents a set of abstract
sentences closed under the given algebraic operations.
In order to obtain the necessary extension of the formal-grammar tools we need some set
of special symbols (we shall call it the auxiliary alphabet) used to represent the given algebraic operations. In the presence of the special symbols (that are not in the basic alphabet), the formal language is defined as a set of compound strings, sentences, derived over a set of words (vocabulary) by means of the algebraic operations.
3 E.g., the predicate are students is distributive over the coordination of subjects (A and B are students ≡ A is a student and B is a student), whereas the predicate are shipmates is non-distributive (McKay, 2006).
Let A and Ω be respectively the basic and auxiliary alphabets (A ∩ Ω = ∅); then the formal language L is in the general case a set L = L(A*, Ω) derived over the vocabulary A* with some collection of algebraic operations Ω. Since the whole vocabulary can consist of single-letter words (A* = A), and the collection of operations can be an empty set (Ω = ∅), various versions of formal languages are distinguished:
L = A*, L(A, Ω), L(A*, Ω);
one of them is the traditional language of words L(A) = A*, and the others are languages of sentences: L(A, Ω) is a language of syntactic structures, and L(A*, Ω) is a language of semantic structures.
Thus, the units of the formal language together form a three-level hierarchy, with
symbols, elements of the basic and auxiliary alphabets, i.e., letters (graphemes), phonemes;
words, elements of the set A*, i.e., strings over A, finite sequences formed from symbols by means of concatenation; and
sentences, elements of the set L(A*, Ω), i.e., finite sequences formed from words by means of the given algebraic operations.
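As a rough illustration (a sketch with our own names, not the paper's notation), finite slices of the three levels can be enumerated directly:

```python
from itertools import product

# Sketch of the three-level hierarchy (illustrative names and alphabets).
A = {"a", "b"}                          # basic alphabet: symbols

def words(alphabet, max_len):
    """A finite slice of A*: strings formed from symbols by concatenation."""
    out = [""]
    for n in range(1, max_len + 1):
        out += ["".join(t) for t in product(sorted(alphabet), repeat=n)]
    return out

OMEGA = {"+", "*"}                      # auxiliary alphabet: operation symbols

def sentences(vocab, ops):
    """A depth-1 slice of L(A*, Omega): words combined by binary operations."""
    return [f"({u}{op}{v})" for u in vocab for op in sorted(ops) for v in vocab]

V = [w for w in words(A, 1) if w]       # single-letter vocabulary: the A* = A case
print(sentences(V, OMEGA)[:4])          # ['(a*a)', '(a*b)', '(a+a)', '(a+b)']
```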
Interestingly enough, a formal language whose units together form a three-level hierarchy (all sentences consist of words, and words of symbols) is obviously a typical representation of various natural (“not made by hands”) languages4.
3. Syntax vs. Semantics
It seems that every real language can be characterized as some semantically rich algebra (semi-group, group, ring, etc.) loaded with sense elements representing the semantic force of
the given language. In the semantically rich algebra, each of the operations and each of the
operands can be denoted by sets of different symbols.
So we can assert that the grammar of any real language contains two parts, namely,
syntax and semantics: syntax is a set of rules that generate purely algebraic (extremely
abstract) structures, and semantics is a set of rules that realize the sense interpretation of the
obtained syntactic structures.
The purely abstract structures generated at the syntactic level are evaluated at the
semantic level following the replacement of variable operands by their values – numeric
values (in arithmetic expressions) or propositional constants “true” and “false” (in logical expressions). In the case of natural language, all the symbols of syntactic structures can be
“evaluated” as various semantic distinctions (values) known as lexical and grammatical
meanings. The distinctions between operands (terms) can be denoted as lexical meanings by
usual (natural) word stems, while the semantic distinctions between operations are expressed
as grammatical meanings by special (artificial) affixes of generated word forms.
We can suppose that syntactic structures are generated by rules which are identical (or
similar) to those used for description of mathematical (logical) formulas and known as
recursive definitions of well-formed formulas (wffs). It is obvious that recursive definitions like “if A and B are wffs, so are ¬A, ¬B, (A∧B), (A∨B)” are in essence the usual rewrite rules in which the left-hand part is a symbol (a designation of a syntactic category) and each of the right-hand parts is an atomic formula: A → ¬A, B → ¬B, A → (A∧B), B → (A∧B).
4 It looks as if the three-level hierarchy of language units is a characteristic feature of both usual natural (human) languages and some other natural semiotic systems. Such a three-level hierarchy can also be observed in molecular-biological encoding, likewise known as a cell language. The cell language also uses symbols (four different nucleotides), words (genes), and sentences (strings of genes). Thus, both human and cell languages can be represented by grammars with sets of rules governing the formation of sentences from words and the formation of words from letters (Ji, 1997).
The common property of the syntactic rules is their recursivity. The antecedent of every rule can also appear on its right-hand side as the rightmost symbol (right recursivity) or the leftmost symbol (left recursivity). The antecedent can also appear twice on the right-hand side
(bilateral recursivity). The recursion property of the rules allows syntactic structures to be
expanded.
Thus the grammar rules that generate syntactic structures may be generalized as
productions of the following forms:
SS (X◦S)…X,
X(X◦S)… ,
where S (“sentence”) is the initial symbol; X (“word”) is the start symbol for word derivation;
 is the symbol of any unary operation; ◦ is the symbol of any binary operation (S,XA;
,◦); here and in the following, the “” sign (with the mark of ellipsis) is used to denote
multiple rules for the same antecedent.
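A bounded enumerator for these generalized productions might look as follows; this is a minimal sketch in which the ASCII characters `*` and `o` stand in for the unary and binary operation symbols, and the depth bound is our addition to keep the output finite.

```python
# Bounded enumeration of the syntactic language generated by
#   S -> *S | (X o S) | X,   X -> (X o S)
# "*" and "o" are ASCII stand-ins for the unary and binary operation symbols.

def derive_S(depth):
    yield "X"                           # S -> X: terminate at the word symbol
    if depth == 0:
        return
    for s in derive_S(depth - 1):
        yield "*" + s                   # S -> *S
        for x in derive_X(depth - 1):
            yield f"({x} o {s})"        # S -> (X o S)

def derive_X(depth):
    yield "X"
    if depth > 0:
        for s in derive_S(depth - 1):
            yield f"(X o {s})"          # X -> (X o S)

# A few of the shortest structures derivable within depth 2.
print(sorted(set(derive_S(2)), key=len)[:6])
```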
The syntax of the formal language can be represented on the basis of the properties of the given algebraic operations. So, the commutative and associative coordination (⊕) is expressed in the syntactic rules as the atomic formula X⊕X, in which the same symbol is used for both members in parentheses-free notation. The coordination in the rule with parentheses S → (X⊕X) is necessary for expressing the left-distributive property of determination (⊙) over a coordinated row in strings like X ⊙(X⊕X). The non-commutative and non-associative determination is expressed through the obligatory use of different symbols and parentheses in the rules S → (X ⊙S), X → (X ⊙S). The parentheses must certainly be used to indicate the “evaluation” order of the non-associative operation. And so the atomic formula (X ⊙S) is represented twice: in the case S → (X ⊙S), the operation ⊙ is “evaluated” from right to left in (X ⊙(X ⊙S)), and in the case X → (X ⊙S), it is “evaluated” from left to right in ((X ⊙S) ⊙S).
In the following table, we can compare the syntax of the natural language algebra
(NL ring) with the syntax of other algebraic structures that have a pair of binary operations.
Algebraic structures with two binary operations and their syntax:

Ring: addition (+) and multiplication (×) are commutative and associative; multiplication is distributive over addition:
S → (X+X) | S+S | X,
X → X×X | S×S.

Boolean ring: union (∪) and intersection (∩) are commutative and associative; union is distributive over intersection, and intersection is distributive over union:
S → (X∪X) | S∪S | X,
X → X∩X | (S∩S).

NL ring: coordination (⊕) is commutative and associative; determination (⊙) is non-commutative, non-associative, and left-distributive over coordination:
S → (X⊕X) | (X ⊙S) | X,
X → X⊕X | (X ⊙S).
The semantic interpretation of the generated syntactic expressions can be realized by
usual grammar rules. Hence, the grammar generating sentence languages is considered as an
integration of well-known tools: the recursive definitions of wffs (they will be called the
syntactic rules) are integrated with the traditional productions (they will be called the
semantic rules). That is to say, one can propose to supplement the traditional formal-grammar
tools with a set of special operation symbols and a set of special rules that generate formulas,
i.e., the expanded formal grammar G is defined as an ordered six-tuple:
G AT, AN, T, N, S, P,
where AT and AN are subsets of terminal and non-terminal symbols of the basic alphabet; T,
N are subsets of terminal and non-terminal symbols of the auxiliary alphabet; SN is a start
symbol; P is a finite set of productions (syntactic and semantic rules).
A formal generative grammar possessing non-empty sets of both syntactic and semantic rules (the case Ω ≠ ∅) will be called a grammar of sentences, in contrast to the classic grammar of words, in which there are no syntactic rules (the case Ω = ∅).
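The six-tuple transcribes directly into a data structure; in the following sketch the field names transliterate AT, AN, ΩT, ΩN, the sample values are illustrative only, and the classification reproduces the words/sentences distinction just drawn.

```python
from dataclasses import dataclass

@dataclass
class Grammar:
    """Expanded grammar G = <AT, AN, OmegaT, OmegaN, S, P> (transliterated names)."""
    AT: frozenset       # terminal symbols of the basic alphabet
    AN: frozenset       # non-terminal symbols of the basic alphabet
    OmegaT: frozenset   # terminal symbols of the auxiliary alphabet (semantic values)
    OmegaN: frozenset   # non-terminal symbols of the auxiliary alphabet
    S: str              # start symbol
    P: tuple            # productions: syntactic and semantic rules

    def kind(self):
        # Empty auxiliary alphabet: classic grammar of words; otherwise sentences.
        return "grammar of words" if not (self.OmegaT | self.OmegaN) else "grammar of sentences"

G = Grammar(frozenset({"book", "read"}), frozenset({"S", "X"}),
            frozenset({"a", "o", "p"}), frozenset({"A", "O", "P"}), "S", ())
print(G.kind())   # -> grammar of sentences
```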
We certainly get several versions of generated languages L(G) representing various kinds of derivable terminal strings, namely, words, abstract formulas, and algebraic expressions:
L(G) = {ω | ω ∈ AT* ∪ L(A)T ∪ L(A*)T, S ⇒* ω}.
The presence of the three versions (using all possible grammar means) allows us to obtain adequate descriptions of various semiotic objects. So the language of words L(G) = AT* is a set of simple strings, such as numbers in a scale of notation; the syntactic language L(G) = L(A)T is a set of purely symbolic formulas (that do not contain words); and the semantic language L(G) = L(A*)T is a set of algebraic expressions, i.e., a set of strings that include terms (variables or constants), each of which can be a word (separated by the operation symbols).
4. Abstract Syntagmatic Structures
The syntactic rules allow one to represent the well-known concepts of abstract syntagmatic
structures such as syntagmatic markedness (the “agreement” of dependent members), the
kinds of connections between words, the absolutely independent member (so-called
“grammatical subject”), the main attribute of the sentence (known, in particular, as “topic” or
“thema”), and homogeneous parts of the sentence.
Syntagmatic markedness. Every dependent member of the syntagmatic structure is
marked in the opposition to a corresponding head member.
The atomic formula (XX) may be considered as an elementary syntagmatic structure
(minimal phrase) well known as the syntagme, namely, a two-word string in which X (the
first word) is the independent (head, governing) member of the syntagme, and X (the second
word) is the dependent (non-head) member. In syntagmatic notation, the words can be, for
clearness, separated by a blank character: (X X).
Indeed, the dependent member X contains a sign of determination used as a start for the
derivation of the meaning of part of the sentence. Thus the determination serves in the role of
formal feature of dependence and is expressed in the generated structures as sign in
syntagmatic oppositions between marked dependent and unmarked head members. As it is
evident, the dependent word-form is (as a marked element) grammatically more informative
than its head member is: it must express not just its "own" grammatical information, but also
must possess data about the grammatical properties of its governing word5.
Stepwise and collateral subordination. In syntagmatic structures, two types of syntactic
connections known as stepwise and collateral subordinations are defined.
5 The grammatical informativeness of a dependent member is clearly apparent in inflectional-type languages, where syntagmatic markedness is manifested as agreement features: the dependent (marked) word-form is defined by the presence of the received agreement features (see Blevins, 2000).
In the stepwise (consecutive) subordination, the word of the dependent member ⊙X of the preceding syntagme serves as the independent member X of the succeeding syntagme (as in the book of the new author). The structure with stepwise subordination is expressed by using the right-recursive rule S → (X ⊙S) as a right syntagmatic expansion:
S ⇒ (X ⊙(X ⊙(X ⊙…))).
In the collateral (parallel) subordination, several dependent members ⊙X are subordinated to a single independent member X (as in the new book of the author). The structure with collateral subordination is expressed by using the left-recursive rule X → (X ⊙S) as a left syntagmatic expansion:
S ⇒ ((…X ⊙X) ⊙X).
The two kinds of syntagmatic structures, with left and right expansions, are usually expressed by using both the right- and the left-recursive rules, i.e., S → (X ⊙S) and X → (X ⊙S):
(X ⊙S) ⇒ ((…X ⊙X) ⊙(X ⊙(X ⊙X…))),
where one part of the structure is joined in parallel, while the other part is consecutive (simultaneously collateral and stepwise subordination).
It is necessary to pay special attention to the quantitative distinction between the left- and the right-expansion structures. The interesting thing is that the distance from the extreme dependent member (the terminal node of a tree structure) to its governing word (the root node) increases in the case of left expansion (((…X ⊙X) ⊙X) ⊙X), while the dependent member always has an immediate connection with its governing word in the case of right expansion (X ⊙(X ⊙(X ⊙X…))). So, the syntactic structures with collateral and stepwise subordination are potentially asymmetric in the sense of different real capabilities of using the left and the right syntagmatic expansions. The left expansion is limited by the structure depth6, while the right expansion admits endless expressions (as in the well-known sentence with attributive clauses This is the dog, that worried the cat, that killed the rat).
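The asymmetry can be made concrete by measuring how many brackets separate the outermost dependent member from its governing word; the sketch below reuses the tuple encoding from the earlier example, and the measure itself is just one straightforward reading of the text.

```python
# ("det", head, dep) tuples as before; a rough sketch of the depth asymmetry.

def head_word_distance(tree):
    """Steps from a det node down to the lexical word that ultimately governs it."""
    if isinstance(tree, str):
        return 0
    _, head, _dep = tree
    return 1 + head_word_distance(head)

right = ("det", "X", ("det", "X", ("det", "X", "X")))   # (X (X (X X))): right expansion
left = ("det", ("det", ("det", "X", "X"), "X"), "X")    # (((X X) X) X): left expansion
print(head_word_distance(right))   # 1: the dependent sits next to its governing word
print(head_word_distance(left))    # 3: the governing word is three brackets away
```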
Absolutely independent part. For any syntagmatic structure, there exists a unique head
word, called the absolutely independent part, which does not contain the meaning of the
part of the sentence.
Indeed, a single syntagmatically unmarked part (the so-called “grammatical subject”), placed in the first position of generated structures, never has a governing member. Hence, it does not contain the determination sign denoting the meaning of a part of the sentence.
The presence of a single member that does not contain any meaning of the part of the
sentence (uniqueness property) is confirmed by observed phenomena. So in many languages,
there are special syntactically neutral word forms to express such a member, namely, the noun
form of the nominative case (in languages with a nominative structure) and the infinitive verb
form. It is typical that the nominative case form is usually expressed by the pure stem of the word (“casus indefinitus”) or uses affixes that have lost their meanings. Among such desemanticized affixes is, in particular, the ending of the Indo-European nominative case in -s, which, as it is believed, derives from the ergative case marker denoting a previously agentive meaning of a dependent member7. The main head of a sentence can also be expressed by the infinitive verb form8 as well as by the finite verb form in which the grammatical person (the subject) is reflected by the inflection of the verb (in so-called “null subject languages”).
6 According to Yngve's depth hypothesis (Yngve, 1960; see also Sampson, 1997), the left expansion (left-branching structure) demands increasing memory resources as the sentence lengthens; since human operative memory is limited (to the structure depth 7±2), the left expansion is also limited.
7 The ergative construction of a sentence is now interpreted as a syntagmatic structure with the absence of a nounal (non-verbal) grammatical subject as the absolutely independent member. All case forms in such a syntagmatic structure are determined by the verb; and so it can be asserted that Tesnière's verb-centred structure, with a verbal absolutely independent member, is best suited to the description of the ergative construction (Dressler, 1971).
General dependent part. There exists a possibility to express a main attribute of the
sentence, namely, the general dependent part that relates to the whole sentence.
The general dependent part can be represented in the formal syntagmatic structure as the
distance-marked element. It can be supposed that the degree of grammatical informativeness
(“markedness degree”) of a dependent member increases in proportion to the distance from its
governing element: the more distant a dependent member is from its governing member, the more informative it is. So, the known informational weight of the predicate in a sentence derives from the fact that this marked member follows the entire subject group and occupies the most distant position relative to the absolutely independent (governing) member with its adjuncts. Thus, the general dependent part may be defined as the most distant dependent member. The syntagmatic structure with the general dependent part can be considered as a hypersyntagme: the general dependent part is the dependent member of the structure, and the rest of the sentence is its independent member.
Note that this syntagmatic structure is often considered on the basis of semantic or morphological distinctions and is known in linguistics through the dichotomies topic/comment, theme/rheme, known/unknown, etc. Of course, syntactically, the topic (theme) is the marked dependent member9, and the comment (rheme) is the unmarked independent member. Topic structures can be represented as bracketed expressions: cf. the topic structure ((X ⊙X) ⊙(X ⊙X)) as in This book, he reads and the topicless structure (X ⊙(X ⊙(X ⊙X))) as in He reads this book. The same method is suited for the description of sentences with so-called locative inversion. So, the locative-inversion examples (see Bresnan, 1994) may also be represented as “topic” structures: ((X ⊙X) ⊙X) as in In the corner was a lamp; cf. the structure of the canonical-order sentence (X ⊙(X ⊙X)) as in A lamp was in the corner.
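Under one possible reading of the distance-markedness idea (the paper fixes no explicit metric, so the measure below is a hypothetical one), the general dependent part can be located mechanically: rank every dependent member by the size of the head part it is opposed to and take the maximum.

```python
# Sketch: rank dependents by the number of words in the head part they oppose;
# the most distant one is taken as the general dependent part (hypersyntagme split).

def words_in(tree):
    return 1 if isinstance(tree, str) else words_in(tree[1]) + words_in(tree[2])

def general_dependent(tree, best=None):
    """Return (distance, dependent) for the most distant dependent member."""
    if isinstance(tree, str):
        return best
    _, head, dep = tree
    cand = (words_in(head), dep)
    if best is None or cand[0] > best[0]:
        best = cand
    best = general_dependent(head, best)
    return general_dependent(dep, best)

# One possible encoding of the topic structure ((X (X)) (X (X))): "This book, he reads".
t = ("det", ("det", "he", "reads"), ("det", "book", "this"))
print(general_dependent(t))   # (2, ('det', 'book', 'this')): the topic group
```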
Homofunctional parts. There exists a possibility to express the syntagmatic structure
with homofunctional parts of the sentence.
The syntagmatic structure with homofunctional parts is derived from the initial syntagme by using the rules S → (X⊕X), X → X⊕X:
(X ⊙S) ⇒* (X ⊙(X⊕X⊕X)).
The determination sign in this syntagmatic structure, called a multisyntagme, represents a common functional meaning of the parts of the sentence, whereas the coordination sign serves as a means of joining the identical parts together by extracting their common functional meaning outside the brackets.
Note that the operation of determination is only left-distributive, not right-distributive, and so only the homogeneity of dependent members can be expressed by syntactic means. Thus there are two kinds of homogeneous parts, which are expressed in language by different grammatical means: the homogeneous dependent members are expressed by universal syntactic means as a coordination of identical parts of the sentence (functional homogeneity),
while the homogeneous independent members can be expressed only by individual semantic means as a coordination of identical parts of speech (categorical homogeneity).
8 The infinitive verb form is also a syntagmatically unmarked member (a “null form”) expressing only a lexical meaning (the meaning of a process) outside the syntagmatic relations (Karcevski, 1927: 18).
9 As is well known, the topic (“the thing being talked about”) can be marked in various languages by special (topic-indicating) grammatical means (see Li and Thompson, 1976). For example, Japanese marks it with the special particle wa: the part of the sentence preceding wa is the topic, and the part following wa is a “comment” about the topic. English employs word order (the topic is placed at the beginning of the sentence), lexical means (As for, Regarding), and prosodic marks.
In natural languages, the difference in these two kinds of homogeneity can become
apparent as a difference in the means of expression used: the functional homogeneity is
expressed by a usual (unmarked) connection, while the categorical homogeneity needs some
special (marked) means. So, in the grammar of Chinese, the categorical homogeneity is
expressed by means of special generalizing words like dōu ‘all’, quán ‘whole’, e.g.:
Wŏ mǎile shū, bĕnzi, bĭ ‘I have bought books, writing-books, pencils’;
Shū, bĕnzi, bĭ dōu shì wŏde ‘Books, writing-books, pencils are mine’.
In the first example, the functional homogeneity (coordinated objects) is expressed by the usual means, as the immediate connection of the direct objects to the transitive verb mǎi ‘to buy’; in the second example, the connection of the categorical homogeneity (coordinated nouns) is realized by the special distributivity marker dōu.
5. Semantic Interpretation
The real content of generated sentences is derived from syntactic structures by the rules of
semantic interpretation. These rules allow one to replace the word symbol X and the operation
symbols by designations of lexical and grammatical meanings.
The word symbol X is at first replaced by designations of word categories. We suppose
that these categories (traditionally called the syntactic parts of speech) are sets of words
possessing a certain common context-governing potential, namely, a possibility to attach their
“own” determinant. That is, we say that Xi represents a syntactic part of speech if the appropriate part-of-sentence meaning ⊙i exists. Obviously, at most three syntactic parts of speech (existing in English and other languages) may be distinguished: the Noun (Xn), the Verb (Xv), and the category of Quality (Xq)10, i.e.,
X → Xn | Xv | Xq.
The necessary indication of the given syntactic parts of speech is the possibility of attaching their individual determinants (part-of-sentence meanings): the adjectival attribute, the adverbial attribute, and the quality attribute.
The symbol of the unary operation (used in S → ∗S) is replaced by designations denoting the syntactically independent meanings such as interrogation, exclamation, the general-negation meaning, and all the general-sentence modality expressed in natural language by parenthetical words and interjections. These grammatical meanings can also be expressed by the well-known signs, e.g.:
∗ → ¬ | ? | ! | …
The replacement of coordination and determination symbols allows us to represent all the
conjunction meanings and the traditional part-of-sentence meanings (known also as “semantic
cases”, “semantic roles”), i.e., the meanings of predicate and secondary notional parts
(attribute, adverbial modifier, object). All of the grammatical meanings can be represented by
special designations (grammatical codes) that can consist of several components: an initial
letter, which denotes the common (semantically neutral) meaning, and several abbreviations, which denote more specific meanings (each component is separated by a dot), i.e.,
⊕ → c | c.dj | c.adv | c.cns | …,
⊙ → A | O | P,
A → a | a.prs | a.abs | a.tm | a.pl | a.cs | …,
O → o | o.cmt | o.dst | o.mdt | …,
P → p | p.pt | p.ft | p.ct | p.pt.ct | p.pf | ….
10 The word classes such as the Adjective and the Adverb, which are usually distinguished in known syntax systems (e.g., in Fries's four-member system), are not distinguished as syntactic parts of speech because these classes possess one common attachable attribute (e.g., the attribute very in very quick, very quickly). It is clear that these word classes differ only at the surface level as morphologized parts of the sentence, i.e., as morphological (but not syntactic) word categories.
Here symbols A,O,PN denote the categories of attributive, objective, and predicative
meanings; the abbreviations can be considered as terminal symbols (elements of T) that
denote various grammatical meanings, namely:
meanings of coordination: conjunctive (c), disjunctive (c.dj), adversative (c.adv), consecutive (c.cns), etc.;
attributive meanings: neutral adjectival or adverbial (a), meaning of presence (a.prs), of absence (a.abs), of time (a.tm), of place (a.pl), of cause (a.cs), etc.;
objective meanings: neutral (direct) objective (o), comitative (o.cmt), destinative (o.dst), mediative (o.mdt), etc.;
predicative meanings: the predicate in the present tense (p), past tense (p.pt), future tense (p.ft), present continuous tense (p.ct), past continuous tense (p.pt.ct), present perfect tense (p.pf), etc.
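Since the codes are dotted strings, their structure is trivial to model; the inventory below is abridged and the function name is ours.

```python
# Grammatical codes as dotted strings: an initial letter for the neutral meaning
# plus optional refining components (abridged, illustrative inventory).
CODES = {
    "c": "coordination, conjunctive", "c.dj": "disjunctive", "c.adv": "adversative",
    "a": "attributive, neutral", "a.prs": "attributive, presence", "a.pl": "attributive, place",
    "o": "objective, direct", "o.cmt": "objective, comitative",
    "p": "predicative, present", "p.pt": "predicative, past", "p.pf": "predicative, perfect",
}

def components(code):
    """Split a code into its neutral head and refining components."""
    head, *rest = code.split(".")
    return head, rest

print(components("p.pt.ct"))   # ('p', ['pt', 'ct']): past continuous predicate
```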
Of course, the said grammatical meanings are expressed by diverse context-sensitive
variants at the level of a surface (morphological) interpretation. So, the neutral attributive is
expressed by “agreed” word forms such as the adjective (a big house), the adverb (to run
quickly), as well as by “non-agreed” word forms (the leg of the table); similarly, the neutral
objective meaning is expressed by various “agreed” (governed) word forms (to read the book,
to ask for a book, to knock at the door, to consist of parts, etc.); the same predicative meaning
is expressed both by affixes and syntactic words (he writes, he is a teacher, he is red). And
vice versa, diverse meanings can be expressed by the same word form: cf. a book with pictures (attributive meaning of presence), to live with parents (objective comitative meaning), to write with a pen (objective mediative meaning), and so on.
The use of the syntactic parts of speech and designations of various grammatical
meanings allows us to represent all possible sentence patterns. So it is possible to represent
the simple unexpanded and expanded sentences, the compound sentences, the sentences with
homogeneous parts, the complex sentences with all types of subordinate clauses:
((XX)(XX))*((XnaXq)p(XvoXn)),
(XX)*(Xv pXq),
(X (X(XX))) *(Xn p(Xv o(Xn p.pfXv))),
(X(X(XX))) *(Xn p(Xv a.pl(Xn pXv))).
Here the derived syntagmatic structures represent the well-known sentence patterns: a simple
expanded sentence with a verbal predicate, adjectival attribute, and neutral (direct) object; an
infinitive sentence; a complex sentence with an object clause; a complex sentence with an
adverbial clause of place.
The replacement of the syntactic parts of speech by concrete lexical meanings presupposes the use of semantic parts of speech, i.e., categories of word root meanings. The belonging of words to a certain semantic part of speech is manifested as their ability to attach common lexical modifiers, i.e., derivational meanings. Here it is noteworthy that there is a formal
distinction between derivational (“word-formative”) and relational (“inflectional”) meanings,
i.e., there is a rigorous criterion for the delimitation of the two different types of grammatical
meanings: a derivational meaning can appear in any syntactic position of the sentence, while a relational meaning (contained only in inflected word forms) does not appear in the position of the syntactically neutral (absolutely independent) part.
So Xn, Xv, Xq can be replaced by categories of root and derivational meanings:
Xn → Rn | R+Dn | Pn | Rc | Pc,
Xv → Rv | R+Dv,
Xq → Rq | Rq+Dq | Pq,
R → Rn | Rv | Rq | Rc.
Here the derived symbols denote: the general category of word root meanings (R), the categories of root meanings of nouns (Rn, Pn), of verbs (Rv), of quality (Rq, Pq), of numerals (Rc, Pc), and the categories of derivational meanings (Dn, Dv, Dq).
The categories of root meanings can be substituted by usual morphemes (from AT*):
Rn → book | man | …,
Pn → he | we | …,
Rv → know | read | …,
Rq → little | red | …,
Pq → so | such | …,
Rc → one | two | …,
Pc → few | many | ….
The categories of derivational meanings are substituted by special designations (from AN):
Dn → Ag | Pl | …,
Dv → Exc | Rv | …,
Dq → Cp | Sp | …,
where the designations Ag (agent), Pl (plural), Exc (excessive), Rv (revertive), Cp (comparative), Sp (superlative), etc., indicate the derivational meanings, e.g.: read+Ag → reader, man+Pl → men, load+Exc → overload, turn+Rv → return, little+Cp → lesser, little+Sp → least.
The final result of the generative process can be shown by the following examples:
((Xn a.Xq) p(Xv o.Xn)) ⇒* ((boy a.little) p(read o.book))
‘The little boy reads a book’;
(Xv p.Xq) ⇒* (read p.pleasant)
‘To read is pleasant’;
(Xn p(Xv o(Xn p.pf.Xv))) ⇒* (We p(know o(he p.pf.return)))
‘We know he has returned’;
(Xn p(Xv a.pl(Xn p.Xv))) ⇒* (house p(stand a.pl(road p.turn)))
‘The house stands where the road turns’.
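The printed notation of these results can be reproduced from the tuple encoding used in the earlier sketches, with the operation slot now holding a grammatical code; the rendering choices (a dot before a bare dependent word, none before a bracketed group) are ours, chosen to match the examples above.

```python
# End-to-end sketch: a structure whose operation slots hold grammatical codes
# and whose word slots hold stems, printed in the bracketed notation above.

def render(tree):
    if isinstance(tree, str):
        return tree
    code, head, dep = tree
    sep = "" if isinstance(dep, tuple) else "."   # dot only before a bare word
    return f"({render(head)} {code}{sep}{render(dep)})"

little_boy_reads = ("p",
                    ("a", "boy", "little"),
                    ("o", "read", "book"))
print(render(little_boy_reads))   # ((boy a.little) p(read o.book))
```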
One can assert that the generated formal syntagmatic structures represent the whole
semantic content of sentences, i.e., they are sufficient for accurate transition to adequate
natural manifestations. Certainly, the transition from the generated artificial (semantic-syntactic) sentences to their natural (textual or verbal) manifestations requires some special
algorithmic techniques that do not belong to the grammatical tools11.
11 The experience in the development of transformational-generative grammars showed that the transition from the generated expressions (“deep structures”) to their natural manifestations (“surface structures”) requires special (“non-grammatical”) actions, such as the simultaneous replacement of more than one symbol, the transposition of symbols, etc. (see Shaumyan, 1962: 399).
6. Conclusion
It is shown that the formal language generated by the proposed grammar tools is a set of
formal syntagmatic structures expressed in the explicit form using standard mathematical
notation.
In the formal description outlined above, we attempted to unite the tools of formal
grammars with the known notions of traditional linguistics. The proposed formal-grammar
tools make it possible to represent the sentence parts, the parts of speech, and the other
existing grammatical categories, i.e., that which can be called the grammatical system of a
language.
The produced grammatical system may be considered as a certain result of authentically scientific (experimental) cognition of linguistic reality. It must be further improved, investigated, and specified in each particular language description. The grammatical system is also to become an object of comparative reconstruction and typological verification.
The formal language proposed above can be used as a linguistic notation (grammatical
interlingua) and a semantic record for knowledge representation. The transition from input
text messages to the internal representation of knowledge may be implemented by means of
the formal language serving as an intermediate link between the text and the growing
conceptual structure (Yakovishin, 1999). The possibility of automatic transition from input messages to the internal knowledge representation allows one to accumulate knowledge extracted from large volumes of electronic documentation.
References
Blevins, J.P., 2000, Markedness and agreement, Transactions of the Philological Society 98 (2), 233-262.
Bresnan, J., 1994, Locative inversion and the architecture of universal grammar, Language 70 (1), 72-131.
Chomsky, N., 1965, Aspects of the Theory of Syntax, Cambridge, MA: The MIT Press.
Dressler, W., 1971, Über die Rekonstruktion der indogermanischen Syntax, Zeitschrift für Vergleichende Sprachforschung, 85 (1), 5-22.
Ji, S., 1997, Isomorphism between cell and human language: molecular biological, bioinformatic and linguistic
implications, BioSystems, 44 (1), 17-39.
Karcevski, S., 1927, Système du Verbe Russe: Essai de Linguistique Synchronique, Prague: Plamja.
Li, Ch. N. and Thompson, S. A., 1976, Subject and topic, pp. 457-489 in Syntax and Semantics: Subject and Topic, Charles Li (ed.), New York: Academic Press. (Translation in Новое в зарубежной лингвистике, XI, Moscow: Прогресс, 1982.)
McKay, Th., 2006, Plural Predication, Oxford: Clarendon Press.
Sampson, G. R., 1997, Depth in English grammar, Journal of Linguistics, 33, 131-151.
Shaumyan, S. K., 1962, Theoretical foundations of transformational grammars [in Russian], pp. 391-411 in Новое в лингвистике, II, Moscow: ИЛ.
Yakovishin, V. S., 1999, Transformation of syntagmatic structures into a form of knowledge representation [in Russian], Автоматика и вычислительная техника, 1, 76-83. (Translation in Automatic Control and Computer Sciences, New York: Allerton Press, 33 (1), 64-69.)
Yngve, V. N., 1960, A model and hypothesis for language structure, Proceedings of the American Philosophical
Society, 104, 444-466.