Analysis of Croatian corpus of child language (age 7 to 12) and its usage in teaching
Katarina Aladrović Slovaček, M.Sc., research assistant
Melita Ivanković, teacher
Faculty of Teacher Education, University of Zagreb
Savska cesta 77, Zagreb
kaladrovic@gmail.com
melitaivankovic@yahoo.com
Analysis of Croatian corpus of child language (age 7 to 12) and its usage in
teaching
Language as an abstract system of signs has two realizations: oral and written. A child
first utters syllables, then words, then phrases and finally sentences. The development of
sound articulation, which marks the end of one phase of language learning, is complete by the
age of six. At this stage a child with typical language development is capable of composing a
sentence of five, six or seven words (Pavličević-Franić, 2005). The vocabulary of a
seven-year-old contains around 10,000 words (Pavličević-Franić, Gazdić-Alerić, 2010).
Institutional learning of Croatian as a mother tongue starts in kindergarten and takes place
through various communicative language games. However, on starting school (at the age of 6 or
7), a child begins to learn the language systematically. Since Croatian, as a Slavic language,
is morphologically rich, language learning often presents a real problem to children. They are
afraid of learning the language and they do not like it (Pavličević-Franić, Aladrović, 2009).
This shows the need to change language teaching, especially in primary school, which in the
Croatian educational system lasts eight years. Those changes require the use of a
communicative-functional approach in language teaching (Miljević-Riđički et al., 2004).
The Croatian National Corpus contains over a hundred million tokens (www.hnk.ffzg.hr, June
2010). A database of Croatian child language covering the period up to the age of six was
created within the CHILDES database. However, the language of Croatian primary school children
is not recorded in either of these resources, so the purpose of this research is to analyze a
corpus of child language from the age of seven to twelve, the period when children are in the
concrete operational stage (Piaget, 1969). The corpus consists of around 1,500 written works
collected during research in 30 primary schools in all regions of Croatia, and it contains
around 30,000 words. The corpus will be coded and then analyzed on the morphological,
syntactic and lexical levels. The research will also try to answer the question of how to use
the corpus in mother-tongue teaching in primary school and so help children to start loving
their language and to learn it happily.
I. Language acquisition and learning
Language distinguishes humans from all other creatures on Earth, and language acquisition is
therefore, on the one hand, a completely ordinary occurrence and, on the other hand, a very
special and fascinating one. Language acquisition shows general features common to all
children in the world: all of them manage to acquire language successfully, regardless of the
particular language to which they are exposed in natural situations and regardless of the
teaching method, and they master very different stages of language in a very short period. The
child's language development is connected to its physical, cognitive, emotional, social and
communicative development (Owens, 1984). In the first several years, children with typical
language development gain full control of their language. By the age of five, the children's
vocabulary comprises 1,000 words, and they have acquired most of the phonological and
grammatical system of their language as well as the basics of word meaning and use and of
the way language is used in particular situations (McGregor, 2009: 203). Language acquisition
also depends on the habits of spoken language in the child's surroundings: the speech of the
parents and of other members of the community, including other children who interact with the
child. Though all children are able to learn the language to which they are exposed in early
childhood, there are nevertheless individual differences among them. These stem, for example,
from features of the mother tongue and from the different circumstances in which the language
is learnt, since both influence the speed of learning.
To explain language acquisition, researchers have developed several theories, which can be
divided into three main groups: behaviourist, generative and cognitive. Behaviourists (the
most significant of them being B. F. Skinner) consider language to be learnt behaviour,
conditioned by the creation of associative links between stimulus and response. They believe
that language and speech are learnt by imitating the speech of adults, which could be called
learning according to the model: auditory/visual stimulus - response to stimulus -
reinforcement. The child listens to the model and imitates what they have perceived by ear.
By imitating adult speakers, through trial and error, stimulation and repetition, the child
acquires language structures, which advances their language development (Pavličević-Franić,
2005).
Generative theory emerged in the 1950s. In linguistics, this period was marked by the
development of generative grammar, which views language as the knowledge possessed by people
for whom that language is a mother tongue, that is, by its native speakers. The aim of this
linguistic theory is to reach the grammar in the mind of the speaker (Palmović, 2005). N.
Chomsky, the creator of generative grammar, distinguishes the competence of the native
speaker, that is, unconscious innate language knowledge, from performance, the actual use of
language in an actual situation. He believes that language acquisition is in fact grammar
acquisition, because children are born with language abilities and general knowledge about the
form of human language (Vilke, 1991). For this knowledge to develop, children need to be
exposed to the language of their environment. In this way, the children's innate grammar is
stimulated and appropriately reinforced. This is how generativists explain the fact that
almost all children manage to acquire their mother tongue regardless of their other
differences and of the differences among the languages to which they are exposed (Jelaska,
2007). Chomsky is the main representative of the nativist theory, which explains the ease and
speed with which children acquire language by the fact that a large part of their language
knowledge is innate (Palmović, 2005).
The innateness of the language model, according to Chomsky (1965), explains the similarity
of the process of language acquisition across different languages and cultures. Chomsky calls
this language model the LAD, the language acquisition device. According to this model of
language acquisition, the child is exposed to language data from which they discover the
parameters specific to the particular language (Kuvač and Palmović, 2007: 52). This means
that all children go through the same stages of language acquisition, use similar structures
and make similar deviations from the language to which they are exposed, regardless of which
language they are acquiring. They only have to be exposed to a human language, and their
innate grammar will be stimulated and reinforced in a certain way. From this one can conclude
that speakers have acquired productive rules applicable to new linguistic material, and that
language acquisition is therefore really the acquisition of grammar, of the cognitive system
which enables people to understand and use language. Grammar is not learnt; it is acquired,
adopted, spoken (Jelaska, 2007: 68). Though the grammars of natural languages differ, they
also have many similarities, which are called universals. These universals are considered
important evidence of innateness because they could not have appeared by accident. The theory
of innate ability is also supported by the fact that children master language much better
than could be expected on the basis of the language data to which they have been exposed.
The representative of cognitive theories, J. Piaget (1967),
believes that cognitive abilities enable learning in general, including the learning of
language, which means that developed cognitive abilities are a necessary precondition for
successful language development (according to: Pavličević-Franić, 2005). Piaget considered
language to be a means of thinking, or of thinking about reality, so that the appearance of
language depends on the structure of reality itself. He therefore believed that the
appearance of language is conditioned by the level of sensorimotor intelligence during the
first eighteen months of the child's life. According to Piaget (1947), language acquisition
and learning take place in four stages: sensorimotor (from birth to the age of two),
preoperational (from the age of two to the age of seven), concrete operations (from the age
of seven to the age of eleven or twelve) and formal operations. In his discussion of Piaget's
theory, L. Vygotsky (1962) concluded that the child becomes a sensible being at the moment
speech appears, and that the development of cognitive abilities and the child's development
in general depend on and are conditioned by language (according to Kovačević, 1996).
Language acquisition does not end when the child enters school (in the Croatian educational
system, at about the age of seven) but continues until the age of twelve, when language
becomes automatized, meaning that children use morphology and syntax automatically. This
period is called the early language learning period, and in the Croatian educational system
it lasts from the age of seven to the age of twelve. It is the period in which language
should be learnt by developing and stimulating communicative competence.
Since language learning is very often accompanied by a negative attitude of pupils towards
the mother tongue, caused by the quantity and difficulty of the content, the aim of this paper
was to find out whether this attitude could be changed and what improvements could be made if
a corpus were introduced as one of the methods of language learning. Research conducted in
2004 (Miljević-Riđički et al.) confirmed that children do not like Croatian as a mother-tongue
subject and that it is placed at the bottom of the scale of favourite subjects. Research
conducted in 2009 (Pavličević-Franić and Aladrović) shows a somewhat better attitude of pupils
to Croatian as a school subject, though it still carries many negative connotations.
Extensive content, inappropriate ways of presenting it and inadequate content can cause
problems in learning the standard form of Croatian and a "fear of language", which can in turn
cause pupils long-term problems with expression and literacy. By way of illustration, Croatian
is the most extensive subject in primary school: pupils are taught it for five lessons per
week in the period of early language learning (until the age of 12). In addition,
communication in the mother tongue is the first and crucial competence for lifelong learning,
since a child who has learnt their own language well will more easily learn other languages
as well as other subjects (European Commission, 2005). In order to improve the quality of
Croatian language teaching and learning, the intention was to investigate a small corpus of
pupils' written papers, to identify the language problems that pupils encounter at a certain
age, and to change language teaching and learning methods accordingly, so as to change the
attitude of pupils towards Croatian as a school subject.
II. Corpus of Children's Language
The Croatian National Corpus includes 101.3 million tokens (www.hnk.ffzg.hr) and consists of
a systematic collection of selected texts in contemporary Croatian covering different media,
genres, styles, areas and themes. Apart from the Croatian National Corpus, there are other
corpora of the Croatian language, such as the Croatian Language Treasury of the Institute of
the Croatian Language and Linguistics (www.ihhj.hr, December 2010).
Research on children's language in Croatia did not start as early as in the United Kingdom,
the United States of America or Germany. The first description of children's language and its
lexical development was provided by Ivan Furlan in his dissertation "Diversity of vocabulary
and speech structure" (1961). Many language studies were carried out in the 1960s and 1970s;
however, their titles contained the word "speech" instead of the word "language". At the end
of the 1970s, Ante Fulgosi published his paper "Recent Research in Psycholinguistics" (1979),
in which he presented recent psycholinguistic research primarily inspired by Noam Chomsky's
generative theory of language acquisition. More systematic research on children's language
did not start until the 1980s, with the works of Stjepko Težak (Grammar in Primary School,
1980), and the 1990s, with the works of M. Ljubešić, M. Kovačević, Z. Babić and D.
Pavličević-Franić, while a larger step forward was made with the opening of the Laboratory for
Psycholinguistic Research (POLIN) in 1999. Through their research, the Laboratory members
have contributed to the understanding of the acquisition of children's language within the
Croatian language corpus. The first Croatian corpus of children's language was compiled by
POLIN and is included in the CHILDES world database. It consists of the spontaneous speech of
three monolingual children and a corpus of the story-telling abilities of preschool children.
The language of school children (at the lexical level) has been collected and presented in
the First School Dictionary of the Croatian Language, which is also the only e-dictionary,
with 2,500 explained words with recorded correct pronunciation, 2,000 drawings and 185
cartoons. It was published by the Institute of the Croatian Language and Linguistics in 2009.
The corpus of school children's language also includes textbook language, which was analyzed
in 2008 (Pavličević-Franić and Gazdić-Alerić, 2010). Within the textbook corpus, the words
that appear most often in textbooks were counted and then sorted into four categories:
polysyllabic words, affective words, professional terminology and other.
III. Research
3.1. Methodology of the research
The research was conducted in primary schools in the Republic of Croatia, from the second to
the sixth grade. The research instruments were written papers on set topics, assigned to the
examinees according to their grade (including topics from the area of linguistic expression).
Out of the large sample of 1,500 written papers, 100 papers of approximately equal length
were selected by random sampling, 20 from each grade. The written papers were analyzed by the
content analysis method, and the data were processed in the SPSS statistics software using
the t-test, analysis of variance and the chi-square test.
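The selection step described above, a fixed number of papers drawn at random from each grade, can be illustrated with a short sketch. It is only an illustration under stated assumptions, not the authors' procedure: the paper records, the field names `id` and `grade`, the even split of 300 papers per grade and the function `sample_per_grade` are all hypothetical.

```python
import random
from collections import defaultdict


def sample_per_grade(papers, per_grade=20, seed=0):
    """Draw `per_grade` papers at random from each grade (stratified sampling)."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    by_grade = defaultdict(list)
    for paper in papers:
        by_grade[paper["grade"]].append(paper)
    selected = []
    for grade in sorted(by_grade):
        # rng.sample draws without replacement; it raises ValueError
        # if a grade has fewer than `per_grade` papers.
        selected.extend(rng.sample(by_grade[grade], per_grade))
    return selected


# Hypothetical stand-in for the 1,500 collected papers: 300 per grade, grades 2 to 6.
papers = [{"id": i, "grade": 2 + i % 5} for i in range(1500)]
selected = sample_per_grade(papers)
print(len(selected))
```

Sampling within each grade, rather than from the whole pool at once, is what guarantees equal group sizes for the later comparisons between grades.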
3.2. Aims of the research
The aims of the research are:
1. To investigate the syntactic form of the pupils' written works.
2. To investigate how much the pupils deviate from the grammatical and orthographic
standards in their written works.
3. To investigate whether there is a statistically significant difference in the results with
regard to the age, sex and final grade of the pupil.
3.3. Hypotheses of the research
The hypotheses of the research are:
H1. In their writing, the pupils mostly use simple sentences, and when they use multiple
(multi-clause) sentences, most of them are compound.
H2. The pupils show the largest deviations in their knowledge of the orthographic standard:
the writing of the sounds č and ć, the writing of the reflex of the Proto-Slavic yat, and the
rules on capital and small letters.
H3. There is no statistically significant difference in the results with regard to age.
3.4. Results of the research
Notably, some essays in the 2nd and the 4th grade consist of a single sentence, while in the
3rd and the 6th grade essays of up to ten sentences can be found. On average, the essays
consist of three to five sentences. The largest number of sentences is found in the 3rd and
the 5th grade, which is probably connected with the topic on which the pupils of these grades
wrote and which inspired them the most. The smallest number of sentences was found in the 2nd
grade, statistically significantly smaller than in the other grades. This was expected, since
these pupils have just started to write essays, so their essays are less extensive and mostly
contain simple sentences.
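The figures above rest on counting sentences in each essay and averaging the counts by grade. A minimal sketch of such a count follows, assuming a naive rule that a sentence ends at a run of ".", "!" or "?" followed by whitespace or the end of the text. The function names and the three sample essays are hypothetical and are not taken from the corpus.

```python
import re
from statistics import mean


def count_sentences(text):
    """Count sentences, splitting at ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", text.strip())
    return sum(1 for part in parts if part)


def mean_sentences_by_grade(essays):
    """Map each grade to the mean number of sentences per essay."""
    counts = {}
    for grade, text in essays:
        counts.setdefault(grade, []).append(count_sentences(text))
    return {grade: mean(values) for grade, values in sorted(counts.items())}


# Invented example essays as (grade, text) pairs, for illustration only.
essays = [
    (2, "Imam psa."),
    (2, "Volim ljeto. Idem na more."),
    (3, "Pada kiša. Čitam knjigu! Gdje si?"),
]
result = mean_sentences_by_grade(essays)
print(result)
```

A rule this simple treats an ellipsis in mid-sentence or an abbreviation ending in a full stop as a sentence boundary, so pupils' texts with irregular punctuation would still need manual checking.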
Graph 1 Average of sentences per essay (by grade, 2nd to 6th; vertical axis from 0 to 8)
The essays mostly consist of simple sentences. However, a certain number of multiple
sentences can be found, mostly compound sentences joined by the conjunctions and, or, but and
so. In the majority of the essays there is only one multiple sentence, but in the 3rd and the
5th grade there are up to four multiple sentences per essay. In some grades, there are two to
three multiple sentences per essay. There is no statistically significant difference in the
use of multiple sentences with regard to age.
Graph 2 Number of multiple sentences per essay
Orthographic errors occur least frequently in the 2nd grade and most frequently in the 3rd
and the 6th grade, probably because the essays in these grades contain the largest number of
sentences and therefore also the largest number of errors. Most of the orthographic errors
concern punctuation, especially commas, exclamation marks and ellipses. Apart from
punctuation errors, a large number of orthographic errors concern capital and small initial
letters, the writing of the sounds č and ć, and the reflex of the Proto-Slavic yat.
Grammatical errors, just like orthographic errors, occur most often in the 3rd and the 6th
grade and least often in the 2nd grade. If we exclude the 2nd grade for the reasons given
above, the analysis of variance does not show a statistically significant difference in the
number of errors with regard to grade. Among the grammatical errors, the most common are
errors in the use of grammatical cases, in the use of the preposition "s" or "sa", and in the
use of the reflexive possessive pronoun. Among the morphosyntactic errors, the most common
concern the position of the enclitic in the sentence. The phonological errors concern the
writing of the sounds č and ć and of the reflex of the Proto-Slavic yat. The lexical errors
can be divided into errors in the use of loan words and of dialectal words. The use of
dialectal words is more prominent, while Anglicisms are mostly used as greetings or short
answers in the text. Analysis of variance shows that there is no statistically significant
difference with regard to age and sex. This supports Piaget's theory that children up to the
age of 12 are in the concrete operational stage, and that the children of the 5th and the 6th
grade are therefore not cognitively more developed than the children of the 2nd, 3rd and 4th
grade.
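The comparisons between grades reported above rely on a one-way analysis of variance, which the authors ran in SPSS. The sketch below only illustrates how the F statistic behind such a test is formed. The function name and the error counts are invented for the example and are not the study's data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    # Raises ZeroDivisionError if there is no variation at all within the groups.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented error counts per essay for three grades, for illustration only.
errors_by_grade = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat, df_between, df_within = one_way_anova_f(errors_by_grade)
print(f_stat, df_between, df_within)
```

The F value alone does not decide significance: it has to be compared with the F distribution for the two degrees of freedom, which a statistics package does when it reports the p-value.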
IV. Conclusion
The research shows that it is necessary to create a corpus of the spoken and written Croatian
of school-age children. The corpus would include both the spoken and the written language of
Croatian primary school pupils, and it would help in teaching the mother tongue. It can be
used as a source for learning and creating, and it can also help teachers to create tasks
appropriate for their pupils. Since deviations from the standard are found in as many as 96%
of cases in both the written and the spoken language of primary school pupils, it is
important to create this corpus, which would help to eliminate different types of grammatical
and orthographic errors. The analyzed corpus contains a large number of orthographic and
grammatical errors. Such a corpus can be used to learn from errors, especially to identify
and correct them. It was also interesting to see the structure of the texts, which were
mostly based on simple sentences and contained a large number of punctuation errors. The
pupils generally addressed the assigned topics. The corpus of the spoken and written language
of primary school pupils can provide insight into many aspects: apart from errors, one can
also observe vocabulary, sentence structure, writing style and many other components. The
only question is how to build this corpus and how to annotate it, since it includes errors
and it would be useful to learn from them, especially by the trial-and-error method, which
previous research has already shown to be a very successful method of language learning that
reduces errors in written and spoken language (Pavličević-Franić and Aladrović Slovaček,
2011). The corpus is a good stimulus for learning and also for subsequent analysis through
which the pupils can, using the trial-and-error method, arrive at the correct results while
learning the language in an interesting way. In this way we could influence how speakers
perceive their mother tongue and raise their literacy level, which is necessary, especially
at a time when Croatian is a minor language, spoken by only about 7 million people. Moreover,
this would also raise awareness of the importance of learning Croatian as the mother tongue,
given that the Republic of Croatia is soon to accede to the European Union and Croatian will
become one of the official languages of that community.
V. References
Chomsky, N. 1966: Gramatika i um, Beograd.
European Framework of References for Lifetime Learning, Teaching Goals and Methods:
Communicative Competence. 12th November 2008
http://www.ucrlc.org/essentials/goalsmethods/goal.htm/
Jelaska, Z. 2007: Hrvatski kao drugi i strani jezik, Zagreb.
Kovačević, M. 1996: Pomaknute granice ranog jezičnog razvoja, in: „Suvremena
lingvistika“ 41-42, pp. 309-318.
Kuvač, J., Palmović, M. 2007: Metodologija istraživanja dječjega jezika, Jastrebarsko.
McGregor, W. B. 2009: Linguistics: An Introduction, London.
Miljević-Riđički, R. 2000: Učitelji za učitelje, Zagreb.
Nastavni plan i program za osnovnu školu u Republici Hrvatskoj, (eds. Ministarstvo znanosti,
obrazovanja i športa RH), Zagreb, pp. 42-48.
Owens, R. E. 1984: Language Development: An Introduction, Columbus.
Palmović, M. 2005: Teorije jezičnoga usvajanja, in: „Zrno: časopis za obitelj, školu i vrtić“,
65/91, pp. 5-6.
Pavličević-Franić, D., Aladrović Slovaček, K. 2011: Development of communicative
competence among plurilingual students in monolingual Croatian language practice, in: „Vestnik za
tuje jezike“, 2, pp. 175-192.
Pavličević-Franić, D., Gazdić-Alerić, T. 2010: Utjecaj udžbeničkih tekstova na rani leksički
razvoj u hrvatskom jeziku, in: „Jezik“, 3, pp. 81-96.
Pavličević-Franić, D., Aladrović, K. 2009: Psiholingvističke i humanističke odrednice u
nastavi hrvatskoga jezika, in: „Rano učenje hrvatskoga jezika 2“, pp. 165-186.
Piaget, J. 1969: Intelektualni razvoj deteta, Beograd.
Pavličević-Franić, D. 2005: Komunikacijom do gramatike, Zagreb.
The European Framework of References for Lifetime Learning. 2005: Teaching Goals and Methods:
Communicative Competence. European Commission.
www.ucrlc.org/essentials/goalsmethods/goal.htm, May 2010.
Vygotsky, L. 1962: Thought and language, Cambridge.
Vilke, M. 1991: Vaše dijete i jezik, Zagreb.
Zajednički europski referentni okvir za jezike: učenje, poučavanje, vrednovanje, (eds. Školska
knjiga), Zagreb, pp. 1-75.
www.hnk.ffzg.hr, June 2010.
www.ihhj.hr, December 2010.