FAE ESP Materials Derived from a Web-based Corpus

TESL Ontario’s 36th Annual Conference “Celebrating the International Year of Languages”
Friday, November 14, 2008, 8.30 – 9.20 am (FAE)
Sheraton Centre Toronto, Canada

Pisamai Supatranont, Ph.D.
Rajamangala University of Technology Lanna Tak, Thailand
supatranont@yahoo.com

Presentation Outline
- Background and rationale
- Research questions
- Research methodology
- Data analysis and findings
- Discussion

Background of the Study
The study is:
- Funded by
- Conducted in February – July 2008 at
- Under the supervision of Assoc. Prof. David Hall
- With the consultation of Prof. Pam Peters
The researcher is from RMUTL Tak, Thailand.

Rationale of the Study
Cause: the influence of Information and Communication Technology (ICT) in academic and professional settings.
Effect: to get good jobs, university students in both ICT and non-ICT fields need English to communicate in ICT working environments.

ESP Materials Development
1. Limitations of relevant ESP textbooks
Although specialized texts in ICT are abundant, they are too difficult for EFL students to be used unmodified and unsupported in ESP classes.
=> Need for teacher-designed materials in ESP teaching.

2. Differences in students’ background knowledge
ICT students:
- possess some specialized knowledge and skills to design hardware and software.
- need English to communicate their knowledge in academic and professional contexts.
Non-ICT students:
- have little knowledge of ICT.
- need ICT knowledge as computer users.
- need to learn both basic ICT concepts and English to communicate in business companies or organizations.
Different learning needs: the two groups have the same level of English but different levels of specialized knowledge.
=> Need for different specialized content to facilitate ESP learning.

3. Insufficiency of EFL students’ lexical knowledge
- Undergraduate students in EFL countries, e.g. in Thailand (Supatranont, 2005), Oman (Cobb and Horst, 2001), and Indonesia (Nurweni and Read, 1999), have been found to have limited lexical knowledge and to be less proficient in English than is expected of students at university level.
- In Supatranont’s study (2005), the lexical knowledge of RMUTL students was found to be below the lexical threshold for academic study. With a limited vocabulary of academic words, students cannot cope well with specialized texts, because the most frequent words in these texts are academic and sub-technical words (Mudraya, 2006).
=> Academic and technical words should be integrated as the main vocabulary components of language input.

Lexical threshold for academic study
The lexical threshold for academic study is composed of two wordlists (Nation, 2001; Coxhead & Nation, 2001; Cobb & Horst, 2001; Nation & Waring, 1997):
- General Service List (GSL) = 2,000 high-frequency words (West, 1953)
- Academic Word List (AWL) = 570 academic words (Coxhead, 1998)
Knowledge of these two wordlists is estimated to provide over 90% coverage of academic texts in all disciplines. To read an academic text with comprehension, a reader needs to know at least 95% of the words in that text (Laufer, 1989).
Academic vocabulary in this study is based on the GSL and AWL (downloaded from http://www.uefap.com/vocab/vocfram.htm).

Objectives of the Study
1. To identify high-frequency language items in ICT specialized texts, focusing on three lexical areas:
- academic words: based on the GSL and AWL
- technical words: words with a particular meaning in ICT
- technical collocations: noun phrases with a particular meaning in ICT
2. To obtain, through corpus-based analysis, a set of language input for designing course material for teaching English for ICT to non-ICT EFL students.

Research Questions
1. What are high-frequency academic words in ICT specialized texts?
2. What are high-frequency technical words in ICT specialized texts?
3. What are high-frequency technical collocations in ICT specialized texts?
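The coverage figures cited for the lexical threshold (over 90% from the GSL and AWL, and 95% as the minimum for comprehension) rest on a simple calculation: the share of running words in a text that appear in a known wordlist. The following is a minimal sketch of that calculation, not the procedure used in the study. The two small word sets are hypothetical stand-ins for the real GSL and AWL, the tokenizer is a plain regular expression, and no lemmatization is applied, so inflected forms such as "stores" count as unknown.

```python
import re

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def coverage(text, known_words):
    """Return the fraction of running words in `text` found in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in known_words)
    return known / len(tokens)

# ASSUMPTION: tiny hypothetical samples, not the actual GSL and AWL lists.
GSL_SAMPLE = {"the", "a", "of", "to", "is", "in", "on", "file"}
AWL_SAMPLE = {"access", "data", "computer"}

sample = "The computer stores data in a file on the disk."
cov = coverage(sample, GSL_SAMPLE | AWL_SAMPLE)

print(f"{cov:.0%}")   # prints "80%": 8 of the 10 running words are in the lists
print(cov >= 0.95)    # prints "False": below the 95% minimum cited from Laufer
```

With the full wordlists loaded from files and a lemmatizer applied first, the same function would give the kind of coverage estimate the threshold literature reports.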
Research Methodology
The methodology is divided into three main steps:
1. Text selection
2. Corpus compilation
3. Corpus-based analysis: studying the corpus with text-analysis software

Text Selection
- Texts were selected exclusively from web-based tutorials in ICT.
- Authors: mostly lecturers in universities and tutorial centers.
- 5 topics concerning fundamental ICT knowledge:
  - Computer hardware
  - Operating systems and graphical user interfaces (OS and GUIs)
  - Basic application software
  - Multimedia software
  - Internet software
- 3 text types: articles, manuals, and advertisements (of hardware)

Number of files by topic and text type

Text type        Hardware   OS and GUIs   Application   Multimedia   Internet
Articles         20         20            20            20           20
Manuals          25         20            20            20           20
Advertisements   25         -             -             -            -

Total files = 230

Number of words by topic and text type
Typical length: 1,500-2,000 words per article, 700-1,000 words per manual, 200-500 words per advertisement.

Text type        Hardware   OS and GUIs   Application   Multimedia   Internet
Articles         36,192     35,525        37,857        38,011       37,925
Manuals          20,171     17,535        18,889        18,597       18,353
Advertisements   8,423      -             -             -            -

Total words = 287,478

Design of the EICT Corpus
Size:         287,478 words
Text files:   230 files
Word types:   6,064 word types
Medium:       Written
Language:     Texts written in English
Authorship:   Texts written by experts in academic institutions, tutorial centers, or manufacturers
Contents:     Fundamental knowledge of ICT
Text topics:  5 topics:
              1. Computer hardware
              2. Operating systems and graphical user interfaces (OS and GUIs)
              3. Basic application software
              4. Multimedia software
              5. Internet software
Text types:   1. Articles: passages including definitions and descriptions
              2. Manuals: instructions for operating hardware and software
              3. Advertisements: details of the features and quality of computer hardware products

Text-analysis Software: WordSmith Tools
- WordSmith Tools version 5.0
- Developed by Mike Scott (2007)
- University of Liverpool, UK
- www.lexically.net/wordsmith/index.html

Reference Corpus
According to Bowker and Pearson (2002), Hunston (2002), and Scott (2001):
- To establish a word’s ‘keyness’, the frequency wordlist of a corpus should be compared with that of a larger reference corpus.
- With the log-likelihood formula, unusually frequent or infrequent words can be identified by their ‘keyness’ and the significance of the difference (p value):
  - Words with positive keyness => occur unusually more often than in the reference corpus.
  - Words with negative keyness => occur unusually less often than in the reference corpus.

Reference Corpus: BNC
- British National Corpus (BNC)
- A general corpus of 100 million words
- Samples of written and spoken language from a wide range of sources
- The BNC website is http://www.natcorp.ox.ac.uk
- In the present study, the BNC wordlist is the one supplied with WordSmith Tools.

Data Analysis and Findings
The method of analysis is adapted from the suggestions of Bowker and Pearson (2002) and Scott (2001). The method and findings are described in the order of the three research questions.

Question 1: What are high-frequency academic words in ICT specialized texts?
1.1 Download the GSL and AWL wordlists from the website of the University of Hertfordshire, UK, at http://www.uefap.com/vocab/vocfram.htm. Use these words as academic word candidates.
Candidate lists: 1,937 GSL headwords and 570 AWL headwords.
1.2 Build a wordlist of the EICT Corpus, which yields 6,064 word types in total.
1.3 Use the academic word candidates to mark all GSL and AWL words in the corpus. Lemmatize them, which yields 941 headwords of academic word candidates with ≥ 5 occurrences. Sort them in alphabetical order.
1.4 Compare the list of academic word candidates with the BNC wordlist, using the log-likelihood formula at a p value of 0.000001. The software is set:
- To process with full lemmas
- To display only words with positive keyness

Finding 1
Of the 941 words, 343 words with ≥ 5 occurrences, positive keyness, and a significant difference emerged as high-frequency academic words. Function words are excluded. The list can be sorted by keyness or alphabetically.

Many general words take on a technical sense in specialized texts. Of the 343 words in total, 95 words, e.g. burn, window, and word, convey particular meanings in ICT that differ from their meanings in general texts. These words look simple and familiar, but they confuse students who interpret them in the general sense. Previous studies in fields related to ICT report the same problem. For example, Lam’s study (in Chen and Ge, 2007) reported computer science students’ confusion when they interpreted the word ‘field’ in the agricultural sense rather than as an option in a database program. These words were classified as semi-technical words.

All 343 high-frequency academic words were classified into 2 groups:
1. 248 academic words, e.g. access, compute, illustrate, indicate, identify, manipulate, term, category, feature, occurrence, symbol, etc.
2. 95 semi-technical words:
2.1 Words with technical senses or particular meanings, e.g. burn, drive, refresh, card, domain, engine, memory, field, application, character, Word, document, window, etc.
2.2 Words for mathematics, geometric shapes, and diagrams, e.g. add, multiply, divide, axis, table, row, degree, etc.
2.3 Simple words frequently used as commands or methods, e.g. edit, enable, paste, shift, help, enter, drag, drop, etc.

Question 2: What are high-frequency technical words in ICT specialized texts?
The method is similar to that for Question 1:
2.1 Build a word frequency list of the whole EICT Corpus.
2.2 Exclude all function words and the academic words from Finding 1.
2.3 Lemmatize the remaining words, which yields 938 headwords.
2.4 Keep only words with ≥ 5 occurrences and technical meanings.
2.5 Compare the resulting wordlist with the BNC wordlist, using log-likelihood at a p value of 0.000001.

Finding 2
Of the 938 words, 358 words and acronyms with ≥ 5 occurrences, positive keyness, and a significant difference were selected as high-frequency technical words, sorted by keyness.

All 358 resulting words are classified into 5 groups:
1. 106 words with particular meanings (different from their general meanings), e.g. cache, cookies, bus, port, bitmap, chip, cursor, pixel, etc.
2. 87 words referring to basic programs, devices, commands, and keys, e.g. spreadsheet, database, notepad, wizard, backspace, etc.
3. 55 abbreviations, acronyms, and extensions, e.g. ASCII, WYSIWYG, ALU, ROM, RAM, OS, RGB, ESC, ALT; txt, doc, gif, wav, http, html, www, etc.
4. 17 words for mathematics, geometric shapes, and diagrams, e.g. equation, ellipse, polygon, cell, column, intersection, etc.
5. 92 sub-technical terms and frequent words in ICT, e.g. alignment, compression, directory, multimedia, playlist, etc.

Question 3: What are high-frequency technical collocations in ICT specialized texts?
3.1 Set the software:
- To produce concordances
- To display 2-5 word clusters with ≥ 5 co-occurrences
- To compute the strength of the relation between words, using Mutual Information (MI) ≥ 5.000
3.2 On the cluster tab, select only the 2-5 word clusters with technical meanings and frequent use.
3.3 Compute the relation value on the collocate tab, and sort by relation value.
3.4 Select only the collocations with ≥ 5 occurrences, MI scores ≥ 5.000, and distribution across ≥ 3 text files.

Finding 3
335 collocations were selected as technical collocations => noun phrases with technical meanings, e.g.
- mail merge
- operating system (OS)
- uniform resource locator (URL)
- hypertext markup language (html)
- random access memory (RAM)
- wide area network (WAN)

Discussion
Significance of the study:
- It provides an overall description of the language of English for ICT.
- It provides a clear language-learning goal that serves particular learning needs.
In materials design, the teacher knows which language items should be the focus of lessons and which ones the students already know.
Apart from typical teaching materials, a corpus itself can also be a great source of learning. Giving students direct access to the corpus can promote data-driven learning.

References
Bowker, L. and Pearson, J. (2002). Working with Specialized Language: A Practical Guide to Using Corpora. USA and UK: Routledge.
Chen, Q. and Ge, G. (2007). A corpus-based lexical study on frequency and distribution of Coxhead’s AWL word families in medical research articles (RAs). English for Specific Purposes, 26, pp. 502-514. Elsevier Science.
Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge University Press.
Coxhead, A. (1998). An Academic Word List. ELI occasional publication No. 18. Victoria University of Wellington, New Zealand.
Coxhead, A. and Nation, P. (2001). The specialized vocabulary of English for academic purposes. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 252-267. UK: Cambridge University Press.
Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Laufer, B. (1989). What percentage of text-lexis is essential for comprehension? Cited in Cobb, T. and Horst, M. (2001). Reading academic English: Carrying learners across the lexical threshold. In Flowerdew, J. and Peacock, M. (eds.) Research Perspectives on English for Academic Purposes. pp. 315-329. UK: Cambridge University Press.
Mudraya, O. (2006). Engineering English: A lexical frequency instructional model. English for Specific Purposes, 25 (2), pp. 235-256. Elsevier Science.
Nation, P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, P. and Waring, R. (1997). Vocabulary size, text coverage and word lists. In Schmitt, N. and McCarthy, M. (eds.) Vocabulary: Description, Acquisition and Pedagogy. pp. 6-19. Cambridge: Cambridge University Press.
Nurweni, A. and Read, J. (1999). The English vocabulary knowledge of Indonesian university students. English for Specific Purposes, 18 (2), pp. 161-175. Elsevier Science.
Scott, M. (2001). Comparing corpora and identifying key words, collocations, frequency distributions through the WordSmith Tools suite of computer programs. In Ghadessy, M., Henry, A., and Roseberry, R.L. (eds.) Small Corpus Studies and ELT: Theory and Practice. pp. 47-67. US: John Benjamins Publishing.
Scott, M. (2007). WordSmith Tools version 5.0. Oxford University Press. Available at http://www.lexically.net/wordsmith/index.html.
Supatranont, P. (2005a). Classroom concordancing: Increasing vocabulary size for academic reading. KOTESOL Proceedings 2005. pp. 35-44. South Korea.
Supatranont, P. (2005b). A Comparison of the Effects of the Concordance-based and the Conventional Teaching Methods on Engineering Students’ English Vocabulary Learning. Online Ph.D. Dissertation, Program of English as an International Language, Chulalongkorn University, Thailand. Available at http://www.arts.chula.ac.th/~ling/thesis/Pisamai2548.pdf
West, M. (1953). A General Service List of English Words. London: Longmans, Green and Company.

Thank you for your attention.
