Using Web-based Speech Recognition Technologies to Improve English Pronunciation

Howard Chen (陳浩然)
hjchen@ntnu.edu.tw
English Department, National Taiwan Normal University

Better Listening and Speaking Skills: GEPT and New TOEFL
- Writing Section: There will be one 30-minute and one 20-minute essay.
- Speaking Section: Six open-ended speaking questions require test takers to speak into a microphone.
- Integrated Language: Some sections of the new test will combine the four basic communication skills. For example, a test taker might listen to a lecture and read a passage, then write or speak about it.
- Score: 1-120 (human graders for the essay and speaking sections).
- Time: 3.5 hours.

Different Solutions
- Some universities hire more native English speakers and give students more opportunities to interact with these teachers.
- Some colleges have reduced class sizes, expecting more teacher-student interaction in the target language.
- Some universities have also begun to explore the power of new computer technologies: pronunciation tutors and ASR.

The Development of ASR in CALL
- A few years ago, very few CALL (computer-assisted language learning) programs claimed to incorporate speech recognition technologies.
- Within the past 5 years, however, automatic speech recognition (ASR) technologies have become widely used in language teaching and learning programs.

Some Programs Available in Taiwan

ASR Provides the Following Benefits:
- Students have more opportunities to produce the target language and to interact extensively with computers.
- Students receive individual attention and do not need to compete with classmates.
- Students can learn to communicate in a less threatening environment and can often get quick feedback from computers.
- Students are given various kinds of direct and indirect feedback by computers.
- Students can hear models provided by different native speakers.
- Students have better control over their learning pace and may gain confidence.

TRACI Talk

TeLL me More Pro

The Scores Provided by the MyET Program (via Web)

Prices Are Too High and Access Is Limited
- Although these PC-based speech recognition programs are quite useful, many students cannot afford to buy them. (TRACI Talk or TeLL me More costs around 100 US dollars.)
- Some schools have purchased these programs, but students can only use them in language centers or language labs; they cannot access them from their dormitories or homes.

Microsoft Speech Applications Developmental Kit
- The new Microsoft speech applications developmental kit allows programmers and researchers to develop different web-based speech applications.
- The kit requires a new programming environment that integrates the following software:
  - Microsoft .NET Speech SDK
  - Microsoft Internet Information Services (IIS)
  - Microsoft Internet Explorer 6.0 or later
  - Microsoft Visual Studio .NET Professional
  - Microsoft .NET Framework

The Flowchart of Interaction

NTNU Pronunciation Practice Web Site

NTNU Dept of English: ASR Technologies

Repeat Conversation

Choose the Right Answer

Write and Say

Flash and Speech Recognition

Personal Learning Records

Checking Students' Performances

Learners' Evaluation
- The online system was completed, after extensive testing, around January 2005.
- We then invited 25 students (non-English majors) who were taking a Freshman English course to use the web site in the spring semester.
- All of these college students had graduated from vocational high schools, and most of them had difficulty with English pronunciation.
- In a survey, 74% of these students felt that their pronunciation was poor.

Users Survey

Other Useful Information
- Most students indicated that they tried an item 3-5 times before giving up.
- When they were not sure how to pronounce a certain word, many students (65%) chose to listen to the audio provided by the system.
- Interestingly, 61% of students found that they could pass the shorter sentences more easily.
- They found "identify the object" the easiest exercise (39%), followed by "listen and repeat" (35%) and "choose the appropriate response" (26%).

Some Positive Comments

Suggestions for Improvement

Reflection 1
- The user interface can be made more interesting and attractive.
- Some of the sections that use TTS (text-to-speech) audio can be replaced with human voices.
- The web site's server hardware can be upgraded to provide better and more reliable performance.
- A game-like learning environment would be more attractive for users. We are currently developing a 2-D learning environment and using Flash animation to make the site more interesting.
- Certain items appear to have bugs, and we need to find solutions for them. Identifying and correcting these items quickly will reduce students' frustration when interacting with the site.

Reflection 2: Feedback Quality - Lower Threshold?
- So far, the system can only give a pass or fail judgment.
- Students at lower proficiency levels may sometimes find the system very demanding about their pronunciation.
- The system cannot clearly pinpoint the problems of individual speakers. Students can only listen to the models carefully and try again and again.
- Perhaps we can find a way to make the speech recognition system less demanding (e.g., by lowering the sensitivity of the speech recognition engine); however, we would then face the problem of setting a reasonable threshold.

How to Use the System in English Learning?
- Diagnosis: a screening device. One possibility is to use this ASR system to assess the pronunciation abilities of large numbers of students. The system might help to quickly identify students who are relatively weak in pronunciation, so that teachers and tutors can then provide extra help.
- Motivating learners to listen and speak.
- A tool for oral practice (with animation and other multimedia support).
- Practice with any kind of sentence or phrase (not confined to fixed patterns).

Thank you!
Questions and Feedback Welcome!
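
Backup sketch: threshold and screening

The pass/fail threshold question raised in Reflection 2 and the screening use described under Diagnosis can be illustrated together. The Python below is a minimal sketch and not the code of the NTNU system. It assumes that the recognizer exposes a confidence score between 0 and 1 for each attempt; the names Attempt, pass_rates, and flag_for_help, the sample scores, and both cutoff values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    student: str
    item: str
    confidence: float  # assumed recognizer confidence in [0, 1] (hypothetical)


def pass_rates(attempts, threshold):
    """Return each student's share of attempts scoring at or above the threshold."""
    totals = {}
    passes = {}
    for a in attempts:
        totals[a.student] = totals.get(a.student, 0) + 1
        if a.confidence >= threshold:  # the pass/fail judgment
            passes[a.student] = passes.get(a.student, 0) + 1
    return {s: passes.get(s, 0) / totals[s] for s in totals}


def flag_for_help(attempts, threshold, min_pass_rate):
    """List students whose pass rate falls below min_pass_rate (screening)."""
    rates = pass_rates(attempts, threshold)
    return sorted(s for s, r in rates.items() if r < min_pass_rate)


# Made-up scores for two students on two items (illustration only).
attempts = [
    Attempt("A", "s1", 0.82), Attempt("A", "s2", 0.74),
    Attempt("B", "s1", 0.55), Attempt("B", "s2", 0.61),
]

strict = flag_for_help(attempts, threshold=0.7, min_pass_rate=0.5)
lenient = flag_for_help(attempts, threshold=0.5, min_pass_rate=0.5)
print(strict, lenient)
```

With these made-up scores, the stricter threshold flags student B for extra help, while the lower threshold lets every attempt pass and flags no one. This is the trade-off noted in Reflection 2: a lower threshold is less frustrating for weaker students but also less useful for diagnosis.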