Expression-level issues, IT Hons and PhD students © Learning Centre, 2008

EXPRESSION-LEVEL ISSUES WORKSHOP
INFO4990 for the Faculty of Engineering and IT

Areas that this session will address:
1. Paragraph structure
2. Academic language: formality
3. Grammatical accuracy

1. Paragraph structure

What is a paragraph?

A paragraph
- is a section of text focussing on one idea (or two linked ideas)
- usually has
  o an introductory focus statement (‘topic sentence’)
  o a development section, where, e.g.:
    - an argument element is logically elaborated, or
    - an aspect of the argument is followed by examples from the literature, or
    - an aspect of the topic area is criticised with support from the literature
  o then sometimes a concluding section

All examples below are taken from:
Muslea, I., Minton, S. & Knoblock, C. A. (2006). Active Learning with Multiple Views. Journal of Artificial Intelligence Research, 27, 203-233.

An example paragraph (evaluative words/expressions are underlined, and logical linkers bolded):

Main point
    Our empirical results show that Co-Testing is a powerful approach to active learning.

Development: Details
    Our experiments use four extremely different base learners (i.e., stalker, ib, Naïve Bayes, and mc4) on four different types of domains: wrapper induction, text classification (courses), ad removal (ad), and discourse tree parsing (tf).

Writer’s evaluation 1
    In all these scenarios, Co-Testing clearly outperforms the single-view, state of the art active learning algorithms.

Evaluation 2
    Furthermore, except for Query-by-Bagging, Co-Testing is the only algorithm that can be applied to all the problems considered in the empirical evaluation.

Evaluation 3: contrast with another approach
    In contrast to Query-by-Bagging, which has a poor performance on courses and wrapper induction, Co-Testing obtains the highest accuracy among the considered algorithms.

Exercise 1: With your immediate neighbours, read the following text and decide where you would divide it into paragraphs and why you would make the division(s) where you have:

Co-Testing's success is due to its ability to discover the mistakes made in each view. As each contention point represents a mistake (i.e., an erroneous prediction) in at least one of the views, it follows that each query is extremely informative for the view that misclassified that example; that is, mistakes are more informative than correctly labeled examples. This is particularly true for base learners such as stalker, which do not improve the current hypothesis unless they are provided with examples of misclassified instances. As a limitation, Co-Testing can be applied only to multi-view tasks; that is, unless the user can provide two views, Co-Testing cannot be used at all. However, researchers have shown that besides the four problems above, multiple views exist in a variety of real world problems, such as named entity classification (Collins & Singer, 1999), statistical parsing (Sarkar, 2001), speech recognition (de Sa & Ballard, 1998), word sense disambiguation (Yarowsky, 1995), or base noun phrase bracketing (Pierce & Cardie, 2001).

2. Academic language: formality

Exercise 2: With your immediate neighbours, read the following text and decide if any of the language used is too informal for your Assignment 2:

1. This analysis showed that X system did the job of representing clinical concepts well.
2. I plan on keeping a close watch on this year’s X conferences.
3. I believe my research progress is on track, although I am lagging behind with the literature readings.
4. Most of the articles were harder to digest than I had expected.

3. Grammatical accuracy

Exercise 3: With your immediate neighbours, read the following sentences and decide whether they are grammatically correct or not:

3.1 Active or passive?

1. Researchers have been investigated the use of inductive learning algorithms.
2. Their system was evaluated against the labelled training data.
3. In this system, the vocabularies are consist of …

3.2 Subject-verb agreement / Is there a subject?

1. In Figure 3, the main steps of the text categorisation process is given.
2. The other concerns about Co-Testing is related to the potential violations of the two multiview assumptions, which requires that the views are both uncorrelated and compatible.
3. 300 reports were randomly selected and any mistakes occurred were manually corrected.