Advanced JAPE
Mark A. Greenwood
University of Sheffield NLP

Recap
• Installed and run GATE
• Understand the idea of
  • LR – Language Resources
  • PR – Processing Resources
• ANNIE
  • Understand the goals of information extraction
  • Loaded ANNIE into GATE
  • Constructed one or more gazetteer lists
• Created JAPE rules with simple RHS

Overview
• Simple RHS Limitations
• The RHS API
• Accessing Annotations and Features
• Adding New Annotations
• Hands-On

Simple RHS Limitations
• The simple RHS of a JAPE rule can only add simple annotations and features
  • Feature values are hard coded or can be copied from annotations matched by the LHS
• You may need more complex processing
  • Removing temporary annotations
  • Building complex features
  • ...
• Fortunately the RHS of a rule can consist of arbitrary Java code – the possibilities are endless!

The RHS API
• Java code provided as a RHS is used as the body of this method:
    public void doit(Document doc,
                     Map bindings,
                     AnnotationSet annotations,
                     AnnotationSet inputAS,
                     AnnotationSet outputAS,
                     Ontology ontology) throws JapeException
• This provides easy access to the document, rule bindings and annotations.
• DO NOT USE annotations: IT IS DEPRECATED!

Accessing Annotations and Features
• Each labelled section of the LHS results in an Annotation Set
• These Annotation Sets can be retrieved from the bindings map
    AnnotationSet set = (AnnotationSet)bindings.get("labelname");

Accessing Annotations and Features
• When writing complex JAPE you will often need to access annotation features
• All features of an annotation are stored in a map
    FeatureMap map = annotation.getFeatures();
• Each feature is accessed by name
    Object obj = map.get("featurename");

Adding New Annotations
• New annotations should always be created in the outputAS
• To create an annotation you need
  • The annotation name
  • The start and end offset
  • A FeatureMap instance (can be empty)
    outputAS.add(start, end, label, features);

Shorthand Notation for Java RHS
• Where a Java block refers to a single left-hand-side binding, JAPE provides a shorthand notation:
    Rule: RemoveDoneFlag
    (
      {Instance.flag == "done"}
    ):inst
    -->
    :inst{
      Annotation theInstance = (Annotation)instAnnots.iterator().next();
      theInstance.getFeatures().remove("flag");
    }

Shorthand Notation for Java RHS
• A label :<label> on a Java block creates a local variable <label>Annots within the Java block, which is the AnnotationSet bound to the <label> label.
• The Java code in the block is only executed if there is at least one annotation bound to the label

Hands On: Extending the IE Example
• In the previous JAPE session you wrote a rule to annotate phrases such as
    Whitbread shares closed up 2p at 645p.
• Annotating the phrase is useful, but there is a lot of information which would be useful to extract as features
  • Starting price
  • Change in price
  • Closing price

Hands On: Extending the IE Example
• You will need to
  • Extract the closing price and change
    • assume they are always in pence so you can get the value by removing the trailing ‘p’
  • Get the minorType of the Lookup
  • Calculate the starting price
  • Create a new annotation with these values as features

Your Turn!
Feel Free To Refer To The User Guide And To Ask For Help

Hands On: Extending the IE Example
    Phase: Shares
    Input: Token Organization Lookup Money
    Options: control = appelt

    Rule: ShareChange
    (
      {Organization}
      ({Token})[0,3]
      ({Lookup.majorType == "change"}):lookup
      ({Token})[0,3]
      ({Money}):delta
      {Token.string == "at"}
      ({Money}):closing
    ):change
    -->
    {
      try {
        // Annotation sets bound to the LHS labels
        AnnotationSet change = (AnnotationSet)bindings.get("change");
        Annotation delta = ((AnnotationSet)bindings.get("delta")).iterator().next();
        Annotation closing = ((AnnotationSet)bindings.get("closing")).iterator().next();

        // The gazetteer minorType says whether the price went up or down
        boolean rise = ((AnnotationSet)bindings.get("lookup")).iterator().next()
            .getFeatures().get("minorType").equals("Changes-up");

        // Read each Money span up to one character before its end offset,
        // which drops the trailing 'p'
        int deltaValue = Integer.parseInt(doc.getContent().getContent(
            delta.getStartNode().getOffset(),
            delta.getEndNode().getOffset() - 1).toString());
        int closingValue = Integer.parseInt(doc.getContent().getContent(
            closing.getStartNode().getOffset(),
            closing.getEndNode().getOffset() - 1).toString());

        // A rise means the price started lower than it closed
        int startValue = (rise ? closingValue - deltaValue : closingValue + deltaValue);

        FeatureMap features = Factory.newFeatureMap();
        features.put("rule", "ShareChange");
        features.put("opening", startValue + "p");
        features.put("change", deltaValue + "p");
        features.put("closing", closingValue + "p");
        features.put("direction", (rise ? "up" : "down"));

        // Create the new annotation over the whole matched phrase
        outputAS.add(change.firstNode(), change.lastNode(), "ShareChange", features);
      }
      catch (Exception e) {
        // ignore this for now
      }
    }
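
Checking the Arithmetic Outside GATE
• The price calculation in the rule can be tried on its own, without GATE, as a minimal sketch in plain Java
• The class SharePriceSketch and its methods parsePence, openingPrice and features are hypothetical names made up for this illustration; they are not part of GATE or JAPE
• It assumes, as the exercise does, that every amount is in pence and ends in 'p'

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of GATE or JAPE.
final class SharePriceSketch {

    // Parse a pence amount such as "645p" by dropping the trailing 'p'.
    static int parsePence(String text) {
        String trimmed = text.trim();
        if (!trimmed.endsWith("p")) {
            throw new IllegalArgumentException("expected a value in pence: " + text);
        }
        return Integer.parseInt(trimmed.substring(0, trimmed.length() - 1));
    }

    // If the price rose, it started lower than it closed; if it fell, higher.
    static int openingPrice(int closing, int delta, boolean rise) {
        return rise ? closing - delta : closing + delta;
    }

    // Build the same price features that the rule stores on the new annotation.
    static Map<String, String> features(String closingText, String deltaText, boolean rise) {
        int closing = parsePence(closingText);
        int delta = parsePence(deltaText);
        Map<String, String> features = new LinkedHashMap<>();
        features.put("opening", openingPrice(closing, delta, rise) + "p");
        features.put("change", delta + "p");
        features.put("closing", closing + "p");
        features.put("direction", rise ? "up" : "down");
        return features;
    }

    public static void main(String[] args) {
        // "Whitbread shares closed up 2p at 645p."
        // prints {opening=643p, change=2p, closing=645p, direction=up}
        System.out.println(features("645p", "2p", true));
    }
}
```

• Unlike the rule, which silently ignores exceptions, this sketch rejects an amount that does not end in 'p', so unexpected input is easier to spot while experimenting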