TALEND OPEN STUDIO ADVANCED
Training Material

Copyright
Any total or partial reproduction without the consent of the author or beneficiary, devisee or legatee is not allowed (law of 11 March 1957, par. 1 of article 40). Representation or reproduction, by any means, would be considered an infringement of copyright under articles 425 et seq. of the Penal Code. The law of 11 March 1957, par. 2 and 3 of article 41, allows the creation of copies and reproductions exclusively for the private use of the copier and not for collective use on the one hand, while on the other it allows analysts to use short quotes for purposes of illustration.

V080424 Copyright © 2008 Talend. All rights reserved

Summary
• Reminders and best practices
• Context management
• Mastering complex components
• Error handling
• Performance monitoring
• Monitoring and automating management of log files
• Deployment and scheduling
• Exploiting the power of programming languages
• Routines
• Debugging
• Creating components
    • tCiviliteGenerator (no properties view/retrieval of an exercise)
    • tPrintln (properties view)
    • tDecryptRow or tChangeDevise or tAddColumnTotal
    • tPrintOutput to redirect to printer
    • Virtual component (combining tFileInput and tSortRow)

Let's carry on!
Reminders and best practices
• Reminders
• Job designs
• Lookups
• Best practices

Job Designer: reminders
A Job: components that are linked to one another
[Diagram labels: Job, Sub-Job]

Job Designer: tMap and Lookup
Main and Lookup links set the order of processing
[Diagram: CUSTOMER WITH STATES]

Job Designer: best practices

Job Designer: order of components
Order of execution of components
[Diagram: components labelled "Starting" and "End"]

Let's carry on!
Management of Contexts
• Reminders
• The Context node in the Repository
• Dedicated components

Context management
A context contains several types (dev/prod)
The prompt functionality refers to different types of variables (pathDir, pathFile)

Selecting the execution context
• When Talend Open Studio starts
• During deployment

Advice
• F5: declare a variable
• The component tRunJob

Practice area
Exercises 1 and 2

Let's carry on!
Mastering complex components
• Using regular expressions: tFileInputRegex
• tIntervalMatch / tDenormalize
• XML input, validation & XSL
• Using web services

Metadata file regexp
An example of reading an Apache log file

tFileInputRegex
Define a regexp; example of reading from an Apache log:
Aug 18 06:31:29 cplemon02.d075.cp logger: 66.102.9.104 - - [18/Aug/2006:07:49:27 +0200] "GET /server-status/ HTTP/1.1" 200 12891 "-" "CactiScript/1.0" "-" localhost
Aug 18 06:31:30 cplemon02.d075.cp logger: 66.102.9.104 - - [18/Aug/2006:07:49:28 +0200] "GET /portail/accueil.pl HTTP/1.1" 200 2907 "-" "CactiScript/1.0" "-" localhost
Aug 18 06:31:32 cplemon01.d075.cp logger: 66.102.9.104 - - [18/Aug/2006:08:06:48 +0200] "GET /server-status/ HTTP/1.1" 200 56765 "-" "CactiScript/1.0" "-" localhost

tIntervalMatch

tNormalize

tDenormalize

tXSDValidator

tXSLT: generate documents
tFileOutputXML: 1 special case

tAdvancedFileOutputXML

tWebServiceInput

Let's carry on!
Error management
• Trigger ifOk and ifError
• Components tLogCatcher / tWarn / tDie
• Create specific logs

Error handling
Each component has its own error handling routine (OnComponentError)

tLogCatcher schema
Default schema

Practice area
Exercise 3: Handling errors

Practice area
Exercise 4: Customizing error logs

Let's carry on!
Performance monitoring
• tStatCatcher
• tFlowMeter

tStatCatcher
Monitoring the performance of each component

Let's carry on!
Monitoring and automation of log file management
• Automating the management of logs
• Using log files

Log management
3 types of log:
• Manual log management and error management (Java or Perl errors)
• Management of component execution start, end and duration
• Management of metrics

Management of logs/preferences
In order to collate logs, configure the preferences.

Management of logs/preferences
In the Properties and Job Designs view, preferences are entered automatically.
This is in Built-in mode, which is not great for maintaining preferences! It is better to create metadata in the Repository and specify it in each job.
Practice area
Exercise 5: Creating the tables that are needed

Practice area
Exercise 6

Activity Monitoring Console / PE
Centralize the monitoring of Talend jobs
• Single user version
• Harvesting of all execution server reports
Functionalities
• Monitoring of:
    • Events triggered by the jobs
    • Execution time
    • Volumes of processed data
• Harvest the local and remote server data simultaneously
• User-definable interface
• Integrated in Talend Open Studio or independent
Benefits
• Single console for all integration jobs
• Custom indicators and thresholds
• Global view

Let's carry on!
Exploiting the power of programming languages
• tGroovy, tJava* & tPerl*
• Create specific classes/subs
• Use jars / external Perl modules

GroovyFile component
GroovyFile: a simplified syntax
The script is simply entered in a text file:
    def name='World'; println "Hello $name";

tPerlFlex/tJavaFlex component
Component execution in 3 stages
tPerlFlex/tJavaFlex: a component in 3 parts
• Start (opens the loop)
• Main
• End
[Diagram: Start, Main, End]

Practice area
Exercise 7: Use tPerlFlex/tJavaFlex to generate the following flow:
    key   value
    0     Miss
    1     Mrs.
    2     Mr.

tJava / tPerl components
tJava / tPerl = Start section of a tPerlFlex/tJavaFlex
[Diagram: Start, Main, End]

Practice area
Exercise 8: Create a table and use it in tMap

tJavaRow / tPerlRow components
tJavaRow / tPerlRow = Main section of a tJavaFlex / tPerlFlex
[Diagram: Start, Main, End]
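The Start / Main / End split described above can be sketched in plain Java for the flow of Exercise 7. This is only an illustration under stated assumptions, not code generated by Talend: the class CivilityFlowSketch, the Row type and the generate() method are hypothetical stand-ins for what the Studio builds around the three sections of a tJavaFlex.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch: in a real job the Studio generates the
// surrounding class and the output row; only the three marked sections
// would be typed into the tJavaFlex component.
class CivilityFlowSketch {

    // Stand-in for the output row derived from the schema (key, value).
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    static List<Row> generate() {
        List<Row> out = new ArrayList<>();

        // --- Start: runs once, prepares the data and opens the loop
        String[] titles = {"Miss", "Mrs.", "Mr."};
        for (int i = 0; i < titles.length; i++) {

            // --- Main: runs once per row of the flow
            out.add(new Row(i, titles[i]));

        // --- End: runs once, closes the loop opened in Start
        }
        return out;
    }

    public static void main(String[] args) {
        for (Row r : generate()) {
            System.out.println(r.key + "|" + r.value);
        }
    }
}
```

Read this way, a tJava corresponds to the Start section alone (it runs once), and a tJavaRow to the Main section alone (it runs once per incoming row).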
Practice area
Exercise 9: Use a tPerlRow/tJavaRow to modify a flow:
    INPUT   OUTPUT
    0       Miss
    1       Mrs.
    2       Mr.

Practice area
Exercise 10: Use tPerlRow/tJavaRow to modify the flow:
    INPUT            OUTPUT
    NSS              month   year   department
    2700392000000    03      1970   92
    1760991000000    09      1976   91

Let's carry on!
Exploiting the power of programming languages
• Master the complex components: tGroovy, tJava & tPerl
• Create specific classes/subs
• Use jars / external Perl modules

Editor with embedded Java/Perl
Integrated Java / Perl editor
• Auto-complete
• Syntax-based colours
• Explanation of errors
• Javadoc / Perldoc
More information about the Java editor:
http://jmdoudoux.developpez.com/java/eclipse/?page=Chap_006#L6.4

Managing shared code: routines
Structure your classes and subs
Group together existing:
• Business classes
• Connectors

Creating a class or a sub
Example: Java class
The comments allow you to make the method available in Expression Builder and tRowGenerator

Practice area
Exercise 11: Create a routine that allows the following to be extracted from an NSS:
• Month
• Year
• Department
    INPUT            OUTPUT
    NSS              month   year   dept
    2700392000000    03      1970   92
    1760991000000    09      1976   91

Let's carry on!
Exploiting the power of programming languages
• Mastering complex components: tGroovy, tJava & tPerl
• Create specific classes/subs
• Use jars / external Perl modules

Use a jar file or an external module
The external jar files are declared in a routine
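For the NSS exercises above (Exercises 10 and 11), a routine could look roughly like the sketch below. The class name NssRoutines and its three methods are hypothetical. The character positions are inferred from the two sample rows only, and the "19" century prefix is an assumption that happens to fit those samples. The comment tags imitate the layout of the sample routine that the Studio generates; check them against your version before relying on them.

```java
// Hypothetical routine class; names and positions are illustrative only.
class NssRoutines {

    /**
     * month: returns the two-digit month found in characters 4-5 of an NSS.
     *
     * {talendTypes} String
     *
     * {Category} User Defined
     *
     * {param} string("2700392000000") nss: the number to read.
     *
     * {example} month("2700392000000") # 03
     */
    public static String month(String nss) {
        return requireNss(nss).substring(3, 5);
    }

    /** year: returns the four-digit year built from characters 2-3. */
    public static String year(String nss) {
        // Assumption: every year falls in the 1900s, as in the sample data.
        return "19" + requireNss(nss).substring(1, 3);
    }

    /** department: returns the two-digit department found in characters 6-7. */
    public static String department(String nss) {
        return requireNss(nss).substring(5, 7);
    }

    // Rejects values that are too short to contain the three fields.
    private static String requireNss(String nss) {
        if (nss == null || nss.length() < 7) {
            throw new IllegalArgumentException("NSS too short: " + nss);
        }
        return nss;
    }

    public static void main(String[] args) {
        String nss = "2700392000000";
        // prints: 03 1970 92
        System.out.println(month(nss) + " " + year(nss) + " " + department(nss));
    }
}
```

In a tJavaRow or a tMap expression, such a routine would then be called as NssRoutines.month(...) on the incoming NSS column.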
Practice area
Exercise 12: Import the jakarta-oro-2.0.8.jar package
Create a routine isValideEmail()

    import org.apache.oro.text.regex.MalformedPatternException;
    import org.apache.oro.text.regex.Pattern;
    import org.apache.oro.text.regex.Perl5Compiler;
    import org.apache.oro.text.regex.Perl5Matcher;

    // Returns true when the whole address matches a simple name@domain.tld pattern.
    public static boolean isValideEmail(String email) {
        Perl5Matcher matcher = new Perl5Matcher();
        Perl5Compiler compiler = new Perl5Compiler();
        Pattern pattern;
        try {
            pattern = compiler.compile("^[\\w_.-]+@[\\w_.-]+\\.[\\w]+$");
            if (!matcher.matches(email, pattern)) {
                return false;
            }
        } catch (MalformedPatternException e) {
            // The pattern is a constant, so this indicates a programming error.
            throw new RuntimeException(e);
        }
        return true;
    }

Practice area
Exercise 13: Create a job and name it UseJar
Use the function isValideEmail()

Let's carry on!
Implementing jobs
• Advanced debugging mode
• Define breakpoints
• Learn step-by-step mode (step in, step into, step over)
• View variables in real time
• "Hot fix" variables in memory

Debug view
• "Step-by-step" mode
• Hot fix variables
[Screenshot: JasperETL]
On-line resources:
http://www.jmdoudoux.fr/java/dejae/chap008.htm

Practice area
Exercise 14: Using the debugger

Let's carry on!
Implementing jobs
• Deploy your jobs to a production system
• Export your jobs
    • Executable
    • Web services
• Launch your jobs from the command line
• Plan execution of your jobs via a scheduler

Deployment and optimisation
• Deploy jobs
• Retrieve the generated code
• Conditions under which they function

Deploy a web service
Talend products and deployment
Open Source: Talend Open Studio (GPL, Individual)
• Business Modeler
• Job Designer
• Metadata Manager
Subscription: Talend on Demand
• Hosted Repository
• Deploy your jobs remotely
• Activity Monitoring Console / Personal Edition
Enterprise: Talend Integration Suite
• Shared Repository
• Job Conductor
• Activity Monitoring Console / Dashboard
• Distant Run
• Grid Conductor
• CPU Balancer

Let's carry on!
Implementing jobs
• Create your own components: design and implementation
• Understand the concepts of code generation via a template
• Understand the 3 template files: start/main/end
• Understand the XML description of components
• Internationalize your components

Architecture of Talend Open Studio
[Diagram: Talend Open Studio GUI. Storage (XML: Business Model, Job) feeds the Code Generator (JET templates for Perl, Java, C), which produces the Generated Program (Perl program, Java program, C program).]
Preliminary skills: Perl 5.8
Preliminary skills: Java 1.5

Execution of components
Order in which components are executed
[Diagram: components labelled "Starting" and "End"]

Design a specific component
tJavaFlex / tPerlFlex:
• Start: action triggered by calling the component
• Main: action triggered with each line of data
• End: action triggered at the end of processing

Module containing components
The module org.talend.designer.components.localprovider
Do not modify this file by hand!

User components
Store your components in a specific file

Component Designer
Your components currently in development
Best practices
Differentiate between your development and production environments!
• Install an instance of TOS to develop your components
• Install an instance of TOS or of TIS Client to manage your Talend projects
Simply separating the storage folders for production and development components is not sufficient.

Component files
    <component>_<language>.xml            Description of a component
    <component>_icon32.png                The Palette icon
    <component>_messages.properties       The contents of the properties view
    <component>_begin.<language>jet       The Start part of the code
    <component>_main.<language>jet        The Main part of the code (the loop)
    <component>_end.<language>jet         The End part of the code

Working with files
The Component Designer view

<component>_<language>.xml
<component>_<language>.xml, description of component
The HEADER tag presents the attributes:
• AUTHOR, VERSION, COMPATIBILITY
• STARTABLE
• SCHEMA_AUTO_PROPAGATE
• DATA_AUTO_PROPAGATE
The CONNECTORS tag contains:
    <CONNECTOR CTYPE="FLOW" MAX_INPUT="0"/>
    <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1"/>
    <CONNECTOR CTYPE="THEN_RUN" MAX_INPUT="1"/>
    <CONNECTOR CTYPE="RUN_OK"/>
    <CONNECTOR CTYPE="RUN_ERROR"/>
    <CONNECTOR CTYPE="RUN_IF"/>
    <CONNECTOR NAME="UNIQUE" CTYPE="FLOW" COLOR="086438" BASE_SCHEMA="FLOW" />
    <CONNECTOR NAME="DUPLICATE" CTYPE="FLOW" LINE_STYLE="2" COLOR="f36300" BASE_SCHEMA="FLOW" />
PARAMETERS: see the following slide
http://www.talendforge.org/wiki/doku.php?id=component_creation#xml_description
<component>_<language>.xml
<component>_<language>.xml, description of component: parameters
    <PARAMETER NAME="FILENAME" FIELD="FILE" NUM_ROW="2" REQUIRED="true">
        <DEFAULT>'C:\talend_files\in.csv'</DEFAULT>
    </PARAMETER>
The different forms of FIELD:
• CHECK
• CLOSED_LIST
• DIRECTORY
• FILE
• MEMO
• MEMO_PERL
• MEMO_JAVA
• MEMO_SQL
• PROCESS_TYPE
• PROPERTY_TYPE
• SCHEMA_TYPE
• TABLE
• TEXT
http://www.talendforge.org/wiki/doku.php?id=component_creation#xml_description

Practice area
Exercise 15: First component!
The objective is to understand the XML description file

Internationalize these components
CompoName_messages_zh.properties

JET Templates
Eclipse Modeling Framework (EMF) contains a very powerful tool for generating source code: JET (Java Emitter Templates)

JET Templates: general information
A JET template is made up of several parts:
• The code to be generated
• The functional code, which determines the parameters of the code to be generated
    <%
    /* Java code which will not be included in the generated code,
       otherwise called the functional code */
    %>
    /* Java code constituting the generated code */

Practice area
Exercise 16: First JET Templates!

JET Templates / Start :: imports
JET Templates header
    <%@ jet
        imports="
            org.talend.core.model.process.INode
            org.talend.core.model.process.ElementParameterParser
            org.talend.core.model.metadata.IMetadataTable
            org.talend.core.model.metadata.IMetadataColumn
            org.talend.designer.codegen.config.CodeGeneratorArgument
            java.util.List
        "
    %>
Further imports are then added according to what the component does.
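The distinction between functional code and generated code can be illustrated without JET itself. A JET template is compiled into a Java class whose generate method appends text to a buffer; the sketch below imitates that idea by hand. The class TemplateSketch, its generate signature and the failOnEmpty option are assumptions made for illustration and are not part of Talend or EMF.

```java
// Hypothetical hand-written stand-in for the class a template is compiled into.
class TemplateSketch {

    // Everything executed in this method plays the role of the "functional code":
    // it runs at generation time and only decides which text is emitted.
    // The returned string plays the role of the "generated code".
    static String generate(String messageExpression, boolean failOnEmpty) {
        StringBuilder out = new StringBuilder();

        // Emits a declaration whose right-hand side comes from a component parameter.
        out.append("String msg = ").append(messageExpression).append(";\n");

        // A generation-time decision: this test never appears in the output,
        // only the lines it chooses to emit do.
        if (failOnEmpty) {
            out.append("if (msg.equals(\"\")) { throw new Exception(\"empty\"); }\n");
        }

        out.append("System.out.println(msg);\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Shows the source text that would be placed in the job.
        System.out.print(generate("\"Hello\"", false));
    }
}
```

The parameter value is passed as a Java expression (with its quotes), which is why it can be pasted directly into the emitted source.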
JET Templates / tPrint
The tPrint component:
    <%@ jet
        imports="
            org.talend.core.model.process.INode
            org.talend.core.model.process.ElementParameterParser
            org.talend.designer.codegen.config.CodeGeneratorArgument
        "
    %>
    <%
    CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
    INode node = (INode)codeGenArgument.getArgument();
    String msg = ElementParameterParser.getValue(node, "__COMMENTAIRE__");
    %>
    String msg = <%=msg %>;
    if(msg.equals("")) {
        System.out.println("Message is empty");
    } else {
        System.out.println(msg);
    }

JET Templates/tPrint/generated code
Generated code
    String msg = "nbrLignes: " + ((Integer)globalMap.get("tLogRow_1_NB_LINE")) ;
    if(msg.equals("")) {
        System.out.println("Message is empty");
    } else {
        System.out.println(msg);
    }

Practice area
Exercise 17: Create the component tPrint
• A single parameter: Comments
• Use it in a job!

JET Templates/error management
Native handling: when a Java exception occurs
• The OnError link is triggered instead of the OnOk link
• The variable compoName_ERROR_MESSAGE is instantiated
• The error message is sent to tLogCatcher
Manual handling:
• compoName_ERROR_MESSAGE must be instantiated
• throw new Exception("myErrorMessage");
• System.err.println("myErrorMessage");
JET Templates / tPrint
Handling an error in the tPrint component:
    <%@ jet
        imports="
            org.talend.core.model.process.INode
            org.talend.core.model.process.ElementParameterParser
            org.talend.designer.codegen.config.CodeGeneratorArgument
        "
    %>
    <%
    CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument;
    INode node = (INode)codeGenArgument.getArgument();
    String msg = ElementParameterParser.getValue(node, "__MSG2PRINT__");
    String cid = node.getUniqueName();
    %>
    String msg = <%=msg %>;
    if(msg.equals("")) {
        String errMsg = "<%=cid %>: message is empty";
        globalMap.put("<%=cid %>_ERROR_MESSAGE", errMsg);
        System.err.println(errMsg);
        throw new Exception(errMsg);
    } else {
        System.out.println(msg);
    }

Practice area
Exercise 18: Manage errors from within a component
• Create the component tPrint
• Display an alert in the case of an empty message
• Trigger the OnError link

JET Templates
Eclipse Modeling Framework (EMF) contains a very powerful tool for generating source code: JET (Java Emitter Templates)
In Talend, the JET Templates are split into 3 parts:
• Start
• Main
• End

tInputCivilite

Define a schema by default
    <PARAMETERS>
        <PARAMETER NAME="SCHEMA" FIELD="SCHEMA_TYPE" REQUIRED="true" NUM_ROW="1" READONLY="true">
            <TABLE>
                <COLUMN NAME="id" TYPE="id_Integer" LENGTH="1"/>
                <COLUMN NAME="value" TYPE="id_String" LENGTH="30"/>
            </TABLE>
        </PARAMETER>
    </PARAMETERS>

Practice area
Exercise 19: Create the tInputCivilite (or tInputtitle) component
• Configure the XML file and properties
• Create the 3 JET Templates
Practice area
Exercise 20: Manipulate multiple flows
• Modify the XML file in order to allow multiple outputs
• Modify the main JET Template

Mode Row: Propagation of the schema
Let's move on to the Row components

Propagation of the schema / the code

Practice area
Exercise 21: Create a tDemoRow component

tInt2StringCiviliteRow

Practice area
Exercise 22: Produce the tInt2StringCivilite component
• Begin: initialize the String[] valueArray array
• Main: loop on the columns and determine the content of the column __CIVILITEOUT__

tReplaceCiviliteRow
    <PARAMETERS>
        <PARAMETER NAME="CIVILITEOUT" FIELD="TEXT" REQUIRED="true" NUM_ROW="1" NB_LINES="1">
            <DEFAULT>"newColumn"</DEFAULT>
        </PARAMETER>
    </PARAMETERS>

Practice area
Exercise 23: Produce the tRowCivility component

Reject
Case where the key is greater than 2

Practice area
Exercise 24: Manage a reject flow

Load a jar into a component

Practice area
Exercise 25: Import a jar into a component

Improve the created components
The Java editor allows you to identify unused imports
Virtual component
Certain components can use other components in a transparent manner
    <CODEGENERATION>
        <TEMPLATES INPUT="AGGOUT" OUTPUT="AI">
            <TEMPLATE NAME="AGGOUT" COMPONENT="tAggregateOut">
                <LINK_TO NAME="AI" CTYPE="THEN_RUN"/>
            </TEMPLATE>
            <TEMPLATE NAME="AI" COMPONENT="tArrayIn"/>
            <TEMPLATE_PARAM SOURCE="self.OPERATIONS" ... />
            <TEMPLATE_PARAM SOURCE="self.OPERATIONS" ... />
            <TEMPLATE_PARAM SOURCE="self.GROUPBYS" TARGET="AGGOUT.GROUPBYS"/>
            <TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="AGGOUT.SCHEMA"/>
            <TEMPLATE_PARAM SOURCE="self.SCHEMA" TARGET="AI.SCHEMA"/>
            <TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="AGGOUT.DESTINATION"/>
            <TEMPLATE_PARAM SOURCE="self.UNIQUE_NAME" TARGET="AI.ORIGIN"/>
        </TEMPLATES>
    </CODEGENERATION>

Component and graphic interface
The use of the graphic user interface by a component requires the creation of a dedicated plugin

Rely on the Talend community
Exchange with the community:
• Forum
• Ecosystem
Your tools:
• Wiki
• BugTracker

Congratulations!
Your turn to play!
Contact: training@talend.com