Using Logstash for PI
Robert Mckeown
Dec 12, 2014
© 2014 IBM Corporation

ETL context
Use Logstash as an ETL (Extract, Transform, Load) tool to transform data into the required PI format.
(Figure: Logstash)

Supporting information
• Main Logstash web site
  – http://logstash.net/
• The Logstash Book
  – http://www.logstashbook.com/
• Logstash Forum: lots of good Q&A
  – https://groups.google.com/forum/#!forum/logstash-users
• Doug McClure's 'Logstash is your friend' doc: log-centric, but has a good end-to-end example and advice.
• This doc is a quick skim to get you started. To become proficient, refer to the sites above!

Key Logstash functions
• Logstash is an event pipeline
  – Inputs → codecs/filters → outputs
• Inputs generate events, codecs and filters modify them, outputs ship them
• Types are set and used mainly for filter activation. They always persist with the event
• Tags can be used to specify an order for event processing (apply filter A, then filter D, then filter F) as well as to route events to specific filters and outputs
• Conditionals give you if, else if, etc., as well as comparison tests and boolean logic for sophisticated analysis, processing and routing
(Chart from Doug McClure)

Key Logstash functions
(Figure: pipeline diagram with labels myType, Metric files, scacsv, PI files. Original chart from Doug McClure)

Standard set of plug-ins plus two PI-specific ones
(Figure labels: scapivot, scabmcfile, scacsv)

Installation
• Download Logstash 1.4.2 from https://download.elasticsearch.org/logstash/logstash/logstash-1.4.2.tar.gz
• Unpack it in a directory of your choice
• Add Logstash to your $PATH for convenience (if desired)
• Install the additional standard plug-ins, aka 'contribs'
  – cd /path/to/your/logstash
  – bin/plugin install contrib
• Obtain the SCAPI plug-in package
  – Currently available in the CSI Predict, Search and Event Analytics technical sales forum
• Note: Logstash is already installed on the current
'standard' SoftLayer 1.3 images

Running Logstash
• The only addition to a standard Logstash invocation is to reference the custom SCA plug-ins on the command line (if you are using them)
• e.g. in my setup
  – Logstash is installed at /home/rmckeown/dev/logstash-1.4.2
  – Plug-ins are installed in /home/rmckeown/dev/logstashDev
• Running Logstash would then be
  – /home/rmckeown/dev/logstash-1.4.2/bin/logstash -f myConf.conf --pluginpath /home/rmckeown/dev/logstashDev/scaLogstash
• Adding Logstash to your $PATH can make this a bit shorter

Example 1
(Screenshot of the input file, annotated: 'group' name, Date, ok(?), 'metric' name, Metric value, Device number)
• Skinny format
• Multiple 'groups' implied
• No header

Example 1
See http://logstash.net/docs/1.4.2/inputs/stdin and http://logstash.net/docs/1.4.2/outputs/stdout
(Screenshot callouts: host which processed the record; timestamp when the message/record was processed; actual record)

Example 1
(Screenshot callouts: outputs data using Ruby 'awesome_print'; outputs data as JSON; output formatted by jsonlint)

Example 1 – filter
The CSV filter takes an event field containing CSV data, parses it, and stores it as individual fields (optionally, you can specify the field names). This filter can also parse data with any separator, not just commas.
(Screenshot callouts: create desired columns; remove arbitrary fields; columns added; field 'interval' removed; note the two timestamps)

Example 1 – first csv
(Screenshot callouts: desired data output, but no header; data not separated by group)

Example 1 – Conditional & CSV
(Screenshot callouts: example of a conditional; not a standard PI name; no header)

Example 1 – using scaCSV
(Screenshot callouts: custom operator; output files)
• Still 'skinny'!
  – Need to 'pivot'

Example 2 – scapivot
(Screenshot callouts: custom operator; values (metric identities) such as cpu and net become column names; metric values mapped to the correct column)

Example 2
• Metadata in header
• Selection of header and data lines
• Simple format clean-up of individual fields (e.g. 'G', '%', '-')

Example 2 – basic classification by type
• Conditional with a regular expression match
  – Any line that starts with '20' (this will be our date)
• Classify these as DATA_Line for later output
(Screenshot callouts: no tags added; tags added)

Example 2 – capture timestamp via Grok
• Grok is one of the most important plug-ins for use with PI (see http://logstash.net/docs/1.4.2/filters/grok)
  – Grok: parse arbitrary text and structure it
• The patterns at https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns are used to match and convert strings, e.g.
  TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
• For developing Grok patterns, http://grokdebug.herokuapp.com is very useful

Example 2 – Grok / grokdebug
(Screenshot)

Example 2 – capture timestamp via Grok
(Screenshot)

Example 2 – cleaning up fields
Convert a string field by applying a regular expression and a replacement. Here we are replacing '-' or '%' with "" (the empty string).

Example 2 – splitting into Logstash 'CSV'
(Screenshot callouts: aligns with input file; reformatting the timestamp; watch this spot!)

Example 2 – Outputting
(Screenshot callouts: subset of fields; still need to determine this)

Example 2 – Determining Server Name (associative behavior)
• Events are generally **independent**
  – 'Multiline' events are an exception
• Information available in one event cannot easily be carried across to other events
• In our NAB example, the server identity is in a separate 'event' in the header
• Processing information 'across' events is more challenging
• Think outside the box (or outside a single instance of Logstash)
  – Two-step approach. There may be others
• Of course, it doesn't have to be Logstash either
(Figure: two-step flow. Original file → Logstash → serverMap containing 'ServerName : serverX', which the Logstash main processing step will use as the replacement → final output)

Example 2 – Determining Server Name
(Screenshot)

Example 2 – Replacing server name (translate)
(Screenshot)

Extending Logstash aka Building custom plug-ins
• Plug-ins are written primarily in Ruby
• They can call out to Java easily (since Logstash runs on JRuby)
• Chapter 8 of The Logstash Book, 'Extending Logstash', has all the details
• Also, look at the source code of existing plug-ins for lots of good examples of how to proceed

Location of plug-ins
• You can also specify a directory outside the Logstash installation and work out of that
  – mkdir -p /etc/logstash/{inputs,filters,outputs}
  – Specify this path when running Logstash, e.g.
  – ..../logstash/bin/logstash --pluginpath /etc/ .........

Extending Logstash aka Building custom plug-ins – scaJDBC (new plug-in & Java interaction)
(Screenshot callouts: plug-in name; new config options; standard CSV)

Extending Logstash aka Building custom plug-ins – scaJDBC
(Screenshot callouts: inherit from Base; plug-in name; config; register at runtime)

Almost Java!
(Screenshot callouts: how many columns?; create a brand new event; assign attributes/values for each data item returned from the DB; finalize and dispatch!)
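
The 'pivot' that the custom scapivot operator performs in Example 2 (metric identities become column names, and each metric value lands in the matching column) is the kind of logic a custom Ruby plug-in encapsulates. Below is a minimal plain-Ruby sketch of that transformation only. It is not the scapivot source and does not use the Logstash plug-in API; the row layout (timestamp, metric, value), the helper names pivot and to_csv_lines, and the sample values are all hypothetical, chosen for illustration.

```ruby
# Hypothetical sketch: pivot "skinny" rows (one metric per row) into "wide"
# rows (one column per metric), one output row per timestamp.
# NOT the scapivot plug-in source; names and sample data are illustrative.

# One skinny input row: timestamp, metric name, metric value.
SkinnyRow = Struct.new(:timestamp, :metric, :value)

def pivot(rows)
  # Metric identities become column names, in order of first appearance.
  columns = rows.map(&:metric).uniq
  # group_by keeps timestamps in first-seen order; each group becomes one wide row.
  wide = rows.group_by(&:timestamp).map do |ts, group|
    values = Hash[group.map { |r| [r.metric, r.value] }]
    # Place each value in its column; a missing metric becomes an empty cell.
    [ts] + columns.map { |c| values.fetch(c, "") }
  end
  [["timestamp"] + columns, wide]
end

# Render the header plus data rows as simple comma-separated lines
# (no quoting, so values must not contain commas).
def to_csv_lines(header, rows)
  ([header] + rows).map { |r| r.join(",") }
end

# Made-up sample input in skinny format.
rows = [
  SkinnyRow.new("2014-12-01 10:00", "cpu", "12.5"),
  SkinnyRow.new("2014-12-01 10:00", "net", "340"),
  SkinnyRow.new("2014-12-01 10:05", "cpu", "15.0"),
  SkinnyRow.new("2014-12-01 10:05", "net", "298")
]

header, wide = pivot(rows)
puts to_csv_lines(header, wide)
```

Grouping by timestamp assumes one output row per timestamp; whether PI wants an empty cell or some other placeholder for a missing metric is an assumption to check.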