SAP HANA smart data streaming 1.0
June 2015
English

SAP Smart Data Streaming Deployment for Stream Intelligence (EB9)
Building Block Configuration Guide

SAP SE
Dietmar-Hopp-Allee 16
69190 Walldorf
Germany

SAP Best Practices
SAP SDS Deployment for Stream Intelligence (EB9): Configuration Guide

Copyright

© 2014 SAP SE or an SAP affiliate company. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP SE or an SAP affiliate company.

SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP SE (or an SAP affiliate company) in Germany and other countries. Please see http://global.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.

Some software products marketed by SAP SE and its distributors contain proprietary software components of other software vendors.

National product specifications may vary.

These materials are provided by SAP SE or an SAP affiliate company for informational purposes only, without representation or warranty of any kind, and SAP SE or its affiliated companies shall not be liable for errors or omissions with respect to the materials. The only warranties for SAP SE or SAP affiliate company products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty.

In particular, SAP SE or its affiliated companies have no obligation to pursue any course of business outlined in this document or any related presentation, or to develop or release any functionality mentioned therein.
This document, or any related presentation, and SAP SE’s or its affiliated companies’ strategy and possible future developments, products, and/or platform directions and functionality are all subject to change and may be changed by SAP SE or its affiliated companies at any time for any reason without notice. The information in this document is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. All forward-looking statements are subject to various risks and uncertainties that could cause actual results to differ materially from expectations. Readers are cautioned not to place undue reliance on these forward-looking statements, which speak only as of their dates, and they should not be relied upon in making purchasing decisions.

Revision    Change Date     Description
0           June 6, 2015    Initial

Icons

The icons used in this guide have the following meanings: Caution, Example, Note, Recommendation, Syntax.

Typographic Conventions

Type Style and Description

Example text
    Words or characters that appear on the screen. These include field names, screen titles, pushbuttons as well as menu names, paths and options. Cross-references to other documentation.
Example text
    Emphasized words or phrases in body text, titles of graphics and tables.
EXAMPLE TEXT
    Names of elements in the system. These include report names, program names, transaction codes, table names, and individual key words of a programming language, when surrounded by body text, for example, SELECT and INCLUDE.
Example text
    Screen output. This includes file and directory names and their paths, messages, source code, names of variables and parameters as well as names of installation, upgrade and database tools.
EXAMPLE TEXT
    Keys on the keyboard, for example, function keys (such as F2) or the ENTER key.
Example text
    Exact user entry. These are words or characters that you enter in the system exactly as they appear in the documentation.
<Example text>
    Variable user entry. Pointed brackets indicate that you replace these words and characters with appropriate entries.

Content

SAP Event Stream Processor Deployment for Stream Intelligence: Configuration Guide
1 Purpose
2 Preparation
    2.1 Prerequisites
3 Configuration
    3.1 Configure SDS Users
    3.2 Installing solution SAP SDS Components
        3.2.1 Confirming the BDI SDS Component Install
    3.3 Configuring the Bootstrap properties
        3.3.1 Controlling Bootstrap with the INI file
        3.3.2 Testing the Bootstrap Script
    3.4 Working With the Log File Adapters
        3.4.1 Running the Log File Adapters
    3.5 Mapping of Log File Formats and SAP HANA Table
    3.6 Working With the SAP SDS Source Code
    3.7 Customizing Log File Processing
        3.7.1 Change the Properties File
        3.7.2 Change the SPS project code
4 Dynamic Loading
    4.1 Configuration
        4.1.1 System Tables
        4.1.2 BDI_CONFIG_PARAMETERS Options
        4.1.3 Initial Configuration Scripts
        4.1.4 Starting Daemon
5 Additional Resources
    5.1 SAP Smart Data Streaming (SAP SDS)

SAP Event Stream Processor Deployment for Stream Intelligence: Configuration Guide

1 Purpose

This document describes the general configuration steps required to manually set up SAP HANA® smart data streaming (SAP SDS) for Stream Intelligence within an SAP HANA landscape that has already been installed using the corresponding installation or configuration guides. This installation does not support multitenant databases.

2 Preparation

2.1 Prerequisites

Before you start installing this scenario, you must install the prerequisite building blocks. For more information, see the Latest Information and Configuration Guide: Getting Started on the SAP Service Marketplace for this solution, http://service.sap.com/rds-bdi .

The following components must be installed and configured before you start the Stream solution installation:

1. SAP HANA smart data streaming (SDS) or, optionally, SAP® Event Stream Processing Software (SAP ESP)
2. UNIX ODBC Driver
3. SAP IQ (Optional)
4. Hadoop (Optional)

The ODBC driver is expected to be installed on the host machine for the SDS and/or SAP ESP servers. If nearline storage will be used in the implementation, the client drivers for SAP IQ and/or Hadoop must also be installed on this host machine.

3 Configuration

Setting up the SAP SDS technical components for the solution requires the following high-level processes:

1. Install and configure SAP Stream software
2. Prepare the software bootstrap environment

The following sections contain UNIX/Linux operations. We recommend that an experienced UNIX user performs the configuration steps.
Many commands start with, or contain, the characters “dot-slash” (./). This prefix references a specific file relative to the current directory, as opposed to a program found through the PATH environment variable. Always include it where indicated in the configuration instructions.

It is assumed that the SAP HANA client is already installed.

3.1 Configure SDS Users

The SDS user “SYS_STREAMING” must be added before you run the bootstrap process. To configure this user, execute the following commands on the SDS/ESP servers:

    ./streamingclusteradmin --uri=esps://<sds hostname>:<sds port> --username=SYSTEM --password=<password>
    grant permission all to user SYS_STREAMING
    quit

3.2 Installing solution SAP SDS Components

The SAP HANA smart data streaming (SDS) environment for the rapid-deployment solution consists of a group of files that support the operation of the system. These files have been developed to be configured automatically based on local settings for server IDs, logon credentials and the SAP HANA server environment.

SDS has two logical workspace definitions: one for the SAP HANA studio, where source code is developed, and one for the deployed server, where runtime code is staged to be loaded into the SDS node. This solution uses the same physical base directory for all its processing. This includes the storage of source code, executable code, configuration, and adapter management.
The installation of the provided solution components requires extracting the contents of an archive file to the SDS projects base directory as follows:

    cd $STREAMING_HOME/cluster/projects
    tar -xf stream_rds_bdi.tar

3.2.1 Confirming the BDI SDS Component Install

Once the extraction has completed, check the directory structure with this command:

    ls -l $STREAMING_HOME/cluster/projects

Verify that the directory listing includes the following folders:

    rds_bdi_stream
    repository
    run_stream_adapter

No further configuration activity is required for this step.

3.3 Configuring the Bootstrap properties

The core of the deployment of the solution's SAP HANA smart data streaming system is the bootstrap script, which customizes the software to use the local servers, user credentials and other user preferences. You manage these settings through the values in the SAP HANA Attribute View named <AT_BDI_CONFIG>.

This script can be executed as many times as needed when the configuration values have changed. The script removes all the deployed code related to the STREAM application and replaces it with the updated code. This also includes the ODBC definitions for HANA (ESP_HANA_STREAM) and IQ (ESP_IQ_STREAM).

The script <bootstream.py> is written in Python and requires Python to be installed before it is run. If SDS and SAP HANA are installed on the same machine, Python is available in the folder “${DIR_EXECUTABLE}/Python/bin/”. If SDS or SAP ESP is on a separate host machine and Python is not installed with the applications, it may be available in the system folder “/usr/bin/python”. If not, ask your system administrators to install Python on the host machine before proceeding.

After Python is installed, update <bootstream.py> with the current location of Python. The first line of the script <bootstream.py> reads:

    #!/usr/sap/HDB/HDB00/exe/Python/bin/python

Update the line to the new path of the Python application on the server.
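The shebang update described above can also be scripted. The following is a minimal sketch, not part of the delivered solution: the helper name set_shebang is hypothetical, and it assumes the script is a plain text file whose first line is the interpreter line.

```python
def set_shebang(script_path, interpreter):
    """Rewrite the first line of script_path to '#!<interpreter>'.

    Hypothetical helper for illustration; it is not shipped with the solution.
    """
    with open(script_path, "r") as handle:
        lines = handle.readlines()

    new_first = "#!" + interpreter + "\n"
    if lines and lines[0].startswith("#!"):
        # Replace the existing interpreter line.
        lines[0] = new_first
    else:
        # No interpreter line yet: add one at the top.
        lines.insert(0, new_first)

    with open(script_path, "w") as handle:
        handle.writelines(lines)
    return new_first.rstrip("\n")
```

For example, calling set_shebang("bootstream.py", "/usr/bin/python") from the repository directory would point the script at the system Python, assuming that is where Python is installed on your host. Editing the first line by hand achieves the same result.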
The <bootstream.py> script takes values from <AT_BDI_CONFIG> and applies them to the following system and application files:

- odbc.ini: This file manages the connection properties to the database servers.
- services.xml: This SDS staging file is used to load data service definitions into the cluster database.
- IQ DDL files: If SAP IQ is configured for this system, these DDL files are processed to ensure that the SAP IQ environment is set up correctly.
- STREAM SDS Source: If the STREAM application is being used, its source code is configured, loaded into the SDS node and started.

To configure the bootstrap properties, open the <stream.ini> file in a text editor such as vi and check that it contains the correct values for your system. Most settings are already correct. The bootstrap file allows the use of environment variables, so the lines that begin with <$STREAMING_HOME> translate to the correct SDS base directory for your system.

The section “ODBC_INI” must be modified to update the destination of the INI file based on the deployment target:

- If deploying to SDS on an SAP HANA host machine, the deployment “Destination” for the ODBC file is “/usr/sap/HDB/home/.odbc.ini”.
- If deploying to an SAP ESP server on a standalone machine, the deployment “Destination” is “/usr/local/etc/odbc.ini”.

The section that must be configured is named [CONFIG_TABLE]. It contains the following properties:

- SERVER=<server ID>: This is the domain name or IP address of the SAP HANA server.
- PORT=3xx15: This is the SAP HANA port number. The “xx” value is replaced by the SAP HANA instance number.
- ATTRIBUTE VIEW="sap.rds-bdi.stream.config/AT_BDI_CONFIG": This is the default location for the configuration Attribute View. Confirm that this is the correct location and path for your system.

Lines starting with the hash symbol (#) in the <stream.ini> file are treated as comment lines.
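To make the [CONFIG_TABLE] settings concrete, here is a small sketch that reads such a section with Python's standard configparser module (Python 3) and derives the 3xx15 port from an instance number. The sample file content, the host name hana.example.com, and the helper names are assumptions for illustration; the real stream.ini may contain further keys, and this is not how bootstream.py is known to parse the file.

```python
import configparser
import os

# Hypothetical stream.ini excerpt; only the keys named in this guide are shown.
SAMPLE_INI = """
# Lines starting with '#' are comments.
[ODBC_INI]
Destination=/usr/sap/HDB/home/.odbc.ini

[CONFIG_TABLE]
SERVER=hana.example.com
PORT=30015
ATTRIBUTE VIEW="sap.rds-bdi.stream.config/AT_BDI_CONFIG"
"""


def hana_sql_port(instance_number):
    """Return the 3xx15 port for a SAP HANA instance number (xx)."""
    return int("3{:02d}15".format(int(instance_number)))


def read_config_table(ini_text):
    """Return the [CONFIG_TABLE] entries as a dict of strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names such as 'ATTRIBUTE VIEW' as written
    parser.read_string(ini_text)
    section = parser["CONFIG_TABLE"]
    # Strip surrounding quotes and expand environment variables such as
    # $STREAMING_HOME (names that are not set are left unchanged).
    return {key: os.path.expandvars(value.strip('"'))
            for key, value in section.items()}
```

With this sketch, hana_sql_port("00") yields 30015 and hana_sql_port(12) yields 31215, which is a quick way to check the PORT entry against your instance number before running the bootstrap script.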

3.3.1 Controlling Bootstrap with the INI file

The bootstrap script is designed to configure the deliverable package for the Stream solution. The configuration of this deliverable is controlled by the stream.ini file, which tells the bootstrap how to locate and configure the various system components.

The following information requires updating:

- The entries CONFIG_TABLE/SERVER and CONFIG_TABLE/PORT must be updated to match those of the HANA server that hosts the repository information.
- The scripts esp_admin.sh and sds_admin.sh also require updating, depending on the implementation platform. If SDS is being used, only the script sds_admin.sh needs to be updated with the information required to connect to the SDS server. Correspondingly, if ESP is being used, the script esp_admin.sh needs to be updated with the information required to connect to the ESP server.

    "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"

3.3.2 Testing the Bootstrap Script

The only way to verify that the bootstrap properties are set up correctly is to run the script as follows:

1. Make sure you are in the repository directory.

    cd $STREAMING_HOME/cluster/projects/repository

2. If using SDS, confirm that SDS is running.

    ./sds_admin.sh
    > quit

3. If using ESP, confirm that SAP ESP is running.

    ./esp_admin.sh
    > quit

4. Execute the bootstrap script.

    ./bootstream.py <HANA User> <HANA Password>

   Where:
   <HANA User> is the SAP HANA logon name
   <HANA Password> is the SAP HANA logon password

The script produces output similar to the following. Review your output carefully for errors, and make the appropriate corrections to your <stream.ini> file as needed. Base directory names in the following output may be different from your system.

    *****************************************************
    * Starting the Bootstrap process
    *****************************************************
    *****************************************************
    ODBC_INI-TEMPLATE
    *****************************************************
    [convertFile] processing /usr/local/etc/odbc.ini
    [processFile] File processed successfully: /usr/local/etc/odbc.ini

Ensure that the process is successful and the odbc.ini file is in /usr/local/etc.

    *****************************************************
    SERVICE_XML-TEMPLATE
    *****************************************************
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/repository/service_new.xml
    [processFile] File processed successfully: /data/sybase/ESP5_1/cluster/projects/esp1/repository/service_new.xml
    Executing ESP Services file.
    [processServices] Removing Service named: streamhanaservice
    [done]
    [processServices] Removing Service named: streamiqservice
    [done]

The first time the bootstrap script is run, the removal steps may produce errors, which is expected behavior.

    [processServices] Loading services from /data/sybase/ESP5_1/cluster/projects/esp1/repository/service_new.xml
    Loaded service streamhanaservice
    Loaded service streamiqservice
    [done]

The [done] message is from the esp_cluster_admin utility, and signifies that the operation was successful.

    *****************************************************
    RDS_BDI_STREAM_CCL-TEMPLATE
    *****************************************************
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/rds_bdi_stream/rds_bdi_stream.ccl
    [processFile] File processed successfully: /data/sybase/ESP5_1/cluster/projects/esp1/rds_bdi_stream/rds_bdi_stream.ccl
    *****************************************************
    RDS_BDI_STREAM_CCR-TEMPLATE
    *****************************************************
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/rds_bdi_stream/rds_bdi_stream.ccr
    [processFile] File processed successfully: /data/sybase/ESP5_1/cluster/projects/esp1/rds_bdi_stream/rds_bdi_stream.ccr
    RDS_BDI_STREAM_PROPERTY-TEMPLATE1
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/run_rds_bdi_stream/Apache_Combined.properties
    RDS_BDI_STREAM_PROPERTY-TEMPLATE2
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/run_rds_bdi_stream/Apache_Common.properties
    RDS_BDI_STREAM_PROPERTY-TEMPLATE3
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/run_rds_bdi_stream/IIS_Fixed.properties
    RDS_BDI_STREAM_PROPERTY-TEMPLATE4
    [convertFile] processing /data/sybase/ESP5_1/cluster/projects/esp1/run_rds_bdi_stream/W3C_Extended.properties
    [processFile] File processed successfully: /data/sybase/ESP5_1/cluster/projects/esp1/run_rds_bdi_stream/W3C_Extended.properties
    *****************************************************
    RDS_BDI_STREAM_CCL-DESTINATION
    *****************************************************
    [processCCL] processing rds_bdi_stream
    [processCCL] Stopping rds_bdi_stream (note that an error will occur if the project is not running. That is expected.)
    [done]
    [processCCL] Removing rds_bdi_stream from the node/workspace
    [done]
    [processCCL] compiling rds_bdi_stream (there will be no output if the compile is successful)
    [processCCL] Adding rds_bdi_stream to the node/workspace
    [done]
    [processCCL] Starting rds_bdi_stream
    [done]
    *****************************************************
    * The Bootstrap process has completed

When the bootstrap script completes, the SDS/ESP code should be running. You can confirm this in any of the following ways:

- If you are running SAP HANA studio, connect to the SAP HANA server and change to the Run-Test perspective to access the project and its windows/streams.
- If you are running ESP Studio, connect to the ESP server in the ESP Run-Test perspective to access the windows/streams of the project.
- Connect to the SDS administrative console (./sds_admin.sh) or the ESP administrative console (./esp_admin.sh) and list the projects:

    ./sds_admin.sh
    > get projects
    =============================
    Workspace: default
    Project: rds_bdi_stream
    Instance Count: 1
    ----------- Instance Details -----------
    Instance Name: 0
    Controller Name: vhcalhdbdb
    Current Status: started-running
    Requested Status: started-running
    Failure Interval: 0
    Failures Per Interval: 0
    =============================
    Workspace: default
    Project: rds_bdi_signal
    Instance Count: 1
    ----------- Instance Details -----------
    Instance Name: 0
    Controller Name: vhcalhdbdb
    Current Status: started-running
    Requested Status: started-running
    Failure Interval: 0
    Failures Per Interval: 0
    > quit

You can run the <bootstream.py> script as often as necessary. Each run stops and removes the project, then recompiles the code and redeploys it to the SAP HANA SDS server.

3.4 Working With the Log File Adapters

The Stream SAP ESP system uses the unmanaged external adapter named the Log File Input Adapter.
This adapter must be started and stopped independently of the Stream SAP ESP system, and it must be started after the Stream SAP ESP system is up and running.

The Log File Input Adapter can be configured to process any format of Web server access log files. Configure the adapter by providing information about the log file format in a “properties” file. The solution includes five “properties” files to cover a range of formats:

    APACHE_COMBINED.properties
    APACHE_COMMON.properties
    IIS_EXTENDED.properties
    IIS_FIXED.properties
    W3C_EXTENDED.properties

Each log file type is configured to publish data to a different SAP ESP input stream, so any number of the adapters can run simultaneously.

The properties files are managed by the bootstrap process, which is described previously in this guide. Therefore, if any changes are required beyond what is managed in the AT_BDI_CONFIG Attribute View, you must edit the <_tokenized> version of the files. The runtime versions of these files are overwritten every time the bootstrap script is run.

3.4.1 Running the Log File Adapters

The log file adapter code is managed in the directory named:

    $STREAMING_HOME/cluster/projects/run_rds_bdi_stream

This directory contains the properties files and a set of shell scripts that start the adapter for each log file type. The scripts are named:

    startadapter-APACHE_COMBINED.sh
    startadapter-APACHE_COMMON.sh
    startadapter-IIS_EXTENDED.sh
    startadapter-IIS_FIXED.sh
    startadapter-W3C_EXTENDED.sh

The properties files are generated each time the bootstrap script is executed. The parameters added to each properties file during the bootstrap process include the input log file name and the SDS/ESP server connection properties. Each of these is updated based on the configuration table values set for the following properties:

    STREAM_LOG_FILE_NAME_?
    STREAM_ESP_URI
    STREAM_ESP_HOSTNAME
    STREAM_ESP_PORT
    STREAM_ESP_USER
    STREAM_ESP_PASSWORD

Run the script for the log file type that you want to process. Each script stops any existing instance of the adapter before starting a new instance.

3.5 Mapping of Log File Formats and SAP HANA Table

STREAM supports the following five Web log formats out of the box. As per the standards, the fields in the log files are separated by spaces. Each field from the log file is mapped to one or more fields of the "sap.rds-bdi.stream.model.rep::tables.ICM_LOG_TAB_DTL" table in the "SAP_RDS_BDI_STREAM" schema.

The provided STREAM project is configured to read log files in Single Live Log File mode. To process a static file instead, change the setting Input.WaitForGrowth=true in the properties file to false. Note that static files are expected to be processed using the Dynamic Loading process.

If your log file format differs from all five of the following formats, you must customize the SAP ESP code to match your log file format. Choose the log file format that most closely matches your format and fields. Refer to the following sections for the steps to customize the SAP ESP source code to match your format.

Log File Format 1: Common log format

    HOST_IP HOST_NAME USER_ID LOGGED_TIME VIEW_URL_DETAIL HTTP_STATUS_CODE BYTES_SENT

For example:

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

- VIEW_URL_DETAIL is split into 3 different fields in SAP HANA:
    o HTTP_METHOD
    o VIEW_URL
    o the HTTP protocol and version, for example HTTP/1.0 (check the table definition for the column name)
- For the rest of the fields in the log format, there is a 1:1 mapping with fields in SAP HANA.

Configured Mode: Single Live Log File Mode
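As an illustration of the Common log format mapping, the following Python sketch splits one log line into the named fields and then breaks the quoted request into its three parts. It only mirrors the mapping on the client side and is not the SAP ESP project code. The regular expression and the function name are assumptions, and the key HTTP_PROTOCOL is a hypothetical label for the third part of the request, because the actual SAP HANA column name is not given here.

```python
import re

# Field names follow the Common log format list in this section.
COMMON_LOG = re.compile(
    r'(?P<HOST_IP>\S+) (?P<HOST_NAME>\S+) (?P<USER_ID>\S+) '
    r'\[(?P<LOGGED_TIME>[^\]]+)\] "(?P<VIEW_URL_DETAIL>[^"]*)" '
    r'(?P<HTTP_STATUS_CODE>\d{3}) (?P<BYTES_SENT>\d+|-)'
)


def parse_common_log(line):
    """Return a dict of fields for one Common log format line, or None."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # VIEW_URL_DETAIL holds "<method> <url> <protocol>"; split it into 3 parts.
    parts = record.pop("VIEW_URL_DETAIL").split(" ", 2)
    parts += [""] * (3 - len(parts))  # tolerate short or empty requests
    record["HTTP_METHOD"] = parts[0]
    record["VIEW_URL"] = parts[1]
    record["HTTP_PROTOCOL"] = parts[2]  # hypothetical key name
    return record
```

Applied to the example line above, the sketch returns GET as HTTP_METHOD and /apache_pb.gif as VIEW_URL, with the remaining fields mapped 1:1. A line that does not follow the format returns None, which can help you decide whether a custom format is needed.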

Log File Format 2: Combined log format

    HOST_IP HOST_NAME USER_ID LOGGED_TIME VIEW_URL_DETAIL HTTP_STATUS_CODE BYTES_SENT REFERER_URI USER_AGENT TEMP_STRING

- VIEW_URL_DETAIL is split into 3 different fields in SAP HANA:
    o HTTP_METHOD
    o VIEW_URL
    o the HTTP protocol and version, for example HTTP/1.1 (check the table definition for the column name)
- REFERER_URI is mapped to the ORIGINATING_SITE field in SAP HANA.
- For the rest of the fields in the log format, there is a 1:1 mapping with fields in SAP HANA.

For example:

    194.79.83.232 - - [22/Jul/2014:17:58:24 +0200] "GET /apachelog/images/stories/food.php?rf HTTP/1.1" 404 240 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6" "-"

Configured Mode: Single Live Log File Mode

Log File Format 3: IIS Fixed

    DATE TIME HOST_IP USER_ID SERVER_IP SERVER_PORT HTTP_METHOD VIEW_URL QUERY_STRING HTTP_STATUS_CODE BYTES_SENT BYTES_RECD DURATION USER_AGENT REFERER_URI

- The DATE and TIME fields are combined into a single field, LOGGED_TIME, in SAP HANA.
- REFERER_URI is mapped to the ORIGINATING_SITE field in SAP HANA.
- For the rest of the fields in the log format, there is a 1:1 mapping with fields in SAP HANA.

For example:

    #Software: Microsoft Internet Information Services 6.0
    #Version: 1.0
    #Date: 2014-05-24 20:00:01
    #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent) cs(Referrer)
    2014-05-24 20:18:01 172.224.24.114 user1 206.73.118.24 80 GET /Default0.htm 200 7930 248 31 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+2000+Server) http://64.224.24.114/

Configured Mode: Single Live Log File Mode

Log File Format 4: W3C_EXTENDED

    DATE TIME HOST_IP USER_ID SERVER_IP SERVER_PORT HTTP_METHOD VIEW_URL HTTP_STATUS_CODE BYTES_SENT BYTES_RECD DURATION USER_AGENT REFERER_URI

- The DATE and TIME fields are combined into a single field, LOGGED_TIME, in SAP HANA.
- REFERER_URI is mapped to the ORIGINATING_SITE field in SAP HANA.
- For the rest of the fields in the log format, there is a 1:1 mapping with fields in SAP HANA.

For example:

    #Software: Microsoft Internet Information Services 6.0
    #Version: 1.0
    #Date: 2014-08-29 20:00:01
    #Fields: date time c-ip cs-username s-ip s-port cs-method cs-uri-stem cs-uri-query sc-status sc-bytes cs-bytes time-taken cs(User-Agent) cs(Referrer)
    2014-08-29 20:18:01 156.210.29.120 user14 206.73.118.24 80 GET /Page0.htm 200 7930 248 31 Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+2000+Server) http://64.224.24.114/

Configured Mode: Single Live Log File Mode

Log File Format 5: IIS_EXTENDED

    DATE TIME SERVER_IP HTTP_METHOD VIEW_URL QUERY_STRING SERVER_PORT USER_ID HOST_IP USER_AGENT HTTP_STATUS_CODE HTTP_SUBSTATUS WEB_ENGINE_STATUS DURATION

- The DATE and TIME fields are combined into a single field, LOGGED_TIME, in SAP HANA.
- The VIEW_URL and QUERY_STRING fields are combined into a single field, VIEW_URL, in SAP HANA.
- For the rest of the fields in the log format, there is a 1:1 mapping with fields in SAP HANA.

For example:

    #Software: Microsoft Internet Information Services 7.5
    #Version: 1.0
    #Date: 2014-09-08 00:00:21
    #Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
    2012-09-05 00:00:21 10.46.69.228 POST /_vti_bin/lists.asmx - 1111 xyz\asa0_pchange 10.68.20.163 Application+Server+(1.0;740) 200 0 0 725

Configured Mode: Advancing Live Log File Mode

If your log file format differs from all five of the previous formats, you must customize the SAP ESP code to match with your log
file format. Choose the file format that most closely matches your format and fields. Refer to the following sections for the steps to customize the SAP ESP source code to match your format.

3.6 Working With the SAP SDS Source Code

Because the bootstrap script modifies the SAP ESP source files, you must follow these rules if you modify the SAP ESP source code for your specific requirements:

- We recommend making no changes on the server. Any changes to the actual source code file are lost whenever the bootstrap script is run.
- The source code files of record for the STREAM project are in the <repository> folder along with the bootstrap script. The files are named:
    o rds_bdi_stream.ccl_tokenized: The SDS source code for the STREAM project
    o rds_bdi_stream.ccr_tokenized: The SDS resource file for the STREAM project
- The recommended procedure for modifying the source code is to archive and compress the project directory under <$STREAMING_HOME/cluster/projects> into a zip file, and transfer it to the computer where SAP HANA studio is installed.
- After transferring the files to the development machine, rename the files that have the suffix “_tokenized” by removing this suffix.

Note: The “msdate” data type may report an error in your ESP Studio. If that happens, it can be modified to use the data type “bigdatetime”.

To accommodate the configuration properties defined in the AT_BDI_CONFIG view while remaining syntactically correct for development and testing, a special coding convention has been defined. The values in the AT_BDI_CONFIG view are represented in the files as tokens, with the format “$$$token-name$$$”. A token may represent a server domain, a database table name, a timer setting, and so on.

The SDS compiler cannot interpret lines of code containing tokens, and therefore the following coding convention is applied.
When a line of code requires a token, the line is commented out using the ‘//’ character sequence. Immediately following this line, the actual command line without the token is listed. This allows the code to run in SAP HANA studio without error. During deployment with the bootstrap program, the tokens are replaced and the comment characters are removed from the tokenized line. The bootstrap program then applies comment characters to the line that follows it.

When the completed SDS project is ready to be deployed back to the server, perform the following steps:

- Zip the directory and transfer it to the server.
- Unzip the directory and replace the existing source code directory under <$STREAMING_HOME/cluster/projects>.
- Copy the ccl and ccr files from the source code directory to the <repository> directory, adding the suffix “_tokenized”. For example, the files in the <repository> directory would be named:
    o rds_bdi_stream.ccl_tokenized
    o rds_bdi_stream.ccr_tokenized

3.7 Customizing Log File Processing

We recommend that the following procedure be completed by people with SAP ESP development experience.

The Stream solution is preconfigured to support up to 5 simultaneous log file formats. They are:

1. Combined Log Format [Apache]
2. Common Log Format [Apache]
3. IIS format [Fixed]
4. W3C Extended format
5. IIS Extended

Each one is associated with an SAP ESP input stream named instream<x>, where <x> is the corresponding number for the log file format. To modify or replace one of these formats, you must do the following:

3.7.1 Change the Properties File

The $STREAMING_HOME/cluster/projects/run_rds_bdi_stream directory contains 5 files named:

    APACHE_COMBINED.properties
    APACHE_COMMON.properties
    IIS_FIXED.properties
    W3C_EXTENDED.properties
    IIS_EXTENDED.properties

Select the file that corresponds to the log file format you wish to customize. For information on how to configure this file, see the SAP ESP Adapters guide.
Configure SAP ESP Log File Input Adapter to read Advancing/Rotating Log Files

There are two types of live files, rotating and advancing.

For rotating files, when the log file is full, the program that writes to the file renames the existing file and starts a new file with the original name. Typically, the new file name is based on the original file name with a suffix appended. For example, if the original file name is "access.log", the renamed files are "access.log.1", "access.log.2", and so on. The Log File Input adapter always reads the original file name. The adapter starts over at the beginning when the old file has been renamed and a new one created.

For advancing files, when one file is full, the log file writer creates a new file and starts writing to that new file. The naming convention is typically a base name plus a suffix, where the suffix might be based on date/time or a sequential number (for example, "access-log.2007-01-01", "access-log.2007-01-02", and so on, or "access.log.1", "access.log.2", and so on). Regardless of the naming convention, the adapter opens these log files in chronological order by the most recent modification date.

The properties files have three parameters that control the reading of such files.

1. Input.Filename
   o Specifies the path to the first file in the directory to read.
   o Configure the file name either by changing this parameter directly in the properties file, or by setting the LOG_FILE_NAME parameter in the AT_BDI_CONFIG view.
   o If using Windows, backslashes in the path must be escaped. For example, to specify C:\exampleDir\file.log, enter C:\\exampleDir\\file.log. Regular expression file names cannot contain backslashes.
   o For Advancing: You must use regular expressions to configure the name of the file.
     For example, if your advancing log files have names like u_ex140907.log, u_ex140908.log, u_ex140910.log and so on, configure the file name as u_ex.*.log. The adapter then reads all files whose names start with “u_ex”.
   o For Rotating: You must use regular expressions to configure the name of the file. For example, if your rotating log files have names like testLog.log, testLog.log.1, testLog.log.2 and so on, configure the file name as testLog.log. The adapter reads files whose names start with “testLog.log”. With rotating logs, testLog.log is the live log file. The adapter reads it along with the previously rotated files.
   o For Single Live Log File: If your log file is not configured for either rotating or advancing mode and is a single live log file, specify the full path of that log file.

2. Input.Class
   o Specifies whether to use advancing or rotating mode:
   o Advancing Mode: com.sybase.esp.adapter.logFileInput.AdvancingLogFileInputBean
   o Rotating Mode: com.sybase.esp.adapter.logFileInput.RotatingLogFileInputBeanInput

3. Input.WaitForGrowth
   o If set to false, the adapter reads until reaching the end of the file and then exits.
   o If set to true, the adapter continues reading from the log file indefinitely at intervals of 100 milliseconds (the default interval).

Refer to the SAP ESP Adapters guide for more details on configuring the SAP ESP Log File Input adapter and the corresponding properties file for reading advancing and rotating log files.

3.7.2 Change the SDS project code

When you have edited the properties file correctly, load the rds_bdi_stream SDS project into SAP HANA studio and open it in the Development perspective. This can be done by using the “Import” function to import an existing project.
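Returning briefly to the Input.Filename settings in section 3.7.1, the following Python sketch illustrates two of the rules described there: escaping backslashes in a literal Windows path, and reading advancing log files that match a pattern in order of modification time. It is an illustration only, not the adapter's implementation. The function names are invented, and the use of a full match against the file name is an assumption about how the adapter applies the pattern.

```python
import re


def escape_windows_path(path):
    """Double every backslash, as required for a literal Windows path."""
    return path.replace("\\", "\\\\")


def advancing_read_order(files, pattern):
    """Return the names that match pattern, oldest modification time first.

    files is an iterable of (name, modification_time) pairs.
    ASSUMPTION: the whole file name must match the pattern.
    """
    rx = re.compile(pattern)
    matched = [(mtime, name) for name, mtime in files if rx.fullmatch(name)]
    return [name for _, name in sorted(matched)]


# Invented sample data: (file name, modification time in arbitrary units).
files = [
    ("u_ex140908.log", 200),
    ("other.log", 50),
    ("u_ex140907.log", 100),
    ("u_ex140910.log", 300),
]
print(advancing_read_order(files, "u_ex.*.log"))
print(escape_windows_path(r"C:\exampleDir\file.log"))
```

Checking a pattern against a directory listing in this way before editing the properties file can help catch a file name expression that selects too many or too few files.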
You only need the .ccl file to create the SDS project, but you can download the whole SDS project folder from the SDS server at:

    $STREAMING_HOME/cluster/projects/rds_bdi_stream

Zip up that folder and transfer it to your SAP ESP development system. When the files are extracted into the target folder, the project can be imported into the Studio. The project name in the Studio will be the name of the folder hosting the CCL file.

Near the top of the code file are SDS schemas that represent the record formats that the Logfile Adapter produces, based on the information in the properties files. For example, the Apache combined schema looks like:

    CREATE SCHEMA inbound_logfile_schema1 (
        HOST_IP string,
        HOST_NAME string,
        USER_ID string,
        LOGGED_TIME timestamp,
        VIEW_URL_DETAIL string,
        HTTP_STATUS_CODE string,
        BYTES_SENT string,
        REFERER_URI string,
        USER_AGENT string,
        TEMP_STRING string
    );

Change the schema to match the changes you made in the properties files. Take care when making changes, since the adapter coordinates output records with the input stream that uses these schemas. If the field names are different, the datatypes do not match, or the field count is not the same, the adapter does not run.

You do not have to make any further changes regarding the input of the data. The input stream receives all its information from the schema definition.

Further down in the code is a Flex Stream named Converted event. In this example, we are modifying the values from <instream1>.
Look in the Flex Stream for the following code:

    /*
     * instream1 defines Combined Log Format [Apache]
     */
    ON instream1 {
        // exit;
        // outrec.STREAM_ID := '$$$STREAM_STREAM_ID_1$$$';
        outrec.STREAM_ID := 'instream1';
        outrec.LOGGED_TIME := instream1.LOGGED_TIME;
        outrec.HOST_IP := instream1.HOST_IP;
        outrec.HOST_NAME := checkDash(instream1.HOST_NAME);
        outrec.USER_ID := checkDash(instream1.USER_ID);
        outrec.SERVER_IP := NULL;
        outrec.SERVER_NAME := instream1.TEMP_STRING;
        outrec.SERVER_PORT := NULL;
        /*
         * VIEW_URL_DETAIL has format "HTTP_METHOD VIEW_URL HTTP_PROTOCOL"
         * Split it into the 3 separate fields by using the spaces as delimiters
         */
        integer sp1, sp2, sp3;
        sp1 := patindex(instream1.VIEW_URL_DETAIL, ' ', 1);
        sp2 := patindex(instream1.VIEW_URL_DETAIL, ' ', 2);
        sp3 := length(instream1.VIEW_URL_DETAIL);
        outrec.HTTP_METHOD := substr(instream1.VIEW_URL_DETAIL, 0, sp1);
        outrec.VIEW_URL := substr(instream1.VIEW_URL_DETAIL, sp1 + 1, sp2 - sp1 - 1);
        outrec.HTTP_PROTOCOL := substr(instream1.VIEW_URL_DETAIL, sp2 + 1, sp3 - sp2 - 1);
        outrec.HTTP_STATUS_CODE := checkDash(instream1.HTTP_STATUS_CODE);
        outrec.HTTP_SUBSTATUS := NULL;
        outrec.ORIGINATING_SITE := checkDash(instream1.REFERER_URI);
        outrec.USER_AGENT := checkDash(instream1.USER_AGENT);
        outrec.BROWSER := extractBrowser(instream1.USER_AGENT);
        outrec.OPERATING_SYSTEM := NULL;
        outrec.DEVICE := NULL;
        outrec.BYTES_SENT := to_integer(checkDash(instream1.BYTES_SENT));
        outrec.BYTES_RECD := NULL;
        outrec.DURATION := NULL;
        outrec.WEB_ENGINE_STATUS := NULL;
        outrec.PRODUCT := NULL;
        outrec.COOKIES := NULL;
        outrec.SEARCH_WORD_PHASE := NULL;
        outrec.GUID := hex_string ( uniqueVal() );
        outrec.LAST_MODIFIED := NULL;
        output setopcode (outrec, insert);
    };

Make special note of these two lines:

    // outrec.STREAM_ID := '$$$STREAM_STREAM_ID_1$$$';
    outrec.STREAM_ID := 'instream1';

This is a coding standard implemented in these SDS projects to deploy user-defined values by replacing token strings; the replacement occurs during the bootstrap process. The tokenized version of the code line is commented out with two slashes, and is immediately followed by the same line with a valid value for development and testing when the bootstrap script is not being used. Be careful to adhere to this coding standard, or the bootstrap process may damage the executable SDS code. Read the coding convention information at the top of the ccl file.

The rest of the code is straightforward to an experienced programmer. The code stores the incoming data into the appropriate database column. Note that the “checkDash” and “extractBrowser” functions are defined in this SDS project in the global DECLARE block at the top of the file. All that is needed is to make sure that the incoming data is mapped to the correct columns in the LOG_TABLE_DETAIL record (which maps to the SAP HANA database table).

Test your changes in the local cluster. When the project works correctly, rename the file <rds_bdi_stream.ccl> to <rds_bdi_stream.ccl_tokenized> and replace the file that is in the $STREAMING_HOME/cluster/projects/repository directory.

4 Dynamic Loading

The SAP HANA smart data streaming (SDS) adapter daemon was written and designed to work with the SAP HANA server and the STREAM application. The purpose of this application is to monitor the system for incoming log files and initiate the appropriate adapter to process each file. To achieve this, the daemon scans the table LKUP_UPLOAD_LOG_TAB to determine when new files have been uploaded to the system that have not been processed. If a file entry has a null value in the column ESP_COMPLETE_TIME, its processing state is considered incomplete. All rows with this condition are then selected to be processed with the required adapter based on the entries in the table BDI_CONFIG_PARAMETERS.
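As a rough illustration of one polling pass, the Python sketch below works on in-memory rows rather than on the SAP HANA tables, and it is not the code of the shipped daemon. The FILE_NAME column, the placeholder command string, and the rule that the configuration entry is named "<format id>_EXE" are assumptions made for this example. The guide states only that the format ID is compared against the NAME column.

```python
def pending_uploads(upload_rows):
    """Rows whose ESP_COMPLETE_TIME is NULL (None) still need processing."""
    return [row for row in upload_rows if row["ESP_COMPLETE_TIME"] is None]


def adapter_command(row, config, name_for=lambda fmt: fmt + "_EXE"):
    """Look up the adapter command line for a pending row.

    ASSUMPTION: name_for maps a LOG_FILE_FORMAT_ID to a configuration NAME;
    the "_EXE" suffix rule is hypothetical.
    """
    return config.get(name_for(row["LOG_FILE_FORMAT_ID"]))


# Invented sample rows standing in for LKUP_UPLOAD_LOG_TAB.
uploads = [
    {"FILE_NAME": "a.log", "LOG_FILE_FORMAT_ID": "Apache_Combined",
     "ESP_COMPLETE_TIME": None},
    {"FILE_NAME": "b.log", "LOG_FILE_FORMAT_ID": "IIS_Fixed",
     "ESP_COMPLETE_TIME": "2015-06-06 10:00:00"},
]
# Invented stand-in for the NAME/VALUE pairs in BDI_CONFIG_PARAMETERS.
config = {"Apache_Combined_EXE": "<command line for Apache_Combined>"}

for row in pending_uploads(uploads):
    print(row["FILE_NAME"], "->", adapter_command(row, config))
```

A row with no matching configuration entry yields None here, which a real monitor would need to report rather than silently skip.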
The column LOG_FILE_FORMAT_ID in table LKUP_UPLOAD_LOG_TAB is compared against the column NAME in BDI_CONFIG_PARAMETERS to determine the adapter required to process the file. After the file has been processed, the adapter updates LKUP_UPLOAD_LOG_TAB with the result of the processing, including timing.

4.1 Configuration

4.1.1 System Tables

Short Name               Full Name
LKUP_UPLOAD_LOG_TAB      SAP_RDS_BDI_STREAM.sap.rdsbdi.stream.model.rep::tables.LKUP_UPLOAD_LOG_TAB
BDI_CONFIG_PARAMETERS    SAP_RDS_BDI_CONFIG.sap.rdsbdi.config::tables.BDI_CONFIG_PARAMETERS

4.1.2 BDI_CONFIG_PARAMETERS Options

Option                              Remark
LOG_FILE_POLL_INTERVAL              Time interval between polling SAP HANA server for new file entries
Apache_Combined_EXE                 Command line and options for processing Apache_Combined file type
Apache_Common_EXE                   Command line and options for processing Apache_Common file type
IIS_Fixed_EXE                       Command line and options for processing IIS_Fixed file type
W3C_Extended_EXE                    Command line and options for processing W3C_Extended file type
IIS_Extended_EXE                    Command line and options for processing IIS_Extended file type
Application_Path                    Full or relative path to log adapters
Startup_Directory                   Full or relative path to start up folder for log adapters
LINUX_LOG_FILE_UPLOAD_LOCATION      Full or relative path to log files (Linux/UNIX convention)
WINDOWS_LOG_FILE_UPLOAD_LOCATION    Full or relative path to log files (Windows convention)

4.1.3 Initial Configuration Scripts

A configuration file, “monitor_config.sql”, is provided with the downloaded STREAM application in the repository folder. This SQL script file contains all the initial configuration options that the daemon requires to function. Some values are expected to require changes based on the target server that hosts the daemon. These values include the following:
1. Application_Path
2. Startup_Directory
3. LINUX_LOG_FILE_UPLOAD_LOCATION
4. LOG_FILE_POLL_INTERVAL
5. WINDOWS_LOG_FILE_UPLOAD_LOCATION

Update the values prior to loading the script into the database. The SQL commands in the file are listed later in this section. To execute these SQL commands, open a SAP HANA studio SQL session to the repository database and execute the commands there. Alternatively, the SQL can be executed from the command line, as in the following example:

    dbisql -nogui -c "DSN=ESP_HANA_STREAM;UID=<DBUSER>;PWD=<DBPASSWD>" read monitor_config.sql

All sample values of the BDI_CONFIG_PARAMETERS options in 4.1.2 have already been imported into the HANA table "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS" during the implementation of the prerequisite EZ4 configuration guide, Stream Configuration for Web Stream Analysis.

1. To review the existing sample values, log on to the HANA system and run the following statement in the SQL Console window as the STREAM_ADMIN user:

    select "BUSINESS_AREA", "FUNCTIONAL_AREA", "NAME", "VALUE", "VIEWTYPE", "LASTMODIFIED"
    from "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    where BUSINESS_AREA='STREAM' and FUNCTIONAL_AREA='ADAPTERMONITOR';

2. If an existing value needs to be adjusted, update the VALUE column by running statements such as the following in the SQL Console window as the STREAM_ADMIN user:

    UPDATE "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    SET VALUE = '/usr/sap/HDB/HDB00/streaming/STREAMING1_0/cluster/projects/run_rds_bdi_stream/',
        LASTMODIFIED = NOW()
    WHERE BUSINESS_AREA = 'STREAM' AND FUNCTIONAL_AREA = 'ADAPTERMONITOR' AND NAME = 'Application_Path';

    UPDATE "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    SET VALUE = '/hana/shared/HDB/streaming/STREAMING1_0/cluster/projects/rds_bdi_stream_log_files/',
        LASTMODIFIED = NOW()
    WHERE BUSINESS_AREA = 'STREAM' AND FUNCTIONAL_AREA = 'ADAPTERMONITOR' AND NAME = 'LINUX_LOG_FILE_UPLOAD_LOCATION';

    UPDATE "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    SET VALUE = '10',
        LASTMODIFIED = NOW()
    WHERE BUSINESS_AREA = 'STREAM' AND FUNCTIONAL_AREA = 'ADAPTERMONITOR' AND NAME = 'LOG_FILE_POLL_INTERVAL';

    UPDATE "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    SET VALUE = '/usr/sap/HDB/HDB00/streaming/STREAMING1_0/cluster/projects/run_rds_bdi_stream/',
        LASTMODIFIED = NOW()
    WHERE BUSINESS_AREA = 'STREAM' AND FUNCTIONAL_AREA = 'ADAPTERMONITOR' AND NAME = 'Startup_Directory';

    UPDATE "SAP_RDS_BDI_STREAM"."sap.rdsbdi.stream.model::config.BDI_CONFIG_PARAMETERS"
    SET VALUE = 'c:\temp\',
        LASTMODIFIED = NOW()
    WHERE BUSINESS_AREA = 'STREAM' AND FUNCTIONAL_AREA = 'ADAPTERMONITOR' AND NAME = 'WINDOWS_LOG_FILE_UPLOAD_LOCATION';

4.1.4 Starting Daemon

To start the dynamic loading program, a valid HANA user is required to connect to the HANA database, with access to the PROCESSING/CONTROL_TABLE and PROCESSING/LOGFILE_TABLE tables as specified in the monitor_app.ini file.
Prior to executing the script, verify that the path to the Python interpreter is set correctly. The first line of the script must point to the location of the interpreter on the deployment system. To check it, open the script:

    cd $STREAMING_HOME/cluster/projects/run_rds_bdi_stream
    vi monitor_app.py

To start the daemon, run the following commands:

    cd $STREAMING_HOME/cluster/projects/run_rds_bdi_stream
    ./monitor_app.py <HANA user> <HANA password>

5 Additional Resources

5.1 SAP Smart Data Streaming (SAP SDS)

For information about installing, configuring, and optimizing SAP SDS, see the resource listed in the table:

Topic                       Guide/Tool         Quick Link
SAP Smart Data Streaming    SAP Help Portal    http://help.sap.com/hana_options_sds

Product documentation is based on version and SP level.