A REPORT ON
LOG ANALYSIS THROUGH ELK STACK

BY
Madhav Bajaj
2019B3A70256G
Economics and Computer Science

Prepared in partial fulfillment of the Practice School-I Course No. BITS F221

AT
L&T Infotech, Data Analytics
A Practice School-1 Station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
(JUNE, 2021)

ACKNOWLEDGEMENT

I would like to express my earnest and sincere appreciation to my instructor, Mr. Sravan Danda, Ph.D., Practice School Unit, BITS Pilani, who provided me with a marvelous opportunity to prepare this report and assisted me throughout its preparation. Furthermore, I would like to acknowledge with much appreciation the crucial role of the entire faculty of the Practice School Unit, who encouraged me throughout the completion of the report. I would also like to convey my recognition and appreciation to everyone who, directly or indirectly, has helped me in this endeavor.

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJASTHAN)
Practice School Division

Station: Data Analytics
Centre: L&T Infotech
Duration: 53 Days
Date of Start: 1st June 2021
Date of Submission: 25th June 2021
Title of the Project: Log Analysis through ELK Stack
Student Name: Madhav Bajaj
ID No.: 2019B3A70256G
Disciplines: Economics and Computer Science
PS Station Reporting Manager Name: Mr. Vignesh V
Designation: Associate Principal – Consulting NAUT
PS Faculty Name: Mr. Sravan Danda
Key Words: Log Analysis, Data Analysis, ELK, AIOps
Project Areas: Log Analysis, Elastic Software Ecosystem, Production Software, Windows Event Log Monitoring

Abstract: This report details the progress made so far in setting up a comprehensive centralized log analytics tool for a client at LTI. Log analysis is the process of reviewing, interpreting, and understanding computer-generated records called logs.
The client has tasked an LTI team with the development of a network capable of extracting and aggregating Windows event logs from multiple servers, along with the tools necessary to conduct analytical operations on the amassed data set. Given the financial constraints enforced by the client, open-source software from the Elastic ecosystem was selected. The team has been working on developing a rudimentary Proof of Concept (PoC) for the ELK stack (Elasticsearch, Logstash, and Kibana) using virtual machines. The installation of the ELK components has been completed, and the objective ahead is to make them work in unison. (25th June 2021)

TABLE OF CONTENTS

1. Cover
2. Title Page
3. Acknowledgement
4. Abstract Sheet
5. Introduction
6. Project Work
7. Project Manual
8. Conclusion
9. References
10. Glossary

INTRODUCTION

The process of deciphering computer-generated log messages, also known as log events, audit trail data, or simply logs, is known as log analysis. Log analysis provides helpful indicators that track metrics across the infrastructure, and it is an indispensable tool for any software management team: logs form the lexicon for the language such teams write in. Given how fundamental log analysis is to the efficient functioning of any software, several sophisticated management tools have emerged. The one chosen by the team at LTI is the Elastic Stack, or the ELK stack. Comprising Elasticsearch, Logstash, and Kibana, the ELK stack is a sophisticated open-source platform capable of conducting the complete process of log analysis, from data extraction to final visualization. Each element of the ELK stack serves a unique purpose.

• Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.
• Logstash is an open-source data collection engine developed for real-time data ingestion from a variety of sources.
• Kibana is an open frontend application that provides search and data visualization capabilities for data indexed in Elasticsearch.
• Beats are open-source data shippers installed as agents on source servers to send operational data.

ELK stack installation and initial configuration is a difficult task, and thus virtual machines (created through Oracle's VirtualBox) have been utilized to experiment with different settings for optimal functioning.

PROJECT WORK

Theory

While the ELK stack can assess a variety of data, we are only concerned with Windows event logs, and hence Winlogbeat has been used.

Winlogbeat

Winlogbeat belongs to Beats, a family of single-purpose data shippers that also includes platforms like Filebeat and Packetbeat for specific objectives. It extracts data from multiple event logs through Windows APIs and sends filtered data to the data aggregator Logstash (Elasticsearch can also be configured as the output). It is capable of capturing event data from any event logs, including:

• application events
• hardware events
• security events
• system events

Winlogbeat ensures consistency in data passage: the read position for each event log is persisted to disk to allow Winlogbeat to resume after restarts. Like all Beats offerings, the software is built on the libbeat framework, constructed entirely in the Go programming language. The libbeat library offers the API through which Beats ship data to the specified output point. Winlogbeat can ship data to two possible output points: directly to Elasticsearch, or via Logstash, where additional data processing can be done.

Logstash

Elastic describes Logstash as a server-side processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to a desired destination. Logstash processes data in three stages:

• Inputs
• Filters
• Outputs

Each input stage in the Logstash pipeline runs in its own thread.
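A minimal pipeline configuration illustrating these three stages might look like the following sketch; the Beats port, the enrichment field, and the index name are illustrative assumptions, not the project's actual configuration:

```
# illustrative logstash.conf: Beats input -> filter -> Elasticsearch output
input {
  beats {
    port => 5044                               # default port for the Beats input plugin
  }
}

filter {
  mutate {
    add_field => { "environment" => "poc" }    # illustrative enrichment field
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "winlogbeat-%{+YYYY.MM.dd}"       # daily index name, illustrative
  }
}
```

Each block corresponds to one pipeline stage; under this sketch, Winlogbeat would point its output at port 5044 of the machine running Logstash.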
Inputs write events to a central queue that is either in memory (default) or on disk. Each pipeline worker thread takes a batch of events off this queue, runs the batch through the configured filters, and then runs the filtered events through any outputs. The size of the batch and the number of pipeline worker threads are configurable.

Notable characteristics of Logstash:

• If Logstash terminates unsafely, any events that are stored in memory will be lost. To help prevent data loss, one can enable Logstash to persist in-flight events to disk.
• By default, Logstash does not guarantee event order. Event reordering is possible, but it can be avoided by adjusting the pipeline to a single worker.
• Logstash offers numerous filters for data processing, like grok (for parsing and structuring text), mutate, drop, clone, etc.
• A pipeline management feature (Pipeline Viewer UI) in Kibana centralizes the creation and management of Logstash configuration pipelines. This feature is only available on paid subscriptions. Elastic also offers a Monitoring UI for Logstash.
• Data resiliency features offered by Logstash:
  o Persistent Queues protect against data loss by storing events in an internal queue on disk.
  o Dead Letter Queues (DLQ) provide on-disk storage for events that Logstash is unable to process.
• Scaling ingest with Beats and Logstash:
  o At-least-once delivery is guaranteed for the ingest flow if Filebeat and Winlogbeat are used as collection means.
  o The Beats input plugin exposes a secure, acknowledgement-based endpoint for Beats to send data to Logstash.
• Logstash offers numerous filters for various data operations:
  o Core operations (like date, drop, fingerprint, mutate, ruby, etc.)
  o Deserializing data
  o Extracting fields and wrangling data
  o Enriching data with lookups

Elasticsearch

Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
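To make the following subsections concrete, the sketch below shows roughly how a single Windows logon event shipped by Winlogbeat looks once rendered as an Elasticsearch JSON document; the field names follow the general Winlogbeat layout, and all values here are invented for illustration:

```python
import json

# Hypothetical Windows security event rendered as an Elasticsearch JSON document.
# Field names are representative of the Winlogbeat layout; values are made up.
event_doc = {
    "@timestamp": "2021-06-25T10:15:00.000Z",
    "event": {"code": "4624", "action": "Logon"},
    "winlog": {"channel": "Security", "computer_name": "WIN-2012-VM"},
    "message": "An account was successfully logged on.",
}

# Elasticsearch stores each document as serialized JSON and indexes its fields,
# which is what later makes queries and aggregations over these logs possible.
serialized = json.dumps(event_doc)
print(json.loads(serialized)["winlog"]["channel"])  # Security
```

Every such document lands in an index, which Elasticsearch splits into shards, as described in the Architecture subsection below.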
Process:

• Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications.
• Ingest nodes facilitate data ingestion, the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch.
• Once the data is indexed in Elasticsearch, users can run complex queries against it and use aggregations to retrieve complex summaries, which can then be visualized through Kibana.

Architecture:

• All the data in Elasticsearch is stored in the form of JSON strings called documents. The document is the smallest unit of data that can be stored in Elasticsearch.
• A collection of documents of similar data types (not necessarily identical) is called an index.
• The index is divided into shards in a logical manner to speed up data reading and retrieval.
• Shards are individual instances of a Lucene index. Lucene is the underlying technology that Elasticsearch uses for extremely fast data retrieval.
• Each index is composed of shards across one or many nodes.
• A node is a single instance of Elasticsearch, usually run one per machine. Nodes communicate with each other via network calls to share the responsibility of reading and writing data.
• Clusters are collections of nodes that communicate with each other to read and write to an index.

Kibana

Kibana is essentially the UI for the ELK stack, providing visualization for the data analyzed by Elasticsearch. There are several components to Kibana. Kibana Lens is an intuitive UI that simplifies the process of data visualization through a drag-and-drop experience. Kibana Canvas deals with live data visualization: with Canvas, live data can be pulled directly from Elasticsearch and combined with colors, images, text, and other customized options to create dynamic, multipage displays. It also offers alerting tools.
Alerting allows you to define rules (a rule specifies a background task that runs on the Kibana server to check for specific conditions) to detect complex conditions within different Kibana apps and trigger actions when those conditions are met. This feature is controlled through the Management UI. Kibana also offers stack monitoring and management: the data and performance of the ELK stack components can be monitored and controlled independently.

Task Handled: Proof of Concept Development

• A Windows Server 2012 virtual machine was established through Oracle VirtualBox.
• The ecosystem was prepared with all the necessary software, including Java, WinRAR, and Microsoft Visual Studio Code.
• Elasticsearch, Kibana, Logstash, and Winlogbeat files were downloaded and unzipped.
• The components were installed using complex CMD and PowerShell commands. Elasticsearch and Kibana were installed on local servers.
• These steps were repeated multiple times on different VMs, as the installation of the ELK stack is an extremely complex process with immense scope for ambiguity.

Work Regimen: After being inducted into the company's Outlook system, daily meetings were conducted on the Teams platform, where progress was discussed and work was assigned.

PROJECT MANUAL

An important task of this project was to document all the steps taken to set up an operational ecosystem through the ELK stack, and that documentation is presented here.

ELK STACK INSTALLATION MANUAL

TABLE OF CONTENTS
1. Setting up the Virtual Machine on VirtualBox
2. Downloading the ELK Files and supplementary software
3. Installing Elasticsearch
4. Installing Kibana
5. Installing Winlogbeat
6. Accessing the Kibana Dashboard
7. Configuring Elasticsearch for multiple data shippers
   - Network configurations
   - Elasticsearch config file changes
   - Kibana config file changes
   - Winlogbeat config file changes
   - Winlogbeat logging configuration
8. Elastic security measures
9. Kibana Guide
10. Error Resolution

Setting up the Virtual Machine on VirtualBox

1) Download the VirtualBox installer and complete the installation: https://www.virtualbox.org/wiki/Downloads
2) Download the Windows Server 2012 R2 ISO file: https://www.microsoft.com/en-in/evalcenter/evaluate-windows-server-2012-r2
3) VM installation steps:
   a. Click on the New button in VirtualBox, assign a name to the VM, and select Windows 2012 from the Type dropdown menu. Click Next.
   b. Choose the amount of RAM to be allocated to the VM.
   c. Select "Create a virtual hard disk now," then proceed.
   d. Select Virtual Hard Disk under Hard Disk File Type, then proceed.
   e. Select "Dynamically allocated," then proceed.
   f. Choose the amount of storage to be allocated to the VM and finally click Create.
   g. Go to the Storage section under the VM settings, click on the "Empty" disk image, and insert the ISO file downloaded in Step 2 into the optical drive of the VM.
   h. Start the VM.

Specifications of the VM used:
I. 2048 MB RAM
II. 32 GB hard disk memory
III. Host machine processor: Intel i7 9th Gen

Downloading the ELK Files and supplementary software

1) Download Google Chrome (or any browser other than Internet Explorer). Chrome makes downloading the required files easier.
2) Download and install the latest version of Java SE, as it is an essential component for the functioning of the Elastic Stack: https://www.oracle.com/java/technologies/javase-jdk16-downloads.html
3) Download and install MS Visual Studio Code (System Installer, 64-bit/32-bit depending on the system's architecture): https://code.visualstudio.com/download. Visual Studio Code will be used to configure the yml files.
4) Download Elasticsearch for Windows: https://www.elastic.co/downloads/elasticsearch
5) Download Kibana for Windows: https://www.elastic.co/downloads/kibana
6) Download Winlogbeat for Windows (ZIP, 64-bit/32-bit depending on the architecture): https://www.elastic.co/downloads/beats/winlogbeat
7) Ensure Elasticsearch, Kibana, and Winlogbeat are of the same version.
8) Download WinRAR to unzip the files: https://www.win-rar.com/start.html?&L=0
9) Unzip all the folders into a single folder called "ELK" (see the image below).

Installing Elasticsearch

1) Open the command prompt and move the current directory to the bin folder by executing the command "cd C:\ELK\elasticsearch-7.13.3\bin" (the command might be different if the installed path is different).
2) Execute the elasticsearch.bat file through the command prompt and let it run in the background.

Elastic documentation for installing Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-windows.html

Installing Kibana

1) Open the command prompt and move the current directory to the bin folder by executing the command "cd C:\ELK\kibana-7.13.3-windows-x86_64\bin" (the command might be different if the installed path is different).
2) Execute the kibana.bat file through the command prompt and let it run in the background.

Elastic documentation for installing Kibana: https://www.elastic.co/guide/en/kibana/current/windows.html

Installing Winlogbeat

1) Follow the Winlogbeat documentation very carefully:
https://www.elastic.co/guide/en/beats/winlogbeat/7.13/winlogbeat-installation-configuration.html

Accessing the Kibana Dashboard

Elasticsearch details can be accessed at http://localhost:9200/
The Kibana Dashboard and Discover can be accessed at http://localhost:5601/

Configuring Elasticsearch for multiple data shippers

1) Set up another VM (refer to Setting up the Virtual Machine on VirtualBox).
2) Download and install Winlogbeat on the new VM (refer to Installing Winlogbeat).

Network configurations

The host VM (the VM with Elasticsearch and Kibana) and the newly created VM, which will act solely as a data shipper, have to be on the same local network.

a. Enter the command "ipconfig" in the command prompt and note the:
   i. IPv4 address
   ii. Subnet mask
   iii. Default gateway
b. Open Network & Internet Settings → Change Adapter Settings → Ethernet (right-click) → Properties → Internet Protocol Version 4 (TCP/IPv4) → Properties
c. Switch to "Use the following IP address" and fill in the details from step a.
d. Switch to "Use the following DNS server addresses" and fill in the Preferred DNS server with the default gateway from step a. Save all the settings and click OK.

Image for reference: Internet Protocol Version 4 Properties for the host VM

e. Repeat step b for the new VM (the data shipper VM). Put in the same subnet mask, default gateway, and preferred DNS server as the host VM. Put in a different IPv4 address such that the host VM and the data shipper VM lie on the same network.

Image for reference: Internet Protocol Version 4 Properties for the data shipper VM

Config file changes

Make sure the Elastic stack is not running before making the following changes to the configuration files on the host VM (the VM with Elasticsearch and Kibana):

Elasticsearch config file changes:

a. Go to the config folder in the Elasticsearch folder and open the "elasticsearch.yml" file with Visual Studio Code. In the default config yml file, everything should be commented.
b. Paste the following section in the Network section:

transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
network.host: 0.0.0.0

c. Indentation is significant for the yml files to function appropriately, so make sure the Network section mentioned in step b looks like the image below.

Kibana config file changes:

a. Go to the config folder in the Kibana folder and open the "kibana.yml" file with Visual Studio Code. In the default config yml file, everything should be commented.
b. Uncomment the second line (see the image below).
c. Replace the seventh line with this (see the image below).
d. Replace the 32nd line with this (see the image below) (use the static IPv4 address of the host VM set before, 10.0.2.15 in my case).

Winlogbeat config file changes:

a. Go to the Winlogbeat folder and open the "winlogbeat.yml" file.
b. Configure the Winlogbeat-specific options to resemble the image below. If logs of different types are desired (like security and system logs), refer to the image below:

winlogbeat.event_logs:
  - name: Application
  - name: Security
  - name: System

c. Under the Kibana section of the yml file, make the following changes (refer to the image below) (use the static IPv4 address of the host VM, 10.0.2.15 in my case).
d. Under the Elasticsearch Output section of the yml file, make the following changes (refer to the image below) (use the static IPv4 address of the host VM, 10.0.2.15 in my case).

Winlogbeat logging configuration:

Winlogbeat generates logs if any errors are encountered. If records of these errors are desired, make the following adjustments under the logging section of the "winlogbeat.yml" file. Enter the desired path, but ensure the indentation presented in the image.

Kibana can now be accessed at http://10.0.2.15:5601/ (the host VM IPv4 address in my case is 10.0.2.15).

Elastic security measures

Follow the Elastic documentation carefully to put minimal security measures in place:
https://www.elastic.co/guide/en/elasticsearch/reference/current/security-minimal-setup.html

Kibana Guide

Follow the Kibana documentation: https://www.elastic.co/guide/en/kibana/7.13/index.html

Error Resolution

1) Error: Winlogbeat - Exiting: Error while initializing input: required 'object', but found 'string' in field
   Solution: Recheck the indentation of the code in the Winlogbeat yml files.
2) Error: Elasticsearch - org.elasticsearch.bootstrap.StartupException: ElasticsearchException[X-Pack is not supported and Machine Learning is not available for [windows-x86]]
   Solution: Put the entry xpack.ml.enabled: false in the elasticsearch.yml file.
3) Error: Kibana - Request to Elasticsearch failed: {"error":{}}
   Solution: Change the parameter elasticsearch.requestTimeout: 500000 in the kibana.yml file.
4) Error: Elasticsearch - \Common was unexpected at this time.
   Solution: Ensure all the components of the Elastic stack are of the same version.
5) Error: Elasticsearch - JAVA_HOME environment variable must be set!
   Solution: The JAVA_HOME variable must be set as an environment variable if Elasticsearch is being installed as a service.

CONCLUSION

The experience so far with my mentors has been a pleasant one. Critical understanding has been instilled regarding the Elastic ecosystem and its components, along with general know-how for testing experimental software. The ELK stack is an essential tool to master for any personnel in this field, and this venture has assisted me significantly.
REFERENCES

Elastic theory and documentation:
- https://logz.io/learn/complete-guide-elk-stack/
- https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
- https://www.elastic.co/guide/en/kibana/7.13/index.html
- https://www.elastic.co/guide/en/logstash/current/index.html
- https://www.elastic.co/guide/en/beats/winlogbeat/current/index.html

PoC development:
- https://medium.com/@samil.mehdiyev/elk-stack-on-windows-server-part-2-installation-d2a7200b65a6
- https://elasticsearch.tutorials24x7.com/blog/how-to-install-elasticsearch-kibana-and-logstash-elk-elastic-stack-on-windows
- https://burnhamforensics.com/2018/11/18/sending-logs-to-elk-with-winlogbeat-and-sysmon/

GLOSSARY

ELK – Elasticsearch, Logstash, and Kibana
NAUT – NWOW Transformation Automation
CMD – Command Prompt
VM – Virtual Machine