Final Project Report
A REPORT
ON
LOG ANALYSIS THROUGH ELK STACK
BY
Madhav Bajaj 2019B3A70256G
AT
L&T Infotech, Data Analytics
A Practice School-1 Station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
(JUNE, 2021)
A REPORT
ON
LOG ANALYSIS THROUGH ELK STACK
BY
Madhav Bajaj
2019B3A70256G
Economics and Computer Science
Prepared in partial fulfillment of the Practice School-I Course No. BITS
F221
AT
L&T Infotech, Data Analytics
A Practice School-1 Station of
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
(JUNE, 2021)
ACKNOWLEDGEMENT
I would like to express my earnest and sincere appreciation to my instructor, Mr.
Sravan Danda, Ph.D., Practice School Unit, BITS Pilani, who provided me with a
marvelous opportunity to prepare this report and assisted me throughout its
preparation.
Furthermore, I would also like to acknowledge with much appreciation the crucial
role of the entire faculty of the Practice School Unit, who encouraged me
throughout the completion of the report.
I would like to convey my recognition and appreciation to everyone who, directly or
indirectly, has helped me in this endeavor.
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE PILANI
(RAJASTHAN)
Practice School Division
Station: Data Analytics
Centre: L&T Infotech
Duration: 53 Days
Date of Start: 1st June 2021
Date of Submission: 25th June 2021
Title of the Project: Log Analysis through ELK Stack
Student Name: Madhav Bajaj
ID No.: 2019B3A70256G
Disciplines: Economics and Computer Science
PS Station Reporting Manager Name: Mr. Vignesh V
Designation: Associate Principal – Consulting NAUT
PS Faculty Name: Mr. Sravan Danda
Key Words: Log Analysis, Data Analysis, ELK, AIOps
Project Areas: Log Analysis, Elastic Software Ecosystem, Production software, Windows event
logs monitoring
Abstract:
This report details the progress made so far in setting up a comprehensive centralized log analytics
tool for a client at LTI. Log analysis is the process of reviewing, interpreting, and understanding
computer-generated records called logs. The client has tasked an LTI team with the development
of a network capable of extracting and aggregating Windows event logs from multiple servers,
along with the necessary tools to conduct analytical operations on the amassed data set. Given the
financial constraints enforced by the client, open-source software from the Elastic ecosystem was
selected. The team has been working on developing a rudimentary Proof of Concept (PoC) for the
ELK stack (Elasticsearch, Logstash, and Kibana) using Virtual Machines. The installation of the ELK
components has been completed, and the objective ahead is to make them work in unison.
(25th June 2021)
TABLE OF CONTENTS
No.  Heading          Page No.
1    Cover            1
2    Title Page       2
3    Acknowledgement  3
4    Abstract sheet   4
5    Introduction     6
6    Project Work     7
7    Project Manual   11
8    Conclusion       21
9    References       22
10   Glossary         22
INTRODUCTION
The process of deciphering computer-generated log messages, also known as log events, audit
trail data, or simply logs, is known as log analysis. Log analysis provides helpful indicators that
track metrics across the infrastructure, and it is an indispensable tool for any software
management team: logs form the lexicon of the language software management teams work in.
Given how fundamental log analysis is to the efficient functioning of any software, several
sophisticated management tools have emerged. The one chosen by the team at LTI is the Elastic
stack, or ELK stack. Comprising Elasticsearch, Logstash, and Kibana, the ELK stack is a complex
open-source software suite capable of conducting the complete process of log analysis, from data
extraction to final visualization. Each element of the ELK stack serves a unique purpose.
• Elasticsearch is the distributed search and analytics engine at the heart of the Elastic Stack.
• Logstash is an open-source data collection engine developed for real-time data ingestion
from a variety of sources.
• Kibana is an open frontend application that provides search and data visualization
capabilities for data indexed in Elasticsearch.
• Beats are open-source data shippers installed as agents on source servers to send
operational data.
ELK stack installation and initial configuration is a difficult task, and thus virtual machines
(created through Oracle's VirtualBox) have been utilized to experiment with different settings for
optimal functioning.
PROJECT WORK
Theory
While the ELK stack can assess a variety of data, we are only concerned with Windows event
logs, and hence Winlogbeat has been used.
Winlogbeat
Winlogbeat belongs to Beats, the family of single-purpose data shippers, which
also offers other shippers such as Filebeat and Packetbeat for specific objectives.
It extracts data from multiple event logs through Windows APIs and sends the
filtered data to the data aggregator Logstash (Elasticsearch can also be
configured as the output).
It's capable of capturing event data from any event log, including:
• application events
• hardware events
• security events
• system events
Winlogbeat ensures consistency in data passage. The read position for each event log is persisted
to disk to allow Winlogbeat to resume after restarts.
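This persistence mechanism can be sketched in Python. The analogy below is illustrative only, not Winlogbeat's actual implementation; the checkpoint file name and JSON format are invented for the example:

```python
import json
import os
import tempfile

# Hypothetical checkpoint file standing in for Winlogbeat's on-disk registry.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "read_position.json")

def load_position():
    # Resume from the last persisted record number, or start from 0.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["record_number"]
    return 0

def save_position(record_number):
    # Persist the read position so a restart can resume where it left off.
    with open(CHECKPOINT, "w") as f:
        json.dump({"record_number": record_number}, f)

# Simulate shipping event records, acknowledging each one as it is sent.
start = load_position()
for record in range(start, 10):
    save_position(record + 1)

print(load_position())  # resumes at 10 after a "restart"
```

Because the position is written after each acknowledged event, a crash loses at most the in-flight batch, which is the same guarantee Winlogbeat aims for.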
The software is built on the libbeat framework, written entirely in the Go programming
language like all Beats offerings. The libbeat library offers the API through which Beats ship data
to the specified output point.
Winlogbeat can ship data to two possible output points: directly to Elasticsearch, or via Logstash,
where additional data processing can be done.
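As a sketch, the two output choices correspond to the following `winlogbeat.yml` fragments (the host addresses are placeholder assumptions):

```yaml
# Option 1: ship directly to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]

# Option 2: ship via Logstash for additional processing
# output.logstash:
#   hosts: ["localhost:5044"]
```

Only one output section may be enabled at a time in `winlogbeat.yml`.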
Logstash
Elastic describes Logstash as a server-side data processing pipeline that ingests data
from a multitude of sources, transforms it, and then sends it to a desired
destination.
Logstash processes data in three stages:
• Inputs
• Filters
• Outputs
Each input stage in the Logstash pipeline runs in its own thread. Inputs write events to a central
queue that is held either in memory (the default) or on disk.
Each pipeline worker thread takes a batch of events off this queue, runs the batch
through the configured filters, and then runs the filtered events through any outputs. The size of
the batch and the number of pipeline worker threads are configurable.
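The three stages map directly onto a pipeline configuration file. A minimal sketch, in which the port, host, and added field are assumptions:

```conf
input {
  beats {
    port => 5044                           # receive events from Beats shippers
  }
}
filter {
  mutate {
    add_field => { "pipeline" => "poc" }   # example transformation
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```

Each `input`/`filter`/`output` block names a plugin; Logstash wires them into the threaded pipeline described above.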
Notable Characteristics of Logstash:
• If Logstash terminates unsafely, any events that are stored in memory will be lost. To
help prevent data loss, Logstash can be configured to persist in-flight events to disk.
• By default, Logstash does not guarantee event order. Event reordering is possible, but it
can be avoided by adjusting the pipeline to a single worker.
• Logstash offers numerous filters for data processing, like grok (for parsing and structuring
text), mutate, drop, clone, etc.
• A pipeline management feature (Pipeline Viewer UI) in Kibana centralizes the creation
and management of Logstash configuration pipelines. This feature is only available to
paid subscribers. Elastic also offers a Monitoring UI for Logstash.
• Data resiliency features offered by Logstash:
  o Persistent Queues protect against data loss by storing events in an internal queue
    on disk.
  o Dead Letter Queues (DLQ) provide on-disk storage for events that Logstash is
    unable to process.
• Scaling ingest with Beats and Logstash:
  o At-least-once delivery is guaranteed for the ingest flow if Filebeat and
    Winlogbeat are used as the collection means.
  o The Beats input plugin exposes a secure, acknowledgement-based endpoint for
    Beats to send data to Logstash.
• Logstash groups its filters into several categories of data operations:
  o Core Operations (like Date, Drop, Fingerprint, Mutate, Ruby, etc.)
  o Deserializing Data
  o Extracting Fields and Wrangling Data
  o Enriching Data with Lookups
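Grok structures unstructured text by matching named patterns, which compile down to regular expressions. Conceptually it behaves like Python's named capture groups; the sketch below is an illustrative analogy, not grok itself, and the log line is invented:

```python
import re

# A grok expression such as
#   %{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:level} %{GREEDYDATA:msg}
# compiles down to a named-group regex, roughly like this:
pattern = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)"
)

line = "2021-06-25T10:15:30 ERROR disk quota exceeded"
event = pattern.match(line).groupdict()
print(event["level"])  # ERROR
print(event["msg"])    # disk quota exceeded
```

The resulting dictionary of named fields is what Logstash forwards downstream as a structured event.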
Elasticsearch
Elasticsearch is a distributed, free and open search and analytics engine
for all types of data, including textual, numerical, geospatial, structured,
and unstructured.
Process –
• Raw data flows into Elasticsearch from a variety of sources, including logs, system
metrics, and web applications.
• Ingest nodes facilitate the process of Data Ingestion, the process by which this raw
data is parsed, normalized and enriched before it is indexed in Elasticsearch.
• Once indexed in Elasticsearch, users can run complex queries against their data and
use aggregations to retrieve complex summaries of their data.
Visualization of complex summaries of the data then can be presented through Kibana.
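As a sketch of such a request, the body below combines a full-text match with a terms aggregation; it would be sent to an index's _search endpoint. The index pattern and field names are assumptions based on typical Beats mappings:

```json
{
  "query": { "match": { "message": "error" } },
  "aggs": {
    "events_by_level": {
      "terms": { "field": "log.level" }
    }
  }
}
```

A single response then carries both the matching documents and the per-level counts, which Kibana can visualize directly.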
Architecture –
• All the data in Elasticsearch is stored in the form of JSON strings called Documents.
• The document is the smallest unit of data that can be stored in Elasticsearch.
• The collection of documents of similar data types (not necessarily identical) is called an
Index.
• The index is divided into shards in a logical manner to speed up data reading and
retrieval.
• Shards are individual instances of a Lucene index. Lucene is the underlying technology
that Elasticsearch uses for extremely fast data retrieval.
• Each index is comprised of shards across one or more nodes.
• A node is a single instance of Elasticsearch, and usually one instance runs per machine.
Nodes communicate with each other via network calls to share the responsibility of
reading and writing data.
• Clusters are collections of nodes that communicate with each other to read and write to
an index.
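The shard placement described above can be sketched in Python: Elasticsearch assigns each document to a shard by hashing a routing value (the document ID by default) modulo the number of primary shards. This is an illustrative sketch; the md5 hash stands in for the Murmur3 hash Elasticsearch actually uses, and the IDs are invented:

```python
import hashlib

NUM_PRIMARY_SHARDS = 3  # fixed when the index is created

def shard_for(doc_id: str) -> int:
    # Stand-in for Elasticsearch's murmur3-based routing function.
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % NUM_PRIMARY_SHARDS

# The same ID always routes to the same shard, so reads know where to look.
placement = {doc: shard_for(doc) for doc in ["evt-1", "evt-2", "evt-3", "evt-4"]}
print(placement)
```

Because the shard count appears in the modulo, it cannot change after index creation without rerouting every document, which is why the number of primary shards is fixed up front.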
Kibana
Kibana is essentially the UI for the ELK stack, which provides
visualization for the data analyzed by Elasticsearch. There are several
components to Kibana.
Kibana Lens is an intuitive UI that simplifies the process of data
visualization through a drag-and-drop experience.
Kibana Canvas deals with live data visualization. With Canvas, live data
can be pulled directly from Elasticsearch and combined with colors,
images, text, and other customized options to create dynamic, multipage displays.
It also offers alerting tools. Alerting allows you to define rules (A rule specifies a background task
that runs on the Kibana server to check for specific conditions) to detect complex conditions
within different Kibana apps and trigger actions when those conditions are met. This feature is
controlled through the Management UI.
Kibana also offers stack monitoring and management – data and performance of ELK stack
components can be monitored and controlled independently.
Task Handled:
Proof of Concept Development
• A Windows Server 2012 Virtual Machine was established through Oracle VirtualBox.
• The ecosystem was prepared with all the necessary software, including Java, WinRAR, and
Microsoft Visual Studio Code.
• Elasticsearch, Kibana, Logstash, and Winlogbeat files were downloaded and unzipped.
• The components were installed using complex CMD and PowerShell commands.
• Elasticsearch and Kibana were installed on local servers.
These steps were repeated multiple times on different VMs, as the installation of the ELK stack is
an extremely complex process with immense scope for ambiguity.
Work Regimen:
After I was inducted into the company's Outlook system, daily meetings were conducted on the
Teams platform, where progress was discussed and work was assigned.
PROJECT MANUAL
An important task of this project was to document all the steps taken to set up an operational
ecosystem through the ELK stack, and that documentation is presented here.
ELK STACK
INSTALLATION
MANUAL
TABLE OF CONTENTS
No.  Section                                                Page
1    Setting up the Virtual Machine on VirtualBox           3
2    Downloading the ELK Files and supplementary software   3
3    Installing Elasticsearch                               4
4    Installing Kibana                                      4
5    Installing Winlogbeat                                  4
6    Accessing the Kibana Dashboard                         4
7    Configuring Elasticsearch for multiple data shippers   5
     - Network configurations                               6
     - Elasticsearch config file changes                    7
     - Kibana config file changes                           8
     - Winlogbeat config file changes                       9
     - Winlogbeat logging configuration                     10
8    Elastic security measures                              10
9    Kibana Guide                                           10
10   Error Resolution                                       12
Setting up the Virtual Machine on VirtualBox
1) Download the VirtualBox installer and complete the installation:
https://www.virtualbox.org/wiki/Downloads
2) Download the Windows Server 2012 R2 ISO file:
https://www.microsoft.com/en-in/evalcenter/evaluate-windows-server-2012-r2
3) VM Installation steps
a. Click on the New button in VirtualBox, assign a name to the VM, and select Windows
2012 from the Type dropdown menu. Click next.
b. Choose the amount of RAM to be allocated to the VM.
c. Select “Create a virtual hard disk now,” then proceed.
d. Select Virtual Hard Disk under Hard Disk File Type, then proceed.
e. Select “Dynamically allocated” and then proceed.
f. Choose the amount of storage to be allocated to the VM and finally click Create.
g. Go to the Storage section under the VM settings, click on the “empty” disk image,
then insert the ISO file downloaded in Step 2 into the optical drive of the VM.
h. Start the VM.
Specifications of the VM used:
I. 2048 MB RAM
II. 32 GB Hard Disk Memory
III. Host Machine Processor: Intel i7 9th Gen
Downloading the ELK Files and supplementary software
1) Download Google Chrome (or any browser other than Internet Explorer). Chrome makes
downloading the required files easier.
2) Download and install the latest version of Java SE, as it is an essential component for the
functioning of the Elastic Stack: https://www.oracle.com/java/technologies/javase-jdk16-downloads.html
3) Download and install MS Visual Studio Code (System Installer, 64-bit/32-bit depending on
the system’s architecture) (Link: https://code.visualstudio.com/download ). Visual Studio
Code will be used to configure the yml files.
4) Download Elasticsearch for WINDOWS: https://www.elastic.co/downloads/elasticsearch
5) Download Kibana for WINDOWS: https://www.elastic.co/downloads/kibana
6) Download Winlogbeat for WINDOWS ZIP 64bit/32bit (depending on the architecture):
https://www.elastic.co/downloads/beats/winlogbeat
7) Ensure Elasticsearch, Kibana and Winlogbeat are of the same version.
8) Download WinRAR to unzip the files: https://www.win-rar.com/start.html?&L=0
9) Unzip all the folders into a single folder called “ELK” (see the image below).
Installing Elasticsearch
1) Open the command prompt and move the current directory to the bin folder by executing
the command “cd C:\ELK\elasticsearch-7.13.3\bin” (the command might be different if the
installed path is different)
2) Execute the elasticsearch.bat file through the command prompt and let it run in the
background.
Elastic documentation for installing Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/reference/current/zip-windows.html
Installing Kibana
1) Open the command prompt and move the current directory to the bin folder by executing
the command “cd C:\ELK\kibana-7.13.3-windows-x86_64\bin” (the command might be
different if the installed path is different)
2) Execute the kibana.bat file through the command prompt and let it run in the background.
Elastic documentation for installing Kibana
https://www.elastic.co/guide/en/kibana/current/windows.html
Installing Winlogbeat
1) Follow the Winlogbeat documentation very carefully:
https://www.elastic.co/guide/en/beats/winlogbeat/7.13/winlogbeat-installation-configuration.html
Accessing the Kibana Dashboard
Elasticsearch details can be accessed at http://localhost:9200/
Kibana Dashboard and Discover can be accessed at http://localhost:5601/
Configuring Elasticsearch for multiple data shippers
1) Set up another VM (refer to Setting up the Virtual Machine on VirtualBox).
2) Download and install winlogbeat on the new VM (refer to Installing the Winlogbeat).
Network configurations
The host VM (the VM with Elasticsearch and Kibana) and the newly created VM, which will act
solely as a data shipper, have to be on the same local network.
a. Enter the command “ipconfig” in the command prompt and get the
i. IPv4 address
ii. Subnet mask
iii. Default Gateway
b. Open Network & Internet Settings → Change Adapter Settings → Ethernet (right-click)
→ properties → Internet Protocol Version 4 (TCP/IPv4) → Properties
c. Switch to “Use the following IP address” and fill in the details from step a.
d. Switch to “Use the following DNS server addresses” and fill in the Preferred DNS server
with the Default Gateway from step a.
Save all the settings and click OK.
Image for reference: Internet Protocol Version 4 Properties for the Host VM
e. Repeat steps b through d for the new VM (the data shipper VM). Put in the same Subnet
mask, Default Gateway, and Preferred DNS server as the host VM, but a different IPv4
address such that the host VM and the data shipper VM lie on the same network.
Image for reference: Internet Protocol Version 4 Properties for the data shipper VM
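Whether two addresses actually lie on the same network can be double-checked with Python's standard ipaddress module. The sketch uses the 10.0.2.x host address from this setup and an assumed .16 address for the shipper:

```python
import ipaddress

# Host VM and data shipper VM, with the shared subnet mask.
host = ipaddress.ip_interface("10.0.2.15/255.255.255.0")
shipper = ipaddress.ip_interface("10.0.2.16/255.255.255.0")

# Both interfaces must resolve to the same network for local traffic to flow.
print(host.network)                     # 10.0.2.0/24
print(shipper.network == host.network)  # True
```

If the comparison prints False, the subnet mask or one of the addresses was entered inconsistently in step c.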
Config file Changes
Make sure the Elastic stack is not running before making the following changes to the
configuration files on the host VM (the VM with Elasticsearch and Kibana):
Elasticsearch config file changes:
a. Go to the config folder in the elasticsearch folder and open the “elasticsearch.yml”
file with Visual Studio Code. In the default config yml file, everything should be
commented out.
b. Paste the following into the Network section:
transport.host: localhost
transport.tcp.port: 9300
http.port: 9200
network.host: 0.0.0.0
c. Indentation is significant for yml files to function properly, so make sure the
network section from step b looks like the image below:
Kibana config file changes:
a. Go to the config folder in the kibana folder and open the “kibana.yml” file with
Visual Studio Code. In the default config yml file, everything should be commented out.
b. Uncomment the second line (see the image below)
c. Replace the seventh line with this (see the image below)
d. Replace the 32nd line with this (see the image below) (use the static IPv4 address
of the host VM set before, 10.0.2.15 in my case)
Winlogbeat config file changes:
a. Go to the Winlogbeat folder and open the “winlogbeat.yml” file.
b. Configure the winlogbeat-specific options to resemble the image below.
If logs of different types are desired (like security and system logs), refer to the snippet below:
winlogbeat.event_logs:
- name: Application
- name: Security
- name: System
c. Under the Kibana section of the yml file, make the following changes (refer to the
image below) (Use the static IPv4 address of the host VM, 10.0.2.15 in my case)
d. Under the Elasticsearch Output section of the yml file, make the following changes
(refer to the image below) (Use the static IPv4 address of the host VM, 10.0.2.15 in
my case)
Winlogbeat logging configuration:
Winlogbeat generates logs if any errors are encountered. If records of these errors are desired,
make the following adjustments under the logging section of the “winlogbeat.yml” file. Enter the
desired path, but ensure the indentation presented in the image.
Kibana can now be accessed at http://10.0.2.15:5601/ (the host VM IPv4 address in my case is
10.0.2.15).
Elastic security measures
Follow the Elastic documentation carefully to put minimal security measures in place:
https://www.elastic.co/guide/en/elasticsearch/reference/current/security-minimal-setup.html
Kibana Guide
Follow the Kibana documentation:
https://www.elastic.co/guide/en/kibana/7.13/index.html
Error Resolution
1) Error: Winlogbeat - Exiting: Error while initializing input: required 'object', but found
'string' in field
Solution: Recheck the indentation of the code in the Winlogbeat yml files.
2) Error: Elasticsearch - org.elasticsearch.bootstrap.StartupException:
ElasticsearchException[X-Pack is not supported and Machine Learning is not available for
[windows-x86]
Solution: Put the entry xpack.ml.enabled: false in the elasticsearch.yml file.
3) Error: Kibana - Request to Elasticsearch failed: {"error":{}}
Solution: Change the parameter elasticsearch.requestTimeout: 500000 in the
kibana.yml file.
4) Error: Elasticsearch - \Common was unexpected at this time.
Solution: Ensure all the components of the Elastic stack are of the same version.
5) Error: Elasticsearch - JAVA_HOME environment variable must be set!
Solution: The JAVA_HOME variable must be set as an environment variable if Elasticsearch is
being installed as a service.
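For the indentation errors above, one common culprit is a tab character, which YAML forbids for indentation. A quick stand-alone check can be sketched in Python (the sample text is invented for the example):

```python
# Flag lines whose leading whitespace contains a tab, which YAML rejects.
def find_tab_indented_lines(text: str):
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(lineno)
    return bad

sample = "winlogbeat.event_logs:\n\t- name: Application\n  - name: Security\n"
print(find_tab_indented_lines(sample))  # [2]
```

Running such a check over a misbehaving yml file narrows the search before re-reading the whole configuration by eye.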
CONCLUSION
The experience so far with my mentors has been a pleasant one. A critical understanding has been
instilled regarding the Elastic ecosystem and its components, along with general know-how for
testing experimental software. The ELK stack is an essential tool to master for anyone in this
field, and this venture has assisted me significantly.
REFERENCES
Elastic Theory and Documentation:
https://logz.io/learn/complete-guide-elk-stack/
https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
https://www.elastic.co/guide/en/kibana/7.13/index.html
https://www.elastic.co/guide/en/logstash/current/index.html
https://www.elastic.co/guide/en/beats/winlogbeat/current/index.html
PoC Development:
https://medium.com/@samil.mehdiyev/elk-stack-on-windows-server-part-2-installation-d2a7200b65a6
https://elasticsearch.tutorials24x7.com/blog/how-to-install-elasticsearch-kibana-and-logstash-elk-elastic-stack-on-windows
https://burnhamforensics.com/2018/11/18/sending-logs-to-elk-with-winlogbeat-and-sysmon/
GLOSSARY
ELK – Elasticsearch, Logstash, and Kibana
NAUT - NWOW Transformation Automation
CMD – Command Prompt
VM – Virtual Machine