Uploaded by Vishnu Priya

splunk interview question

2. What is Splunk?
Splunk is ‘Google’ for our machine-generated data. It’s a software/engine that can be used for
searching, visualizing, monitoring, reporting, etc. of our enterprise data. Splunk takes valuable machine
data and turns it into powerful operational intelligence by providing real-time insights into our data
through charts, alerts, reports, etc.
3. What are the common port numbers used by Splunk?
Below are the common port numbers used by Splunk. However, we can change them if required.
Service                          Port Number Used
Splunk Web port                  8000
Splunk Management port           8089
Splunk Indexing port             9997
Splunk Index Replication port    8080
Splunk Network port              514 (used to get data from the network port, i.e., UDP data)
KV Store                         8191
4. What are the components of Splunk? Explain Splunk architecture.
This is one of the most frequently asked Splunk interview questions. Below are the components of
Splunk:

Search Head: Provides the GUI for searching

Indexer: Indexes the machine data

Forwarder: Forwards logs to the Indexer

Deployment Server: Manages Splunk components in a distributed environment
5. Which is the latest Splunk version in use?
Splunk 8.2.1 (as of June 21, 2021)
6. What is Splunk Indexer? What are the stages of Splunk Indexing?
Splunk Indexer is the Splunk Enterprise component that creates and manages indexes. The primary
functions of an indexer are:

Indexing incoming data

Searching the indexed data

7. What is a Splunk Forwarder? What are the types of Splunk
Forwarders?
There are two types of Splunk Forwarders as below:

Universal Forwarder (UF): The Splunk agent installed on a non-Splunk system to gather
data locally; it can’t parse or index data.

Heavyweight Forwarder (HWF): A full instance of Splunk with advanced functionalities.
It generally works as a remote collector, intermediate forwarder, and possible data filter, and since it
parses data, it is not recommended for production systems.
8. Can you name a few most important configuration files in Splunk?

props.conf

indexes.conf

inputs.conf

transforms.conf

server.conf
9. What are the types of Splunk Licenses?

Enterprise license

Free license

Forwarder license

Beta license

Licenses for search heads (for distributed search)

Licenses for cluster members (for index replication)
10. What is Splunk App?
Splunk app is a container/directory of configurations, searches, dashboards, etc. in Splunk.
11. Where is Splunk Default Configuration stored?
$SPLUNK_HOME/etc/system/default
12. What are the features not available in Splunk Free?
Splunk Free does not include the following features:

Authentication and scheduled searches/alerting

Distributed search

Forwarding in TCP/HTTP (to non-Splunk)

Deployment management
13. What happens if the License Master is unreachable?
If the license master is unreachable, the license slave starts a 72-hour timer. If the slave still cannot
reach the master when the timer expires, search is blocked on that slave, although indexing continues.
Users cannot search data on that slave until it can reach the license master again.
14. What is Summary Index in Splunk?
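A summary index stores the results of scheduled summary reports so that later searches over long time
ranges run faster. The default summary index, named summary, is the index that Splunk Enterprise uses
if we do not indicate another one.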
If we plan to run a variety of summary index reports, we may need to create additional summary
indexes.
15. What is Splunk DB Connect?
Splunk DB Connect is a generic SQL database plugin for Splunk that allows us to easily integrate
database information with Splunk queries and reports.
16. Can you write down a general regular expression for extracting
the IP address from logs?
There are multiple ways in which we can extract the IP address from logs. Below are a few examples:
By using a regular expression:
rex field=_raw "(?<ip_address>\d+\.\d+\.\d+\.\d+)"
OR
rex field=_raw "(?<ip_address>([0-9]{1,3}[\.]){3}[0-9]{1,3})"
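The same pattern can be tried outside Splunk. Below is a minimal Python sketch of the second expression; note that Python's re module writes named groups as (?P<name>...) rather than the (?<name>...) form used by rex. The sample log line and the helper names are made up for illustration.

```python
import re

# Same pattern as the second rex example, in Python's named-group syntax.
IP_PATTERN = re.compile(r"(?P<ip_address>(?:[0-9]{1,3}\.){3}[0-9]{1,3})")

def extract_ips(line):
    """Return every dotted-quad match found in a log line."""
    return [m.group("ip_address") for m in IP_PATTERN.finditer(line)]

def is_valid_ipv4(candidate):
    """The regex alone accepts 999.1.1.1, so range-check each octet."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))

sample = "Failed login from 10.1.2.3 via proxy 999.1.1.1"  # made-up log line
found = extract_ips(sample)
valid = [ip for ip in found if is_valid_ipv4(ip)]
```

Both expressions match any four dot-separated groups of digits, so a follow-up range check like is_valid_ipv4 is worth adding when false positives matter.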
17. Explain Stats vs Transaction commands.
This is another frequently asked Splunk interview question that tests a developer's or engineer's
knowledge. The transaction command is most useful in two specific cases:

When the unique ID (from one or more fields) alone is not sufficient to discriminate
between two transactions. This is the case when the identifier is reused, for example, web
sessions identified by a cookie/client IP. In this case, the time span or pauses are also
used to segment the data into transactions. In other cases where an identifier is reused,
say in DHCP logs, a particular message identifies the beginning or end of a transaction.

When it is desirable to see the raw text of events combined rather than an analysis of the
constituent fields of the events.
In all other cases, the stats command is usually the better choice:

If there is a unique ID, the stats command can be used.

As the stats command performs better, it is preferred especially in a distributed search
environment.
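The difference can be illustrated with a small Python sketch. It is only an analogy for what the two commands compute, not how Splunk implements them; the events, the 1800-second pause, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Made-up events: (epoch seconds, client_ip, raw text)
events = [
    (0, "10.0.0.1", "GET /login"),
    (20, "10.0.0.1", "GET /cart"),
    (5000, "10.0.0.1", "GET /login"),  # same IP, much later: a new session
    (30, "10.0.0.2", "GET /home"),
]

def stats_count_by(events):
    """Roughly `stats count by client_ip`: one row per identifier."""
    counts = defaultdict(int)
    for _, ip, _ in events:
        counts[ip] += 1
    return dict(counts)

def transactions(events, maxpause):
    """Roughly `transaction client_ip maxpause=...`: split on long pauses."""
    groups = []
    open_group = {}  # ip -> index in groups of the currently open transaction
    for ts, ip, raw in sorted(events):
        idx = open_group.get(ip)
        if idx is not None and ts - groups[idx]["end"] <= maxpause:
            groups[idx]["raw"].append(raw)
            groups[idx]["end"] = ts
        else:
            groups.append({"ip": ip, "end": ts, "raw": [raw]})
            open_group[ip] = len(groups) - 1
    return groups

by_ip = stats_count_by(events)
sessions = transactions(events, maxpause=1800)
```

Here the identifier 10.0.0.1 is reused, so the count by IP reports one row of three events, while the pause-based grouping separates them into two sessions and keeps the raw text of each.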
18. How to troubleshoot Splunk performance issues?
The answer to this question would be very wide, but mostly an interviewer would be looking for the
following keywords:

Check splunkd.log for errors

Check server performance issues, i.e., CPU, memory usage, disk I/O, etc.

Install the SOS (Splunk on Splunk) app and check for warnings and errors in its dashboard

Check the number of saved searches currently running and their consumption of system
resources

Install and enable Firebug, a Firefox extension. Log into Splunk (using Firefox) and open
Firebug’s panels. Then, switch to the ‘Net’ panel (we will have to enable it). The Net panel
will show us the HTTP requests and responses, along with the time spent in each. This will
give us a lot of information quickly such as which requests are hanging Splunk, which
requests are blameless, etc.
19. What are Buckets? Explain Splunk Bucket Lifecycle.
Splunk places indexed data in directories, called ‘buckets.’ It is physically a directory containing events
of a certain period.
A bucket moves through several stages as it ages. Below are the various stages it goes through:

Hot: A hot bucket contains newly indexed data. It is open for writing. There can be one or
more hot buckets for each index.

Warm: A warm bucket consists of data rolled out from a hot bucket. There are many warm
buckets.

Cold: A cold bucket has data that is rolled out from a warm bucket. There are many cold
buckets.

Frozen: A frozen bucket is comprised of data rolled out from a cold bucket. The indexer
deletes frozen data by default, but we can archive it. Archived data can later be thawed
(data in a frozen bucket is not searchable).
By default, the buckets are located in:
$SPLUNK_HOME/var/lib/splunk/defaultdb/db
We should see the hot-db there, and any warm buckets we have. By default, Splunk sets the bucket
size to 10 GB for 64-bit systems and 750 MB for 32-bit systems.
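The lifecycle can be pictured as a small state machine. The Python sketch below only models the stage names and transitions described in this answer; the function names are hypothetical and nothing here touches a real index.

```python
# Allowed transitions, taken from the stages listed above.
NEXT_STAGE = {"hot": "warm", "warm": "cold", "cold": "frozen"}
SEARCHABLE = {"hot", "warm", "cold", "thawed"}

def roll(stage):
    """Move a bucket to the next stage of its lifecycle."""
    if stage not in NEXT_STAGE:
        raise ValueError(f"a {stage} bucket does not roll any further")
    return NEXT_STAGE[stage]

def thaw(stage, archived):
    """Only frozen buckets that were archived (not deleted) can be thawed."""
    if stage != "frozen" or not archived:
        raise ValueError("only archived frozen buckets can be thawed")
    return "thawed"

stage = "hot"
history = [stage]
while stage in NEXT_STAGE:
    stage = roll(stage)
    history.append(stage)
```

After the loop, history holds the four stages in order, and thaw("frozen", archived=True) returns the only stage that makes frozen data searchable again.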
20. What is the difference between stats and eventstats commands?

The stats command generates summary statistics of all the existing fields in the search
results and saves them as values in new fields.

Eventstats is similar to the stats command, except that the aggregation results are added
inline to each event and only if the aggregation is pertinent to that event. The eventstats
command computes the requested statistics, like stats does, but adds them as new fields to
the original events instead of replacing those events with a summary table.
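A small Python analogy may make the contrast concrete. The events, the field names host and ms, and the function names are made up; the sketch mirrors the shape of the output of each command, not Splunk's implementation.

```python
from collections import defaultdict

# Made-up events with a host field and a response time in milliseconds.
events = [
    {"host": "web1", "ms": 100},
    {"host": "web1", "ms": 300},
    {"host": "web2", "ms": 50},
]

def avg_by_host(events):
    totals = defaultdict(lambda: [0, 0])  # host -> [sum, count]
    for e in events:
        totals[e["host"]][0] += e["ms"]
        totals[e["host"]][1] += 1
    return {h: s / n for h, (s, n) in totals.items()}

def stats_avg(events):
    """Roughly `stats avg(ms) AS avg_ms by host`: events collapse to summary rows."""
    return [{"host": h, "avg_ms": a} for h, a in avg_by_host(events).items()]

def eventstats_avg(events):
    """Roughly `eventstats avg(ms) AS avg_ms by host`: each event keeps its fields."""
    averages = avg_by_host(events)
    return [{**e, "avg_ms": averages[e["host"]]} for e in events]

summary = stats_avg(events)
annotated = eventstats_avg(events)
```

The stats-style result has one row per host, while the eventstats-style result still has all three events, each carrying the average for its host as an extra field.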
21. Who are the top direct competitors to Splunk?
Logstash, Loggly, LogLogic, Sumo Logic, etc. are some of the top direct competitors to Splunk.
22. What do Splunk Licenses specify?
Splunk licenses specify how much data we can index per calendar day.
23. How does Splunk determine 1 day, from a licensing perspective?
In terms of licensing, for Splunk, 1 day is from midnight to midnight on the clock of the license master.
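To see why the midnight boundary matters, here is a minimal Python sketch that sums indexed volume per calendar day and flags the days over quota. It assumes the timestamps are already expressed in the license master's clock; the record values, the quota, and the function names are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

def daily_usage(records):
    """Sum indexed bytes per calendar day (midnight to midnight)."""
    usage = defaultdict(int)
    for timestamp, nbytes in records:
        usage[timestamp.date()] += nbytes
    return dict(usage)

def days_over_quota(records, quota_bytes):
    """Return the calendar days whose total volume exceeds the quota."""
    usage = daily_usage(records)
    return sorted(day for day, used in usage.items() if used > quota_bytes)

records = [
    (datetime(2021, 6, 21, 23, 59), 400),
    (datetime(2021, 6, 22, 0, 1), 400),   # two minutes later, but a new licensing day
    (datetime(2021, 6, 22, 12, 0), 700),
]
over = days_over_quota(records, quota_bytes=1000)
```

The two events around midnight are only two minutes apart, yet they count against different licensing days, so only the second day exceeds the made-up quota.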
24. How are Forwarder Licenses purchased?
They are included with Splunk. Therefore, no need to purchase separately.
25. What is the command for restarting Splunk web server?
This is another frequently asked interview question on Splunk commands, so get a thorough idea of
them. We can restart the Splunk web server by using the following command:
splunk restart splunkweb
26. What is the command for restarting Splunk Daemon?
The Splunk daemon can be restarted with the below command:
splunk restart splunkd
27. What is the command used to check the running Splunk
processes on Unix/Linux?
If we want to check the running Splunk Enterprise processes on Unix/Linux, we can make use of the
following command:
ps aux | grep splunk
28. What is the command used for enabling Splunk to boot start?
To boot start Splunk, we have to use the following command:
$SPLUNK_HOME/bin/splunk enable boot-start
29. How to disable Splunk boot-start?
In order to disable Splunk boot-start, we can use the following:
$SPLUNK_HOME/bin/splunk disable boot-start
30. What is Source Type in Splunk?
The source type is Splunk's way of identifying data.
31. How to reset Splunk Admin password?
Resetting the Splunk Admin password depends on the version of Splunk. If we are using Splunk 7.1
and above, then we have to follow the below steps:

First, we have to stop our Splunk Enterprise

Now, we need to find the ‘passwd’ file and rename it to ‘passwd.bk’

Then, we have to create a file named ‘user-seed.conf’ in the below directory:
$SPLUNK_HOME/etc/system/local/
In the file, we will have to add the following stanza (here, in the place of ‘NEW_PASSWORD’, we
will add our own new password):
[user_info]
PASSWORD = NEW_PASSWORD

After that, we can just restart the Splunk Enterprise and use the new password to log in
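The file handling in the steps above can be sketched in Python. This is an illustration only: the function name is hypothetical, the location of the passwd file is passed in rather than assumed, and the demo runs against a throwaway directory instead of a real installation (Splunk must be stopped before doing this for real).

```python
import tempfile
from pathlib import Path

def prepare_password_reset(splunk_home, passwd_path, new_password):
    """Rename passwd to passwd.bk and write user-seed.conf."""
    passwd_path = Path(passwd_path)
    backup = passwd_path.with_name("passwd.bk")
    passwd_path.rename(backup)
    seed = Path(splunk_home) / "etc" / "system" / "local" / "user-seed.conf"
    seed.parent.mkdir(parents=True, exist_ok=True)
    seed.write_text(f"[user_info]\nPASSWORD = {new_password}\n")
    return backup, seed

# Demo against a temporary directory; the passwd location here is an assumption.
demo_home = Path(tempfile.mkdtemp())
(demo_home / "etc").mkdir()
(demo_home / "etc" / "passwd").write_text("placeholder\n")
backup, seed = prepare_password_reset(
    demo_home, demo_home / "etc" / "passwd", "NEW_PASSWORD"
)
```

Keeping the old file as passwd.bk rather than deleting it is what makes it possible to copy other users' entries back afterwards.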
Now, if we are using the versions prior to 7.1, we will follow the below steps:

First, stop the Splunk Enterprise

Find the passwd file and rename it to ‘passwd.bk’

Start Splunk Enterprise and log in using the default credentials of admin/changeme

Here, when asked to enter a new password for our admin account, we will follow the
instructions
Note: In case we have created other users earlier and know their login details, copy and paste their
credentials from the passwd.bk file into the passwd file and restart Splunk.
32. How to disable Splunk Launch Message?
Set the value OFFENSIVE=Less in splunk-launch.conf
33. How to clear Splunk Search History?
We can clear Splunk search history by deleting the following file from the Splunk server:
$SPLUNK_HOME/var/log/splunk/searches.log
34. What is Btool? How will you troubleshoot Splunk configuration
files?
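Splunk Btool is a command-line tool that helps us troubleshoot configuration file issues or just see which
values our Splunk Enterprise installation is actually using in the existing environment. For example,
running ‘splunk cmd btool inputs list --debug’ prints the merged inputs.conf settings along with the file
each setting comes from.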
35. What is the difference between Splunk App and Splunk Add-on?
In fact, both contain preconfigured configuration, reports, etc., but the Splunk add-on does not have a
visual app. On the other hand, a Splunk app has a preconfigured visual app.
36. What is .conf files precedence in Splunk?
File precedence is as follows:
System local directory — highest priority
App local directories
App default directories
System default directory — lowest priority
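The effect of this ordering can be sketched in Python as a layered merge in which higher-priority directories overwrite lower ones. The setting names and values are made up, and the sketch covers only the four levels listed above.

```python
# Lowest priority first, so later layers overwrite earlier ones.
PRECEDENCE = ["system default", "app default", "app local", "system local"]

def effective_settings(layers):
    """Merge {layer name: {setting: value}} according to PRECEDENCE."""
    merged = {}
    for layer in PRECEDENCE:
        for key, value in layers.get(layer, {}).items():
            merged[key] = (value, layer)  # remember which layer won
    return merged

layers = {  # made-up settings
    "system default": {"maxKBps": "256", "host": "default-host"},
    "app local": {"maxKBps": "512"},
    "system local": {"host": "idx01"},
}
settings = effective_settings(layers)
```

Recording the winning layer next to each value is the same idea that makes debug-style configuration listings useful: it shows where a setting came from as well as what it is.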
37. What is Fishbucket? What is Fishbucket Index?
Fishbucket is a directory or index at the default location:
/opt/splunk/var/lib/splunk
It contains seek pointers and CRCs for the files we are indexing, so ‘splunkd’ can tell whether it has
already read them. We can access it through the GUI by searching for:
index=_thefishbucket
38. How do I exclude some events from being indexed by Splunk?
This can be done by defining a regex to match the necessary event(s) and sending everything else to
NullQueue. Here is a basic example that will drop everything except events that contain the string login:
In props.conf:
[source::/var/log/foo]
# Transforms must be applied in this order
# to make sure events are dropped on the
# floor prior to making their way to the
# index processor
TRANSFORMS-set = setnull,setparsing
In transforms.conf:
[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue
[setparsing]
REGEX = login
DEST_KEY = queue
FORMAT = indexQueue
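Why the order of the two transforms matters can be shown with a short Python simulation. It assumes that every matching transform is applied in the listed order and that the last match decides the queue; the events are made up and the code is an analogy, not Splunk's routing code.

```python
import re

# (name, REGEX, FORMAT) in the order listed in TRANSFORMS-set.
TRANSFORMS = [
    ("setnull", re.compile(r"."), "nullQueue"),
    ("setparsing", re.compile(r"login"), "indexQueue"),
]

def route(event, default_queue="indexQueue"):
    """Apply every matching transform in order; the last match sets the queue."""
    queue = default_queue
    for _name, regex, destination in TRANSFORMS:
        if regex.search(event):
            queue = destination
    return queue

events = ["user login ok", "heartbeat", "login failed for bob"]  # made-up events
kept = [e for e in events if route(e) == "indexQueue"]
```

If the two entries were swapped, setnull would run last and send every event, including the login events, to the null queue.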
39. How can I understand when Splunk has finished indexing a log
file?
We can figure this out:
By watching data from Splunk’s metrics log in real-time:
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" series="<your_sourcetype_here>" | eval MB=kb/1024 | chart sum(MB)
By watching everything split by source type:
index="_internal" source="*metrics.log" group="per_sourcetype_thruput" | eval MB=kb/1024 | chart sum(MB) avg(eps) over series
If we are having trouble with data input and we want a way to troubleshoot it, particularly if our
whitelist/blacklist rules are not working the way we expected, we will go to the following URL:
https://yoursplunkhost:8089/services/admin/inputstatus
40. How to set the default search time in Splunk 6?
To do this in Splunk Enterprise 6.0, we have to use ‘ui-prefs.conf’. If we set the value in the following,
all our users would see it as the default setting:
$SPLUNK_HOME/etc/system/local
For example, if our $SPLUNK_HOME/etc/system/local/ui-prefs.conf file includes:
[search]
dispatch.earliest_time = @d
dispatch.latest_time = now
then the default time range that all users will see in the search app will be today.
The configuration file reference for ui-prefs.conf is here:
http://docs.splunk.com/Documentation/Splunk/latest/Admin/Ui-prefsconf
41. What is Dispatch Directory?
The dispatch directory, $SPLUNK_HOME/var/run/splunk/dispatch, contains a directory for each search
that is running or has completed. For example, a directory named 1434308943.358 will contain a CSV
file of its search results, a search.log with details about the search execution, and other supporting
files. Using the defaults (which we can override in limits.conf), these directories will be deleted
10 minutes after the search completes, unless the user saves the search results, in which case the
results will be deleted after 7 days.
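The retention rule can be written down as a small Python sketch. The two lifetimes are the defaults quoted above; the job list, the second directory name, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

DEFAULT_TTL = timedelta(minutes=10)  # unsaved search results
SAVED_TTL = timedelta(days=7)        # results the user chose to save

def expires_at(completed, saved):
    """Time after which a dispatch directory is eligible for deletion."""
    return completed + (SAVED_TTL if saved else DEFAULT_TTL)

def expired_jobs(jobs, now):
    """Return names of dispatch directories whose lifetime has passed."""
    return [name for name, completed, saved in jobs
            if expires_at(completed, saved) <= now]

finished = datetime(2015, 6, 14, 19, 9)
jobs = [
    ("1434308943.358", finished, False),
    ("1434308950.359", finished, True),  # hypothetical saved search
]
gone = expired_jobs(jobs, now=finished + timedelta(minutes=21))
```

Twenty-one minutes after completion, only the unsaved search has passed its lifetime; the saved one remains for seven days.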
42. What is the difference between Search Head Pooling and Search
Head Clustering?
Both are features provided by Splunk for the high availability of the Splunk search head in case any
search head goes down. However, search head clustering is the newer feature, and search head pooling
is deprecated and will be removed in upcoming versions.
A search head cluster is managed by a captain, which coordinates the other cluster members. A search
head cluster is more reliable and efficient than search head pooling.
43. If I want to add folder access logs from a windows machine to
Splunk, how do I do it?
Below are the steps to add folder access logs to Splunk:
1. Enable Object Access Audit through group policy on the Windows machine on which the
folder is located
2. Enable auditing on a specific folder for which we want to monitor logs
3. Install Splunk universal forwarder on the Windows machine
4. Configure universal forwarder to send security logs to Splunk indexer
44. How would you handle/troubleshoot Splunk License Violation
Warning?
A license violation warning means that Splunk has indexed more data than our purchased daily license
quota. To troubleshoot it, we first check the available quota per pool on the Splunk license master and
identify the pool in which the violation occurred. Next, we identify the index/source type in that pool
that has recently received more data than its usual daily volume. Once the source type is identified,
we find the source machine that is sending the unusually large number of logs, determine the root
cause, and troubleshoot it accordingly.
45. What is MapReduce algorithm?
MapReduce algorithm is the secret behind Splunk’s faster data searching. It’s an algorithm typically
used for batch-based large-scale parallelization. It’s inspired by functional programming’s map() and
reduce() functions.
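The general idea can be shown in a few lines of Python. This is a sketch of the map/reduce pattern itself, with made-up events and chunking; it is not a description of Splunk's internal implementation.

```python
from collections import Counter
from functools import reduce

def map_phase(events):
    """Runs independently on each chunk: emit partial counts per status."""
    return Counter(e["status"] for e in events)

def reduce_phase(partials):
    """Combine the partial results into the final answer."""
    return reduce(lambda a, b: a + b, partials, Counter())

# Two chunks standing in for data held by two separate workers.
chunks = [
    [{"status": 200}, {"status": 404}],
    [{"status": 200}, {"status": 200}],
]
partials = [map_phase(chunk) for chunk in chunks]
totals = reduce_phase(partials)
```

Because each map step needs only its own chunk, the chunks can be processed in parallel, and only the small partial results have to be brought together in the reduce step.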
46. How does Splunk avoid the duplicate indexing of logs?
At the indexer, Splunk keeps track of the indexed events in a directory called fishbucket with the default
location:
/opt/splunk/var/lib/splunk
It contains seek pointers and CRCs for the files we are indexing, so splunkd can tell whether it has
already read them.
See more at:
http://www.learnsplunk.com/splunk-indexerconfiguration.html#sthash.t1ixi19P.dpuf.
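The seek-pointer-plus-CRC idea can be sketched in Python as follows. This is a simplified model, not Splunk's actual fishbucket format: the class name is hypothetical, and the checksummed head is deliberately tiny (an assumption) so that the short demo files are longer than it.

```python
import zlib

HEAD_BYTES = 8  # assumption: tiny head so the demo files are long enough

class SeekTracker:
    """Remember, per file-head CRC, how far a file has already been read."""

    def __init__(self):
        self.seek_by_crc = {}

    def new_data(self, content):
        """Return only the bytes that have not been read before."""
        crc = zlib.crc32(content[:HEAD_BYTES])
        offset = self.seek_by_crc.get(crc, 0)
        if offset > len(content):  # file shrank: treat it as a new file
            offset = 0
        self.seek_by_crc[crc] = len(content)
        return content[offset:]

tracker = SeekTracker()
first = tracker.new_data(b"line1\nline2\n")
again = tracker.new_data(b"line1\nline2\n")
grown = tracker.new_data(b"line1\nline2\nline3\n")
```

The second read of the unchanged file yields nothing, and the third read yields only the appended line, which is how re-reading a monitored file avoids duplicate events.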
47. What is the difference between Splunk SDK and Splunk
Framework?
Splunk SDKs are designed to allow us to develop applications from scratch, and they do not require
Splunk Web or any components from the Splunk App Framework. They are licensed separately from
Splunk and do not alter the Splunk software.
The Splunk App Framework resides within the Splunk web server and permits us to customize the Splunk
Web UI that comes with the product and develop Splunk apps using the Splunk web server. It is an
important part of Splunk's features and functionality, but its license does not permit users to modify
anything in Splunk itself.
48. For what purpose inputlookup and outputlookup are used in
Splunk Search?
The inputlookup command is used to search the contents of a Splunk lookup table. The lookup table
can be a CSV lookup or a KV store lookup. The inputlookup command is an event-generating command,
that is, a command that generates events or reports from one or more indexes without transforming
them. Other event-generating commands include metadata, loadjob, inputcsv, etc.
Syntax:
inputlookup [append=<bool>] [start=<int>] [max=<int>] [<filename> | <tablename>] [WHERE <search-query>]
Now coming to the outputlookup command, it writes the search results to a static lookup table, or KV
store collection, that we specify. The outputlookup command cannot be used with external lookups.
Syntax:
outputlookup [append=<bool>] [create_empty=<bool>] [max=<int>]
[key_field=<field_na
Splunk Admin Interview Questions
49. Explain how Splunk works?
We can divide the working of Splunk into three main parts:

Forwarder: You can see it as a dumb agent whose main task is to collect the data from various
sources like remote machines and transfer it to the indexer.

Indexer: The indexer will then process the data in real-time and store & index it on the localhost or
cloud server.

Search Head: It allows the end-user to interact with the data and perform various operations like
searching, analyzing, and visualizing the information.
50. How to add the colors in Splunk UI based on the field names?
Splunk UI has a number of features that allow the administrator to make the reports more presentable.
One such feature that proves to be very useful for presenting distinguished results is the custom colors.
For example, if the sales of a product drop below a threshold value, then as an administrator you can
set the chart to display the values in red color.
The administrator can also change chart colors in the Splunk Web UI by editing the panels from the
panel settings mentioned above the dashboard. Moreover, you can write the codes and use
hexadecimal values to choose a color from the palette.
51. How the Data Ages in Splunk?
Data entering an indexer is stored in directories, also known as buckets. Over a period of time, these
buckets roll through different stages: hot, warm, cold, frozen, and finally thawed. The data first goes
through the indexer's pipeline, where event processing takes place in two stages: parsing breaks the
data into individual events, while indexing takes these events and writes them to the index.
This is what happens to the data at each stage:

As soon as the data has passed through the pipeline, it goes to a hot bucket. There can be
multiple hot buckets at any point in time, which you can both search and write to.

If Splunk is restarted or a hot bucket reaches a certain threshold size, a new hot bucket
is created in its place and the existing one rolls to become a warm bucket. Warm buckets
are searchable, but you cannot write anything to them.

Further, when the index reaches its configured limit of warm buckets, the oldest warm
bucket is rolled to become a cold one. Splunk executes this process automatically and
does not rename the bucket. All the above buckets are stored under the default
location ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/db/*’.

After a certain period of time, the cold bucket rolls to become the frozen bucket. These buckets
don’t have the same location as the previous buckets and are non-searchable. These buckets
can either be archived or deleted based on the priorities.

You can’t do anything if the bucket is deleted, but you can retrieve a frozen bucket if it
has been archived. The process of retrieving an archived bucket is known as thawing. Once a
bucket is thawed, it becomes searchable and is stored in a new
location, ‘$SPLUNK_HOME/var/lib/splunk/defaultdb/thaweddb/’.
52. What are pivots and data models in Splunk?
Data models in Splunk are used when you have to process huge amounts of unstructured data and
create a hierarchical model without executing complex search queries on the data. Data models are
widely used for creating sales reports, adding access levels, and creating a structure of authentication
for various applications.
Pivots, on the other hand, give you the flexibility to create multiple views and see the results as per the
requirements. With pivots, even managers or stakeholders from non-technical backgrounds can
create views and get more details about their departments.
53. Explain Workflow Actions?
This topic will be present in any set of Splunk interview questions and answers. Workflow actions in
Splunk are highly configurable knowledge objects that enable you to interact with web resources and
other fields. Splunk workflow actions can be used to create HTML links that search for field values,
send HTTP POST requests to specific URLs, and run secondary searches for selected events.
54. How many types of dashboards are available in Splunk?
There are three types of dashboards available in Splunk:

Real-time dashboards

Dynamic form-based dashboards

Dashboards for scheduled reports
55. What are the types of alerts available in Splunk?
Alerts are actions triggered by the results of a saved search. Once an alert has been triggered,
subsequent actions such as an email or a message can also be triggered. There are two types of
alerts available in Splunk:

Real-time alerts: We can divide real-time alerts into two kinds, per-result and rolling-window
alerts. A per-result alert is triggered every time the search returns a result, while a
rolling-window alert is triggered when a specific criterion is met by the search within a
rolling time window.

Scheduled alerts: As the name suggests, a scheduled alert runs its search on a set schedule
and triggers when the results meet the set criteria.
56. Define the term “Search factor” and “Replication factor”
Search factor: The search factor (SF) decides the number of searchable copies of the data/buckets that
an indexer cluster maintains. For example, a search factor of 3 means that the cluster maintains
3 searchable copies of each bucket.
Replication factor: The replication factor (RF) determines the total number of copies of the
data/buckets that the cluster maintains. The search factor cannot be greater than the replication factor.
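The constraint between the two factors can be captured in a short Python check. It treats the replication factor as the total number of copies of each bucket and the search factor as how many of those copies are searchable; the function name and the returned keys are hypothetical.

```python
def validate_cluster_factors(replication_factor, search_factor):
    """Check RF/SF and describe the copies kept for each bucket."""
    if replication_factor < 1 or search_factor < 1:
        raise ValueError("both factors must be at least 1")
    if search_factor > replication_factor:
        raise ValueError("search factor cannot exceed replication factor")
    return {
        "copies_per_bucket": replication_factor,
        "searchable_copies": search_factor,
        "non_searchable_copies": replication_factor - search_factor,
    }

plan = validate_cluster_factors(replication_factor=3, search_factor=2)
```

With RF=3 and SF=2, each bucket has three copies, two of which can serve searches immediately; asking for SF=3 with RF=2 is rejected.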
57. How to stop/start the Splunk service?
The command for starting Splunk service:
./splunk start
The command for stopping Splunk service:
./splunk stop
58. What is the use of Time Zone property in Splunk?
Time Zone is an important property that helps you search for events, for example when investigating
a fraud or security issue. The default time zone is taken from the browser settings or the machine you
are using. Apart from event searching, it is also used to align data arriving from multiple sources in
different time zones.
59. What are the important Search commands in Splunk?
Below are some of the important search commands in Splunk:

erex

abstract

typer

rename

anomalies

filldown

accum

addtotals
60. How many types of search modes are there in Splunk?
There are three types of search modes in Splunk:

Fast mode: Speeds up your search by limiting the types of data returned.

Verbose mode: Slower as compared to the fast mode, but returns the information for as
many events as possible.

Smart mode: It toggles between different modes and search behaviors to provide
maximum results in the shortest period of time.