MIS5208-SP16 Week 11 Splunk Data

MIS 5208
Week 11: Processing and Analyzing Data
Ed Ferrara, MSIA, CISSP
eferrara@temple.edu
Please sign up for this training!


Temple, as a member school of Internet2, is entitled to free
training and certification exams for Splunk Power
Users: http://www.internet2.edu/blogs/detail/10079
If you register and take the courses described in the blog
post yourself, you will have access to the teaching materials
in PDF form as part of the e-learning course.
Reminder
Fox School of Business
Review Search

How many components does the Search and Reporting interface have?
5
2
7
1
Answer: 7
Review Search

What is the most efficient way to filter events in Splunk?
By time
By host
With the admin user
In app
Answer: By time
Review Search

In what order are events returned when a search is run?
Chronological order
PDF
Alphabetical order
Reverse chronological order
Answer: Reverse chronological order
Review Search

Commands that create statistics or visualizations are called _________.
Transforming commands
Machine learning commands
Math
Data science
Answer: Transforming commands
Review Search

How many search modes does the Search &amp; Reporting app have?
2
5
3
4
Answer: 3
Review Search

Which character acts as a wildcard in Splunk?
~
%
*
!
Answer: *
Review Search

Which of the following are Boolean operators in Splunk?
AND
NOT
AFTER
OR
IF
Answer: AND, NOT, OR
Data Sources

Many system applications and network devices, such as routers and switches, relay events
over network ports using the TCP or UDP protocols.
Some applications use the SNMP standard to send events over UDP.
Syslog, a standard for computer data logging, is another source that offers a wealth of
information that can be captured at the network port level.
Splunk can be configured to accept input from a TCP or UDP port.

Use the Splunk Web user interface to configure a network input source, where you specify:
Host
Port
Sourcetype
Once you save the configuration, Splunk starts indexing the data arriving on the
specified network port.
This kind of network input can capture syslog data that is generated on remote
machines and does not reside locally on a Splunk instance.
Splunk forwarders can also be used to gather data on remote hosts.
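To see what a network input actually receives, the sketch below sends one syslog-style message over UDP and reads it back from a listening socket. It is a minimal sketch under stated assumptions: the local listener only stands in for Splunk's UDP input, and the message text, the `<134>` priority value, and the hostname `webserver01` are illustrative. In a real deployment the sender would target the host and port configured in Splunk Web (UDP 514 is the conventional syslog port).

```python
import socket


def send_syslog_udp(message: str, host: str, port: int) -> None:
    """Send one syslog-style message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("utf-8"), (host, port))


def receive_one(listener: socket.socket, bufsize: int = 4096) -> str:
    """Block until one datagram arrives and return it as text."""
    data, _addr = listener.recvfrom(bufsize)
    return data.decode("utf-8")


if __name__ == "__main__":
    # Stand-in for Splunk's UDP network input (assumption): an ephemeral local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))
    listener.settimeout(5)
    port = listener.getsockname()[1]

    # <134> = facility local0 (16) * 8 + severity informational (6)
    msg = "<134>Oct 10 13:55:36 webserver01 app: user JohnDoe logged in"
    send_syslog_udp(msg, "127.0.0.1", port)
    print(receive_one(listener))
    listener.close()
```

UDP is fire-and-forget: the sender gets no acknowledgment, which is one reason syslog over UDP can lose events under load.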
Windows Data

The Windows operating system produces a number of log files that contain
information about:
- Windows events
- Registry
- Active Directory
- WMI
- Performance
- Other data
Splunk recognizes Windows log streams as a source type and allows one or
more of these log streams to be indexed as input for further processing.
Although Windows sources such as Active Directory can be configured
individually, Splunk provides an easier way to handle these Windows
logs and events by using:
- Splunk App for Windows
- Splunk Technology Add-on for Windows
Windows Technology Add-On On Linux

Note: The Windows Technology Add-on can be installed on Splunk running on Windows.
If you are running Splunk on Linux, the Windows TA can be installed on a forwarder
running on a Windows machine.
Forwarders are explained later in this chapter and in Chapter 15 of the text.
Splunk also provides a similar Technology Add-on for Linux and Unix, known as *Nix.
This add-on uses both log files and scripts to bring different sets of event and log
data from Linux or Unix into Splunk.
You can install the *Nix add-on on Linux systems.
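The scripted half of this approach rests on a simple idea: Splunk runs a script on a schedule and indexes whatever the script writes to standard output. The sketch below is a hypothetical scripted input, not a script taken from the *Nix add-on; the function names and the `load_1m`/`load_5m`/`load_15m` field names are assumptions. It emits `key=value` pairs because Splunk extracts such pairs automatically at search time.

```python
import os
from datetime import datetime, timezone


def format_event(metrics: dict, ts: datetime) -> str:
    """Render one event line: an ISO-8601 timestamp followed by key=value pairs."""
    pairs = " ".join(f"{key}={value}" for key, value in metrics.items())
    return f"{ts.isoformat()} {pairs}"


def collect_load_average() -> dict:
    """Read the 1-, 5-, and 15-minute load averages (os.getloadavg is Unix-only)."""
    one, five, fifteen = os.getloadavg()
    return {
        "load_1m": round(one, 2),
        "load_5m": round(five, 2),
        "load_15m": round(fifteen, 2),
    }


if __name__ == "__main__":
    # Each line printed here would become one event when run as a scripted input.
    print(format_event(collect_load_average(), datetime.now(timezone.utc)), flush=True)
```

Keeping the formatting in its own function makes the event layout easy to test without depending on live system metrics.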
Other Apps
Getting to Know Combined Access Log Data

In the real world, enterprises have numerous applications, and most of them run
on heterogeneous infrastructure that includes all sorts of hardware, databases,
middleware, and application programs.
It is not practical to run Splunk locally on, or near, each application or piece
of infrastructure, so much of the data will not be local to Splunk.
So far, this chapter has shown how to get data that is local to Splunk into Splunk.
Those use cases assumed that Splunk can access the files or directories, either on
local file systems or on remote file systems mounted on the machine where Splunk
is running.
A Splunk forwarder is a standard Splunk instance with only the essential components
required to forward data to receivers, such as the main Splunk instance or an indexer.
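The forwarder's basic job can be illustrated with a toy: read lines from a source and ship each one to a receiver over a network connection. This is only a conceptual sketch and not how Splunk forwarders are implemented; real forwarders use Splunk's own forwarding protocol (receivers conventionally listen on TCP 9997), attach metadata such as host and sourcetype, and handle buffering and acknowledgment. The functions `forward_lines` and `receive_lines` are hypothetical, and `socket.socketpair()` stands in for a real forwarder-to-indexer connection.

```python
import socket
from typing import Iterable


def forward_lines(lines: Iterable[str], conn: socket.socket) -> int:
    """Send each line, newline-terminated, over conn; return how many were sent."""
    count = 0
    for line in lines:
        conn.sendall(line.rstrip("\n").encode("utf-8") + b"\n")
        count += 1
    return count


def receive_lines(conn: socket.socket) -> list:
    """Read until the peer closes the connection, then split into lines."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:  # empty bytes means the sender closed its end
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8").splitlines()


if __name__ == "__main__":
    forwarder_end, receiver_end = socket.socketpair()
    events = ["event one", "event two", "event three"]
    sent = forward_lines(events, forwarder_end)
    forwarder_end.close()  # signal end of stream to the receiver
    print(sent, receive_lines(receiver_end))
    receiver_end.close()
```

Newline framing is the simplest possible choice; it breaks on multi-line events such as stack traces, which is one of the problems a real forwarding protocol has to solve.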
Processing and Analyzing Data

Requirement

Understand the data set that you want to process and analyze.
Get intimately acquainted with the data you will work with first.
Review

Log files are generated by almost all kinds of applications and
servers:
End-user applications
Web servers
Complex middleware platforms
Operating systems and firmware also write huge amounts of
raw data to log files.
The challenge lies in understanding, analyzing, and
mining the raw data in the log files and making sense
of it.
Preview Data
127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"
127.0.0.1
This is the IP address of the client (the machine, host, or proxy server)
that made the HTTP request to access a web application or an individual
web page. The value in this field can also be a hostname.
-
This field identifies the client making the HTTP request. Its contents
are highly unreliable; a hyphen is typically used, which indicates that
the information is not available.
JohnDoe
This is the user ID of the person requesting the web page or
application, as determined by HTTP authentication.
10/Oct/2000:13:55:36 -0700
The time the server received the request.
The format can be controlled using web server settings.
“GET /apache_pb.gif HTTP/1.0”
This is the request line received from the client. It shows the
method (in this example, GET), the resource the client requested
(in this case, /apache_pb.gif), and the protocol used (in this
case, HTTP/1.0).
200
This is the status code that the server sends back to the client. Status
codes are very important because they tell whether the client's request
was fulfilled successfully or failed, in which case some action may need
to be taken. Here, 200 indicates that the request was successful.
2326
This number indicates the size of the object returned to the client, not
including the response headers. In this case, 2326 bytes were sent back
to the client. If no content was returned, this value is a hyphen.
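The timestamp, status, and size fields each need a little care before they can be analyzed. The sketch below parses the sample timestamp with Python's `datetime.strptime`, groups status codes into their standard classes by first digit (2xx success, 3xx redirection, 4xx client error, 5xx server error), and treats a hyphen in the size field as zero bytes. The helper names are hypothetical, and the sketch assumes the fields have already been split out of the line.

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse a log timestamp such as '10/Oct/2000:13:55:36 -0700' (brackets optional)."""
    return datetime.strptime(value.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")


# Standard HTTP status classes, keyed by the code's first digit.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Map an HTTP status code to its class by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


def parse_size(value: str) -> int:
    """Return the response size in bytes; a hyphen means no content was returned."""
    return 0 if value == "-" else int(value)


if __name__ == "__main__":
    ts = parse_timestamp("[10/Oct/2000:13:55:36 -0700]")
    print(ts.astimezone(timezone.utc).isoformat())  # 2000-10-10T20:55:36+00:00
    print(status_class(200), parse_size("2326"), parse_size("-"))
```

Normalizing timestamps to UTC matters when logs come from servers in different time zones; the `-0700` offset in the sample is what makes the conversion possible.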
Preview Data
127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"
“http://www.google.com”
This is the referrer field, which shows the page from which the request
was referred. You might see URLs such as http://www.google.com,
http://www.yahoo.com, or http://www.bing.com as values in this field.
Referrer information helps web sites and online applications see how
users arrive at the site, and it can be used to decide where online
advertising dollars should be spent. Note that the HTTP header itself is
spelled “Referer”, with one “r” fewer than the English word “referrer”.
The misspelling originated in the original proposal and was carried into
the HTTP specification. In browsers such as Chrome, users can browse in
incognito mode or disable referrers, so the values in this field may not
be accurate. In HTML5, a page can instruct the user agent not to send
referrer information (for example, with rel="noreferrer" on a link).
“Opera/9.20 (Windows NT 6.0; U; en)”
This is the user-agent field, which contains the information the client
browser reports about itself. A value such as “Opera/9.20
(Windows NT 6.0; U; en)” means that the request came from an Opera
browser running on Windows NT 6.0 (that is, Windows Vista or Windows
Server 2008). User-agent information helps optimize web sites and web
applications for requests coming from smaller form-factor devices such
as the iPad and mobile phones.
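With all nine fields described, a single regular expression can split a combined-format line into named fields. The pattern below is a sketch that assumes well-formed lines with double-quoted request, referrer, and user-agent fields; real logs can contain escaped quotes or malformed request lines that it will not handle. The group names (`clientip`, `referrer`, `useragent`, and so on) are illustrative choices. It also pulls the referring site's hostname out of the referrer with `urllib.parse.urlparse`, the kind of value you would count when deciding where advertising dollars should go.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# One named group per combined-log field (group names are illustrative).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[dict]:
    """Return the fields of one combined-format line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def referrer_host(referrer: str) -> Optional[str]:
    """Extract the hostname from a referrer URL; '-' or an empty value yields None."""
    if referrer in ("", "-"):
        return None
    return urlparse(referrer).hostname


if __name__ == "__main__":
    sample = ('127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
              '"http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"')
    fields = parse_combined(sample)
    print(fields["status"], fields["useragent"])
    print(referrer_host(fields["referrer"]))  # www.google.com
```

Returning `None` for non-matching lines, rather than raising, lets a caller count and inspect malformed entries instead of stopping on the first one.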
Look at some of the data…
Lab 4
Load the Data
Search the Data
Chapter 3
List All Fields
List All Fields - CategoryID
List All Fields – date_hour
List All Fields – Time Selection
List All Fields – Average Over Time
Thank you