MIS 5208 Week 11: Processing and Analyzing Data
Ed Ferrara, MSIA, CISSP
eferrara@temple.edu

Reminder
Please sign up for this training! Temple, as a member school of Internet2, is entitled to free training and certification exams for Splunk Power Users: http://www.internet2.edu/blogs/detail/10079
If you register and take the courses described in the blog posting yourself, you will have access to the teaching materials in PDF form as part of the elearning course.

Fox School of Business

Review Search
There are ____ components to the Search and Reporting interface?
  Options: 5, 2, 7, 1
  Answer: 7

Review Search
What is the most efficient way to filter events in Splunk?
  Options: By Time, By Host, With the admin user, In app
  Answer: By Time

Review Search
When a search is run, events are returned in _________?
  Options: Chronological order, Pdf, Alphabetical order, Reverse chronological order
  Answer: Reverse chronological order

Review Search
Commands that create statistics or visualizations are called _________?
  Options: Transforming commands, Machine learning commands, Math, Data science
  Answer: Transforming commands

Review Search
The Search & Reporting app has how many search modes?
  Options: 2, 5, 3, 4
  Answer: 3

Review Search
Which character acts as a wildcard in Splunk?
  Options: ~, %, *, !
  Answer: *

Review Search
What are the Boolean operators in Splunk?
  Options: AND, NOT, AFTER, OR, IF
  Answer: AND, NOT, OR

Data Sources
A number of system applications and network devices, such as routers and switches, relay events over network ports using the TCP or UDP protocols. Some applications use the SNMP standard to send events over UDP. Syslog, a standard for computer data logging, is another set of sources with a wealth of information that can be captured at the network port level. Splunk can be configured to accept input from a TCP or UDP port.
Use the Splunk Web user interface to configure a network input source, where you specify:
  Host
  Port
  Sourcetype
Once you save the configuration, Splunk will start indexing the data arriving on the specified network port. This kind of network input can be used to capture syslog information that is generated on remote machines, where the data does not reside locally on a Splunk instance. Splunk forwarders can also be used to gather data on remote hosts.

Windows Data
The Windows operating system produces a number of logs with information about:
  Windows events
  Registry
  Active Directory
  WMI
  Performance
  Other data
Splunk recognizes Windows log streams as a source type and allows one or more of these log streams to be added as indexed input for further processing. Although Windows sources such as Active Directory can be configured individually, Splunk provides a better and easier way of dealing with these Windows logs and events:
  Splunk App for Windows
  Splunk Technology Add-on for Windows

Windows Technology Add-On On Linux
Note: The Windows Technology Add-on can be installed on Splunk running on Windows. If you are running Splunk on Linux, the Windows TA can be installed on a forwarder running on a Windows machine. Forwarders are explained later in this chapter and in Chapter 15 of the text.

Splunk also provides a similar Technology Add-on for Linux and Unix, known as *Nix. This add-on uses both log files and scripting to bring the different sets of event and log data available in Linux or Unix into Splunk. You can install the *Nix Technology Add-on on Linux systems.

Other Apps

Getting to Know Combined Access Log Data
In the real world, enterprises have numerous applications, and most of them run on a heterogeneous infrastructure that includes all sorts of hardware, databases, middleware, and application programs.
It will not be possible to have Splunk running locally or near each of the applications or pieces of infrastructure, meaning the data will not be local to Splunk. What we have seen in this chapter is how to get data into Splunk that is local to it. The use cases assumed that Splunk can access files or directories on local file systems, or on remote file systems that are attached to the machine where Splunk is running. A Splunk forwarder is the same as a standard Splunk instance but with only the essential components required to forward data to receivers, which could be the main Splunk instance or an indexer.

Processing and Analyzing Data
Requirement
Understand the data set that you want to process and analyze. Get intimately acquainted with the data you will work with first.
Review
Log files are generated by almost all kinds of applications and servers:
  End-user applications
  Web servers
  Complex middleware platforms
Operating systems and firmware also write huge amounts of raw data into log files. The challenge lies in understanding, analyzing, and mining the raw data in the log files and making sense of it.

Preview Data
127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"

127.0.0.1
  This is the IP address of the client (the machine, host, or proxy server) that made the HTTP request to access either a web application or an individual web page. The value in this field can also be a hostname.
-
  This field is used to identify the client making the HTTP request. The contents of this field are highly unreliable; a hyphen is typically used, which indicates that the information is not available.
JohnDoe
  This is the user id of the user who is requesting the web page or application.
[10/Oct/2000:13:55:36 -0700]
  The timestamp of when the server finished processing the request. The format can be controlled using web server settings.
"GET /apache_pb.gif HTTP/1.0"
  This is the request line received from the client. It shows the method (in this example GET), the resource that the client was requesting (in this case /apache_pb.gif), and the protocol used (in this case HTTP/1.0).
200
  This is the status code that the server sends back to the client. Status codes are very important because they tell whether the request from the client was successfully fulfilled or failed, in which case some action needs to be taken. 200 in this case indicates that the request was successful.
2326
  This number indicates the size of the data returned to the client. In this case 2326 bytes were sent back to the client. If no content was returned to the client, this value will be a hyphen.

Preview Data
127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"

"http://www.google.com"
  This field is known as the referrer field and shows where the request was referred from. You could see web site URLs like http://www.google.com, http://www.yahoo.com, or http://www.bing.com as values in the referrer field. Referrer information helps web sites and online applications see how users arrive at the site, and this information can be used to decide where online advertising dollars should be spent. Note that the HTTP header itself is spelled "Referer", with one "r" missing. The misspelling originated in the original proposal for the HTTP specification and has been kept since. In browsers like Chrome, where users can use incognito mode or have referrers disabled, the values in this field may not be accurate. In HTML5 the user agent that reports this information can be instructed not to send the referrer information.
"Opera/9.20 (Windows NT 6.0; U; en)"
  This is the user-agent field, and it holds the information that the client browser reports about itself. A value like "Opera/9.20 (Windows NT 6.0; U; en)" means that the request is coming from an Opera browser running on a Windows NT 6.0 operating system (that is, Windows Vista or Windows Server 2008). User-agent information helps to optimize web sites and web applications and to cater for requests coming from smaller form factor devices such as the iPad and mobile phones.

Look at some of the data…

Lab 4 Load the Data

Search the Data

Chapter 3 List All Fields
  List All Fields - CategoryID
  List All Fields - date_hour
  List All Fields - Time Selection
  List All Fields - Average Over Time

Thank you
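
Supplement to the Preview Data slides
The field-by-field breakdown of the combined access log line can be sketched in a few lines of Python. This is only an illustration of how the sample line splits into the fields described on the slides; it is not how Splunk performs its own field extraction. The field names (clientip, ident, user, and so on) and the function name parse_combined_log_line are hypothetical choices made for this example, and the sketch assumes an English/C locale so that the month abbreviation "Oct" parses.

```python
import re
from datetime import datetime

# Regular expression for one combined access log line.
# The group names are illustrative, not official Splunk field names.
COMBINED_LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) '           # client IP address or hostname
    r'(?P<ident>\S+) '              # client identity, usually "-"
    r'(?P<user>\S+) '               # user id of the requester
    r'\[(?P<timestamp>[^\]]+)\] '   # timestamp in square brackets
    r'"(?P<request>[^"]*)" '        # request line: method, resource, protocol
    r'(?P<status>\d{3}) '           # HTTP status code
    r'(?P<bytes>\d+|-) '            # response size in bytes, or "-" if none
    r'"(?P<referer>[^"]*)" '        # referrer URL
    r'"(?P<useragent>[^"]*)"'       # user-agent string
)


def parse_combined_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED_LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A hyphen means no content was returned; treat it as zero bytes here.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    # Assumes the default Apache timestamp layout, e.g. 10/Oct/2000:13:55:36 -0700.
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    # Split the request line into its three parts when it is well formed.
    parts = fields["request"].split()
    if len(parts) == 3:
        fields["method"], fields["uri"], fields["protocol"] = parts
    return fields


SAMPLE = (
    '127.0.0.1 - JohnDoe [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.google.com" "Opera/9.20 (Windows NT 6.0; U; en)"'
)

if __name__ == "__main__":
    for name, value in parse_combined_log_line(SAMPLE).items():
        print(f"{name}: {value}")
```

Running the script prints one line per field for the sample event. Lines that do not follow the combined format return None, which makes it easy to count or inspect malformed events before loading a file.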