SearchCheatSheet

This is the "cook book" cheat sheet containing tips for using Splunk's powerful search language. If you want a manual of search commands, check out the Splunk Docs for Search Commands.

Filtering Results

keep only those results that have the required src or dst values
  * | search src="10.9.165.*" OR dst="10.9.165.8"

keep only results whose _raw field contains IP addresses in the non-routable class A (10.0.0.0/8)
  * | regex _raw="(?<!\d)10\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d)"

remove duplicates of results with the same host
  * | dedup host

Ordering Results

sort by ip ascending and then url descending
  * | sort ip, -url

reverse the order of the results
  * | reverse

return the first 20 results
  * | head 20

return the last 20 results, in reverse order
  * | tail 20

Filtering and Ordering Results

keep only the host and ip attributes, setting the order of attributes: host first, ip second
  * | fields host, ip

same as above, but also remove all internal attributes (e.g. _time), which may cause UI problems
  * | fields + host, ip

remove the host and ip attributes but leave all others untouched
  * | fields - host, ip

Extracting or Adding Attributes

extract attribute/value pairs while reloading settings from disk
  * | extract reload=true

extract attribute/value pairs that are delimited by "|" or ";", with attributes and values delimited by "=" or ":"
  * | extract pairdelim="|;", kvdelim="=:", auto=f

extract attribute/value pairs from XML, with the attribute set to the XML tag and the value set to the text between the XML tags
  * | xmlkv

extract the COMMAND field only when it occurs in rows that contain "splunkd"
  * | multikv fields COMMAND filter splunkd

extract 'from' and 'to' fields using regular expressions. If _raw were "From: Susan To: Bob", then from=Susan and to=Bob
  * | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"

add a new attribute 'comboIP', which is sourceIP + "/" + destIP
  * | strcat sourceIP "/" destIP comboIP

add a new attribute 'velocity' equal to distance / time, by calling SQLite
  * | eval velocity=distance/time

add ip="10.10.10.10" and foo="foo bar" to each result
  * | setfields ip="10.10.10.10", foo="foo bar"

add location information, based on IP addresses, to the first twenty 404 errors on the host webserver1
  404 host=webserver1 | head 20 | iplocation

Converting or Changing the Names, Units, or Datatypes of Attributes

convert every field (that doesn't start with an underscore '_') except for the field 'foo'; none() tells convert to ignore a field
  * | convert auto(*) none(foo)

change the memory amounts in the virt field into kilobytes. A number by itself specifies KB; numbers with 'm', MB; and numbers with 'g', GB
  * | convert memk(virt)

change the sendmail syslog duration format of [D+HH:MM:SS] in the delay field to seconds, e.g. '00:10:15' -> '615'
  * | convert dur2sec(delay)

convert the duration into a number by removing the trailing unit string, i.e. '212 sec' -> '212'
  * | convert rmunit(duration)

rename the _ip field as IPAddress
  * | rename _ip as IPAddress

replace any host value ending with "localhost" with just "localhost"
  * | replace *localhost with localhost in host
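The individual extraction and conversion commands above chain together with ordinary pipes. The following is a minimal sketch of one such pipeline; the mail sourcetype and the from_addr/to_addr/route field names are assumptions for illustration, not fields that necessarily exist in your data:

  sourcetype=mail
    | rex field=_raw "From: (?<from_addr>\S+) To: (?<to_addr>\S+)"
    | strcat from_addr " -> " to_addr route
    | rename route as MailRoute
    | fields + MailRoute, host

Here rex extracts the two addresses, strcat joins them into a new route field, rename gives it a friendlier name, and fields trims the results down to the columns of interest.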
Reporting and Statistical Graphing Functions

return the least common values of the url field
  * | rare url

return the 20 most common URLs
  * | top limit=20 url

return the number of distinct host values (i.e. deduplicate on host and count what remains)
  * | stats dc(host)

for each hour, return the average of any unique field that ends with the string 'lay' (e.g. delay, xdelay, relay, etc.)
  * | stats avg(*lay) BY date_hour

search the access logs, and return the number of hits from the top 100 referer domains
  sourcetype=access_combined | top limit=100 referer_domain | stats sum(count)

search the access logs, and return the results that are associated with each other (having at least 3 references to each other)
  sourcetype=access_combined | associate supcnt=3

get the average (mean) size for each distinct host
  * | chart avg(size) by host

get the max delay by size, where size is broken down into up to 10 equal-sized buckets
  * | chart max(delay) by size bins=10

graph the average thruput of hosts over time, with time broken into 5-minute spans
  * | timechart span=5m avg(thruput) by host

create a timechart of the average cpu_seconds by host, then truncate (transform) outlying values that may distort the timechart's axis
  * | timechart avg(cpu_seconds) by host | outlier action=tf

get ps events, extract values from ps's tabular output, and calculate the average CPU each minute for each host
  sourcetype=ps | multikv | timechart span=1m avg(CPU) by host

search for events with the sourcetype "web", produce a timechart count by host, then fill all null values with "NULL"
  sourcetype=web | timechart count by host | fillnull value=NULL

search all events and build a contingency table for two data fields
  * | contingency datafield1 datafield2 maxrows=5 maxcols=5 usetotal=F

search all events and calculate the co-occurrence correlation between all fields
  * | correlate type=cocur

sum the numeric fields of each result, putting the sums in the field "sum"
  * | addtotals fieldname=sum

return only events with uncommon values
  * | anomalousvalue action=filter pthresh=0.02

bucket search results into 10 bins based on size, and count the raw events in each bin
  * | bucket size bins=10 | stats count(_raw) by size

return the average thruput of each host for each 5-minute time span
  * | bucket _time span=5m | stats avg(thruput) by _time host

Classifying Events

apply eventtypes to search results (automatically called from the UI when viewing the eventtype field)
  * | typer

discover types of events that have errors
  error | typelearner

Finding Transactions or Grouping Related Events/Results Together

group into one transaction all events that have the same host and cookie, occur within 30 seconds of each other, and have no pause of more than 5 seconds between events
  * | transaction fields="host,cookie" maxspan=30s maxpause=5s

group into one transaction all events that share the same 'from' value, within a maximum span of 30 seconds and with a pause between events no greater than 5 seconds
  * | transaction fields=from maxspan=30s maxpause=5s

group search results into 4 clusters, based on the values of the date_hour and date_minute fields
  * | kmeans k=4 date_hour date_minute

cluster events together, sort them by cluster_count, and return the first 20 results, which are the 20 largest clusters (in data size)
  * | cluster t=0.9 showcount=true | sort - cluster_count | head 20
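The transaction command also adds duration and eventcount fields to each grouped result, which feed naturally into the statistical commands above. A rough sketch, using the same fields= syntax as the entries above; the access_combined sourcetype and clientip field are assumptions about the data:

  sourcetype=access_combined
    | transaction fields=clientip maxspan=30m maxpause=5m
    | stats avg(duration) AS avg_session_secs, max(eventcount) AS max_events_per_session

This reports the average session length in seconds and the largest number of events seen in a single session.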
Generating Results with Non-Search Commands

read results from the csv file $SPLUNK_HOME/var/run/splunk/all.csv, keep any that contain "error", and save them out to errors.csv
  | inputcsv all.csv | search error | outputcsv errors.csv

return the events in the file messages.1, as if it were indexed in Splunk
  | file /var/log/messages.1

run the mysecurityquery saved search, and email any results to user@domain.com
  | savedsearch mysecurityquery AND _count > 0 | sendemail to=user@domain.com

User Interface Commands

highlight the terms "login" and "logout"
  * | highlight login,logout

show the best 5 lines of each search result
  * | abstract maxlines=5

compare the ip values of the first and third search results
  * | diff pos1=1 pos2=3 attribute=ip

search for "xml_escaped", then unescape XML characters
  source="xml_escaped" | xmlunescape

Commands Related to Administration

crawl the root and home directories and add all inputs found to inputs.conf
  | crawl root="/;/Users/bob" | input add

return processing properties (time zones, breaking characters, etc.) contained in props.conf
  | admin props

view audit trail information stored in the local audit index, and decrypt signed audit events, checking for gaps and tampering
  index=audit | audit

Subsearches

return all URLs that have 404 errors but not 303 errors
  | set diff [search 404 | fields url] [search 303 | fields url]

find 5-minute time regions around root logins, and then search each of those time ranges for "failure"
  login root | localize maxspan=5m maxpause=5m | map search="search failure starttimeu::$starttime$ endtimeu::$endtime$"

get all hourly time ranges from 5 days ago until today, and search each of those time ranges for "failure"
  | gentimes start=-5 increment=1h | map search="search failure starttimeu::$starttime$ endtimeu::$endtime$"

create a search string from the host, source and sourcetype values
  [* | fields + source, sourcetype, host | format ]
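A subsearch built with format is consumed by placing it in square brackets inside an outer search; the rendered "( (field=value) OR ... )" string becomes part of the outer search's terms. A minimal sketch (the 404/clientip/access_combined specifics are assumptions for illustration):

  error [ search sourcetype=access_combined status=404 | top limit=5 clientip | fields + clientip | format ]

The subsearch finds the five client IPs with the most 404s, and the outer search then looks for "error" events from just those IPs.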