In Syslog We Trust?
5675d13 (2012-03-26)
Assuria Ltd

1 Introduction

Syslog is ubiquitous, but its popularity belies significant drawbacks. This paper discusses what syslog is and explains why the classic RFC 3164 variant is not appropriate for a logging infrastructure where any trust is required in the integrity of your log data. It also considers the newer RFC 5424–6 version, which is a substantial improvement but has significant problems that remain prohibitive where the integrity and forensic soundness of log data are required.

“Syslog” means several things:

1. A format for log messages. This is “syslog format”.
2. A protocol for transmitting syslog format messages between computers. This is the “syslog protocol”.
3. (Informally) a program that handles syslog format messages and sends/receives such messages via the syslog protocol. Popular syslog implementations include the classic Linux/Unix syslog daemons, syslog-ng, rsyslog and (on Windows) SolarWinds’ Kiwi syslog server.

From a data integrity perspective we are concerned with three requirements:

1. No undetectable addition (or duplication) of events.
2. No undetectable modification of events.
3. No undetectable deletion (or loss, malicious or otherwise) of events.

2 Syslog Standards

Syslog was first standardised as IETF RFC 3164 in 2001 [1]. This was a retrospective attempt to codify the various extant syslog implementations. More recently, the RFC 5424–6 standards [2, 3, 4] are an attempt to correct the most glaring problems with RFC 3164. Since RFC 5424 syslog is not backwards-compatible with RFC 3164 there is little sense in talking about “syslog”: one should specify the variant (3164 or 5424) being considered. This paper considers both variants.

RFC 3164 is widely deployed: it is the de facto standard mechanism for network devices to output log messages.
RFC 5424 has seen little traction to date, although as Linux distributions increasingly migrate to syslog-ng and rsyslog as their standard syslog implementations, its uptake is likely to increase.

A third standard, RFC 3195 [5], encapsulates syslog messages within XML documents and sends them over TCP. This paper does not consider RFC 3195 further as it appears to be moribund (which is unfortunate, because it has reliability benefits that RFC 3164 and RFC 5424–6 lack).

Copyright © Assuria Limited. All rights reserved. Reading Enterprise Centre • The University of Reading • Earley Gate • Reading • RG6 6BU • United Kingdom

3 Syslog Format

3.1 RFC 3164

This is an RFC 3164 format syslog message:

    Jul 11 12:34:27 lemon su: 'su root' failed for alice on /dev/pts/3

This format has several problems:

1. The timestamp has no year: you have no idea when this failed privilege escalation attempt occurred.
2. The timestamp has no time zone. Even if it had a year, you’d still have no idea when it happened.
3. Although syslog defines a “facility” (what subsystem generated the event) and a “severity” (how important the event is), these are not recorded in the message.
4. Although at first glance syslog messages are single-line records, we have seen newline characters in syslog messages¹. This means that a file containing syslog messages can’t be parsed unambiguously. (You make an educated guess: if the start of a line looks like the start of a new message then assume that the previous message is complete.)
5. The standard restricts the total length of the message to 1024 bytes. The typical Windows Server 2008 R2 Security log message is longer than this, and so cannot be converted to syslog format without losing data.
6. The standard only permits plain-text messages.
Common log formats, including the Windows Event Log, support binary data in events and so cannot in general be converted to syslog format without losing data or using a binary-to-text conversion mechanism such as Base64 [6].
7. The standard does not define the character set or encoding for the messages, beyond observing that “seven-bit ASCII” is “most often used”. Assuria have seen syslog messages encoded in ASCII, ISO 8859-1, UTF-8, EUC-JP and Shift_JIS. Given the way that syslog messages are forwarded, it is possible for a single syslog file to contain messages with different character sets/encodings, and with no reliable way to disambiguate (i.e. parse reliably) the resulting file.
8. The message is free-form text: there is no structure in the format that log parsing tools can exploit. This reduces such tools to pattern matching (e.g. regular expressions), which is vulnerable to seemingly insignificant changes to the message. For example, basic logon events are logged differently (and require different patterns to match) across currently-supported Enterprise Linux versions and distributions. (It is worth noting that since the syslog message is unstructured, converting a more structured event such as a Windows Event Log event or a Solaris BSM event into a syslog message necessarily discards this structure. In other words, it’s a lossy conversion.)
9. Although the standard is reasonably prescriptive about the initial part of the message, many applications and devices that generate syslog messages disregard the standard. The only consistent parts of the syslog message that you can assume will exist in general are the timestamp and hostname.
10. The message has no integrity protection: an attacker can modify it undetectably.
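
The first two problems in the list above can be made concrete with a short sketch. It is only an illustration, not part of any syslog implementation: the function name `parse_3164_timestamp` and its two `assumed_*` parameters are hypothetical, and exist precisely because the message itself supplies neither a year nor a time zone.

```python
from datetime import datetime, timedelta, timezone

# The RFC 3164 example message from section 3.1.
LINE = "Jul 11 12:34:27 lemon su: 'su root' failed for alice on /dev/pts/3"


def parse_3164_timestamp(line, assumed_year, assumed_utc_offset_hours):
    """Parse the leading timestamp, supplying the fields the format omits.

    Both 'assumed' arguments are guesses made by whoever reads the log;
    nothing in the message confirms or contradicts them.
    """
    stamp = line[:15]  # "Mmm dd hh:mm:ss" occupies the first 15 characters
    # %b matches English month abbreviations under the default C locale.
    naive = datetime.strptime(f"{assumed_year} {stamp}", "%Y %b %d %H:%M:%S")
    offset = timezone(timedelta(hours=assumed_utc_offset_hours))
    return naive.replace(tzinfo=offset)


# Two equally defensible readings of the same line.
reading_a = parse_3164_timestamp(LINE, 2011, 0)
reading_b = parse_3164_timestamp(LINE, 2012, 1)

print(reading_a.isoformat())
print(reading_b.isoformat())
print(reading_b - reading_a)
```

Both results are well-formed timestamps, nearly a year apart, and the log line offers no evidence for preferring one over the other.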

Despite RFC 3164’s substantial shortcomings, and the fact that RFC 3164 itself states that “there are some concerns about the applicability of this protocol in situations that require robust delivery” (§6), many popular SIEM solutions are entirely 3164-orientated: they normalise all events into RFC 3164 format and forward events via the RFC 3164 protocol. This makes things easy for the SIEM vendor but prohibits non-textual data and events longer than 1024 bytes, and, by RFC 3164’s own admission, destroys integrity and forensic soundness, as defined in section 1 above and discussed in section 4.1 below.

¹ Unlike RFC 5424, a strict reading of RFC 3164 prohibits newline characters in messages. As noted however, we have seen them in practice. This illustrates the general laxity and disregard for the standard by applications and devices that generate events in (notionally) RFC 3164 format.

3.2 RFC 5424

RFC 5424 is Rainer Gerhards’ attempt to remedy RFC 3164’s shortcomings. The number and nature of those shortcomings mean that the 5424 format bears little relation to 3164, but is still colloquially referred to as “syslog”. It is essentially a different protocol, related to RFC 3164 in name only. The following example is taken from RFC 5424:

    1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 [exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] BOMAn application event log entry...

Key features:

1. The initial 1 is a version number. This allows the format to be developed in backwards-incompatible ways without processing software having to apply heuristics to determine how to parse a message, as is currently the case.
2. The timestamp provides a year and a time zone, and hence fixes the problems with the RFC 3164 timestamp.
The only (slightly churlish) problem with the 5424 timestamp is that the standard is rather flexible, permitting various different timestamp formats. This makes a parser more difficult to write (and hence more prone to bugs) and permits incomplete timestamps. It would have been better to mandate a single, strict timestamp format, such as one based on ISO 8601 [7]. Still, at least RFC 5424 allows (and encourages) a decent timestamp, rather than prohibiting one as RFC 3164 does.
3. The format has explicit fields for hostname, application, process ID and message ID: there is more structure in the format, which makes parsing easier.
4. The section with square brackets is Structured Data: this allows <name, value> pairs to be expressed, rather than these being embedded within the plain-text message. Although the value is UTF-8, the name is restricted to seven-bit ASCII. The standard claims that structured data are “easily parseable”, but having implemented parsers for over thirty formats of all shapes and sizes, we beg to differ. The primary complication is that structured data are embedded in an unstructured textual message: control and data are mixed, which always leads to problems including, in a broader context (i.e. software in general), a significant class of security vulnerabilities. The 5424 format is not inherently insecure in this regard, but requires a non-trivial parser. The structured data encoding also disregards typing: values are always strings, even when they’re logically numbers (for example).
5. The message is unstructured text, as before. UTF-8 is recommended, but other encodings are permitted.

RFC 5424 is a substantial improvement over RFC 3164. However, it still has some significant weaknesses:

1. The facility and severity are retained from RFC 3164, but they are again omitted from the message itself.
2. RFC 3164’s length limitation has been raised appreciably.
However, receivers are explicitly permitted to truncate and/or discard messages: “If a transport receiver receives a message with a length larger than it supports, the transport receiver SHOULD truncate the payload. Alternatively, it MAY discard the message.” This alone is arguably sufficient to deem the RFC 5424 format unsuitable for a forensically sound logging infrastructure. At a minimum, one should confirm that the particular receiver[s] being used can cope with arbitrarily long messages (which is of debatable tractability) without such truncation or deletion.
3. Although RFC 5424 resolves many of RFC 3164’s shortcomings, it doesn’t mandate these improvements. You can use a decent timestamp, but you don’t have to. You can encode everything in UTF-8, but you can use EBCDIC if you fancy it. This makes handling RFC 5424 messages gratuitously complicated: you have to handle numerous timestamp formats, multiple character sets, messages with incomplete fields etc.
4. Perhaps as a consequence of the lax treatment of character sets, RFC 5424 “permits a syslog application to reformat control characters received” (§8.2). Not content with permitting truncation and deletion, the standard also permits modification of events.
5. In a regression from RFC 3164, the format merely discourages newlines in messages, rather than prohibiting them: “The syslog application SHOULD avoid octet values below 32 …. These values are legal” (§6.4). A file containing 5424-format messages is therefore ambiguous in the formal grammar sense, which means that there are multiple valid ways to parse the file. Only one of these ways will yield the original events, but there is no way to guarantee which. (You can make a guess that’s almost certainly correct, but you’re still guessing.)
6. The format is still intrinsically textual: you can’t easily represent binary data.
7. There is still no integrity protection, so the messages can again be modified undetectably.

RFC 5424 is significantly better than RFC 3164 but also significantly more complicated. Complexity is an enemy of both correctness and security.

4 Syslog Protocol

The syslog protocol is the means by which syslog messages are transferred between computers. As discussed, there is no such thing as “the syslog protocol” due to the mutually incompatible RFC 3164 and RFC 5424–6 standards.

4.1 RFC 3164

RFC 3164’s protocol is as simple as they come: prepend “<n>” to the syslog message, where n encodes the facility and severity, and send the result to UDP port 514. While this simplicity has probably contributed to its popularity, if you want any confidence in the integrity of your log data then it is disastrous:

1. UDP is an unreliable protocol. “Unreliable” in network terms means that delivery is not guaranteed: the network is within its rights silently to drop the message on the floor, or to duplicate the message. If you send one syslog message, expect any number (zero included) to arrive.
2. UDP does not guarantee that messages are received in the order that they were sent. Since the syslog timestamp granularity is one second, messages within a second cannot be ordered with respect to time.
3. UDP messages can be trivially spoofed, so there is no protection against fake messages being injected into the system.
4. UDP’s checksum is weak and optional: there is no protection of any significance against the message being corrupted in transit (accidentally, let alone maliciously).
5. UDP is not encrypted: the message can be freely observed in transit.
6. UDP has no flow or congestion control: senders can bombard receivers with messages. The standard (indeed, only practical) defence against this is rate-limiting: if you receive too many messages in a certain time then discard further messages (perhaps from certain senders) for some further time period.
7. The sender has no means of knowing whether the receiver is online, let alone whether it’s receiving anything. It’s fire-and-forget (or fire-and-hope). If the receiver is down then you’ll lose events until it’s reinstated. The common “solution” to this is highly-available collection infrastructure: this tends to be expensive since you’re replicating things, and not as highly-available as you’d like to think. You also gain a nontrivial deduplication problem.

This means that if you’re using RFC 3164 as your transport mechanism you can guarantee none of the following:

1. A received syslog message was legitimately sent.
2. A legitimately sent syslog message was received (e.g. by your SIEM).
3. A received syslog message was the same message that was sent.
4. A received syslog message isn’t a duplicate of some other syslog message.

If the reader is in any doubt about the shortcomings of RFC 3164 then they can consult the standard itself, which gives an honest, detailed and somewhat whimsical account of its problems²,³:

• “There are several security consequences of the fundamental simplicity of syslog and there are some concerns about the applicability of this protocol in situations that require robust delivery” (§6).
• “messages may be sent accidentally, erroneously or even maliciously” (§6).
• “The receiver of the packet [i.e. message] will not be able to ascertain that the message was indeed sent from the reported sender” (§6.2).
• “a misconfigured machine may send syslog messages to a collector representing itself as another machine” (§6.2.1).
• “Malicious exploits of this [message forgery] behavior have also been noted” (§6.2.2).
• “the forensics of a network anomaly rely upon reconstructing the sequence of events … the syslog process and protocol do not ensure ordered delivery.” (§6.3)
• “messages may be recorded and replayed at a later time” (§6.3.4), i.e. RFC 3164 syslog is vulnerable to replay attacks.
• “As there is no mechanism within either the syslog process or the protocol to ensure delivery, and since the underlying transport is UDP, some messages may be lost … The consequences of [this] cannot be determined” (§6.4).
• “syslog messages may be damaged in transit, or an attacker may maliciously modify them” (§6.5).
• “Neither the syslog protocol nor the syslog application have mechanisms to provide confidentiality of the messages in transit” (§6.6).

² The honesty of RFC 3164 is laudable; the customers of the IT industry would benefit from such honesty and objective analysis becoming more prevalent.

³ These are not criticisms of the designer of the syslog protocol: forensic integrity was not a requirement. The problem is with vendors and consultants who promote RFC 3164 as something that it is not. RFC 3164 is thus deployed inappropriately, with illusory results.

A few years ago Assuria visited a customer site to investigate communication problems between Assuria Log Manager (ALM) Agents and the ALM Collector. It transpired that a misconfigured load-balanced switch was discarding one third of the packets that traversed it. ALM noticed because it had problems establishing reliable communications and complained via its own logging/alerting mechanisms (but didn’t lose any log data).
Had the infrastructure been RFC 3164-based then the customer would have lost, irrevocably and undetectably (unless explicit end-to-end tests were undertaken), a third of their events.

If the purpose of your logging infrastructure is basic operational monitoring then RFC 3164 may be appropriate, but it is inadequate in almost every respect for any form of security monitoring or auditing. Since RFC 3164 provides a detailed and honest assessment of its shortcomings from a security and integrity perspective, the only logical conclusion that can be drawn about someone who recommends RFC 3164 in environments with such requirements is that they haven’t read RFC 3164 before recommending it.

4.2 RFC 5424–6

Unlike RFC 3164, which defines the format and transmission protocol, RFC 5424 just defines the format. Other standards define transmission protocols, or “transport mappings”. At time of writing, two such mappings are standardised:

• RFC 5425: Syslog over TLS.
• RFC 5426: Syslog over UDP.

There are various implementations of syslog over TCP. An IETF draft [8] described this approach, but it is now deemed Historic⁴, with RFC 5425 recommended instead.

RFC 5426 is effectively RFC 3164’s UDP-based approach but carrying RFC 5424-format messages. It has similarly prohibitive drawbacks and is not considered here further.

RFC 5425 addresses several of the UDP-based protocols’ problems, but, like the 3164/5424 relationship, falls frustratingly short of providing the necessary levels of integrity for a forensically sound audit solution (notwithstanding the problems inherent in the 5424 format itself):

1. Messages are sent over a reliable channel, where “reliable” is used in the formal communication protocol sense. Messages won’t be silently discarded or duplicated in transit. (“Silently” is the key word here: you can still lose messages in transit, as discussed below. Unlike UDP, you at least know that a problem has occurred though.)
2. The channel can be mutually authenticated using standard, widely deployed cryptographic algorithms and protocols. Spoofing is therefore prevented (within normally accepted limits, such as a private key compromise or a break of the cryptographic algorithms being used).
3. TLS provides a cryptographic checksum, so corruption in transit is statistically improbable. (It’s physically impossible to prevent such corruption, regardless of the communication channel [9], but it is reduced to an acceptable level.)
4. The communication is strongly encrypted, and hence can’t easily be eavesdropped upon.
5. The protocol has flow and congestion control, and authentication makes the trivial denial-of-service attack that UDP is vulnerable to impractical. (You can still launch a DoS attack on a 5425 receiver just as you can launch an attack on any other TCP-based server, but the protocol is not inherently vulnerable.)

⁴ http://www.ietf.org/mail-archive/web/ietf-announce/current/msg09934.html : “The IESG does NOT RECOMMEND implementing or deploying syslog over plain TCP … Syslog over TLS transport [RFC5425] is recommended”.

Like 5424, 5425 is substantially better than 3164. The main problem, however, is a lack of application-layer acknowledgements. The standard itself acknowledges this: “if the TCP connection (or TLS session) is broken for some reason (or closed by the transport receiver), the syslog transport sender cannot always know what messages were successfully delivered to the syslog application at the other end” (§6.3). Rainer Gerhards himself acknowledged this in 2008: “one can not implement a reliable syslog transport without introducing app-level acks. It is simply impossible” [10].
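
The point about application-layer acknowledgements can be sketched in a few lines. This is a toy, in-memory model rather than anything defined by RFC 5425 or implemented by a real syslog daemon: the classes `AckingSender` and `Receiver`, the sequence numbers and the `lossy_link` function are all assumptions introduced purely for illustration.

```python
class Receiver:
    """Stores each message (here: in a dict) before acknowledging it."""

    def __init__(self):
        self.stored = {}

    def deliver(self, seq, message):
        # Storing by sequence number makes a retransmission harmless.
        self.stored.setdefault(seq, message)
        return seq  # the application-level acknowledgement


class AckingSender:
    """Keeps every message until the receiving application acknowledges it."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = {}  # seq -> message, in sending order

    def send(self, message, link):
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = message
        self._transmit(seq, link)

    def resend_unacked(self, link):
        for seq in list(self.unacked):
            self._transmit(seq, link)

    def _transmit(self, seq, link):
        ack = link(seq, self.unacked[seq])  # None models a lost message
        if ack is not None:
            del self.unacked[ack]


receiver = Receiver()
sender = AckingSender()
dropped = []


def lossy_link(seq, message):
    """Hypothetical link that loses message 2 the first time it is sent."""
    if seq == 2 and not dropped:
        dropped.append(seq)
        return None
    return receiver.deliver(seq, message)


for text in ("login ok", "su root failed", "logout"):
    sender.send(text, lossy_link)

pending_after_first_pass = dict(sender.unacked)
sender.resend_unacked(lossy_link)
```

Without the returned acknowledgement the sender would have no way of knowing that message 2 needed to be sent again, which is the gap that §6.3 of RFC 5425 describes.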

Since Gerhards, who was instrumental in the development of the 5424 family of standards, stated before RFC 5425 was published that application-level acknowledgements are indispensable, one can only conclude that RFC 5425 was not designed to be used in situations where reliable delivery (a prerequisite of integrity and forensic soundness) is important.

5 When Worlds Collide

The above sections have shown that from an integrity or forensic perspective, RFC 3164 syslog is hopeless while RFC 5424–6, though substantially better, still has prohibitive problems. However, the combination of RFC 3164 and RFC 5424 makes the situation worse.

In technical terms (e.g. message formats) RFC 3164 and RFC 5424 are unrelated protocols. However, in marketing terms they’re both syslog, and they have sufficient overlap to cause problems. For example, you can send an RFC 3164 format message and an RFC 5424 format message to the same receiver, and these messages can be written to the same log file. This log file then contains messages in two mutually incompatible formats, arbitrarily interleaved with each other. The 5424 parsing problem, already non-trivial, promptly becomes even harder: we’re essentially parsing two unrelated formats, and guessing which parser to use on a per-event basis within the same file (after guessing which character set/encoding to use on a per-event basis within the same file). “Parsing via guessing” is not likely to be a strategy that wins many friends in an adversarial setting.

Unless the entire infrastructure is transitioned to 5424 (which is impractical given currently-available network devices), the only way to avoid this is to rewrite all received messages into 5424 format in a single character set/encoding. Although these modifications might be acceptable from a forensic standpoint, you no longer have the original event and are therefore in a position of having to justify yourself, rather than having nothing to justify.
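
The “parsing via guessing” described in this section might look something like the following sketch. The two regular expressions and the `guess_format` function are hypothetical heuristics written for this illustration, not taken from either RFC or from any particular syslog implementation; they assume that messages have been written to a file without the leading “<n>” priority.

```python
import re

# Heuristic: a version number, a space, then a full date or the nil value "-".
LOOKS_LIKE_5424 = re.compile(r"^[1-9][0-9]{0,2} (?:-|[0-9]{4}-[0-9]{2}-[0-9]{2}T)")

# Heuristic: "Mmm dd hh:mm:ss " with a space-padded day of month.
LOOKS_LIKE_3164 = re.compile(
    r"^(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
    r"[ 1-3][0-9] [0-9]{2}:[0-9]{2}:[0-9]{2} "
)


def guess_format(line):
    """Guess which parser to apply to one line of a mixed log file."""
    if LOOKS_LIKE_5424.match(line):
        return "5424"
    if LOOKS_LIKE_3164.match(line):
        return "3164"
    return "unknown"


mixed_file = [
    "Jul 11 12:34:27 lemon su: 'su root' failed for alice on /dev/pts/3",
    '1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 '
    '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] '
    "BOMAn application event log entry...",
    # Invented for this sketch: the second line of a single 3164 message
    # whose text contained a newline.
    "1 2012-03-26T00:00:00Z lemon su - - - session opened for root",
]

guesses = [guess_format(line) for line in mixed_file]
print(guesses)
```

The third entry is, by construction, the tail of a 3164 message with an embedded newline, yet the heuristic labels it as a separate 5424 event. Anyone who controls message content can exploit exactly this kind of guess.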

6 Conclusion

This paper has discussed the various mutually-incompatible standards that are popularly referred to as “syslog”, and assessed them against the requirements for data integrity and forensic soundness. These properties are essential if the outputs of a log management or SIEM solution are to be considered credible when challenged.

The original RFC 3164 has major shortcomings and explicitly states that it does not fulfil such requirements. RFC 5424 addresses many of these shortcomings but still has prohibitive weaknesses. RFC 5424 is an opportunity missed, largely through maintaining a pretence that it is “syslog” and by consequently being overly flexible format-wise and insufficiently strong protocol-wise. This was perhaps essential in order to gain traction, however.

Assuria’s position is that RFC 3164 syslog will remain widely deployed for the foreseeable future due to its ubiquity in devices and appliances. Assuria Log Manager therefore provides a syslog receiver that can receive 3164 and 5424 format messages, furnishes both with a standard, normalised timestamp and applies robust cryptographic integrity protection (albeit after the unavoidable syslog network hop).

We recommend avoiding syslog if possible; for example:

• Most operating system audit functions (including the Windows Security Log, Solaris BSM, AIX Audit, HP-UX Trusted Mode Audit) have a rich, structured native format. Collect and retain such logs in that format⁵.
• CheckPoint can log via OPSEC.
• Sourcefire appliances can log via eStreamer (push) or vJDBC (pull).
• Cisco IPS devices provide CIDEE.
• SQL Server 2008 Enterprise edition has a fine-grained audit function that yields structured database result sets.
Earlier and non-Enterprise versions of SQL Server provide C2-compliant auditing in a structured binary format.
• SharePoint provides a C# API to retrieve audit events as .NET objects.

Where avoiding syslog is not possible, prefer RFC 5424. Where UDP-based syslog is unavoidable, keep hops as short as possible: ideally the sender and receiver should be on the same Ethernet segment, because once UDP hits routers it’s liable to be dropped.

⁵ The notion of taking a security audit log, typically accredited at least to EAL 4, lossily converting it into a less-structured format and then forwarding over an unreliable transport seems somewhat at odds with the concept of security audit.

7 References

[1] C. Lonvick: The BSD syslog protocol. IETF Request for Comments 3164, 2001.
[2] R. Gerhards: The Syslog Protocol. IETF Request for Comments 5424, 2009.
[3] F. Miao et al.: Transport Layer Security (TLS) Transport Mapping for Syslog. IETF Request for Comments 5425, 2009.
[4] A. Okmianski: Transmission of Syslog Messages over UDP. IETF Request for Comments 5426, 2009.
[5] D. New and M. Rose: Reliable Delivery for syslog. IETF Request for Comments 3195, 2001.
[6] S. Josefsson: The Base16, Base32, and Base64 Data Encodings. IETF Request for Comments 4648, 2006.
[7] ISO: Data elements and interchange formats – Information interchange – Representation of dates and times. ISO 8601, Third edition, 2004.
[8] R. Gerhards and C. Lonvick: Transmission of Syslog Messages over TCP. IETF draft-gerhards-syslog-plain-tcp-14.txt, 2012.
[9] C. E. Shannon: A Mathematical Theory of Communication. Bell System Technical Journal 27, pp379–423, 623–656, 1948.
[10] R. Gerhards: why [sic] you can’t build a reliable TCP protocol without app-level acks…. http://blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html, 2008.