In Syslog We Trust?
5675d13 (2012-03-26)
Assuria Ltd
1 Introduction
Syslog is ubiquitous, but its popularity belies significant drawbacks. This paper
discusses what syslog is and explains why the classic RFC 3164 variant is not
appropriate for a logging infrastructure where any trust is required in the integrity
of your log data. It also considers the newer RFC 5424–6 version, which is a
substantial improvement but has significant problems that remain prohibitive
where log data integrity and forensic soundness of log data are required.
“Syslog” means several things:
1. A format for log messages. This is “syslog format”.
2. A protocol for transmitting syslog format messages between computers.
This is the “syslog protocol”.
3. (Informally) a program that handles syslog format messages and
sends/receives such messages via the syslog protocol. Popular syslog
implementations include the classic Linux/Unix syslog daemons,
syslog-ng, rsyslog and (on Windows) SolarWinds’ Kiwi syslog server.
From a data integrity perspective we are concerned with three requirements:
1. No undetectable addition (or duplication) of events.
2. No undetectable modification of events.
3. No undetectable deletion (or loss, malicious or otherwise) of events.
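To make the three requirements concrete, the following is a minimal sketch of one way a log store could make addition, modification and deletion detectable: each record carries a sequence number and a keyed hash chained to the previous record. It is an illustration only and does not describe any particular product or standard; the key, the record layout and the function names `seal` and `verify` are assumptions made for this example.

```python
import hashlib
import hmac

KEY = b"demo-key-not-for-production"  # hypothetical secret, for illustration only


def _tag(key, prev_tag, seq, event):
    """Keyed hash over the previous tag, the sequence number and the event text."""
    data = prev_tag + str(seq).encode("ascii") + b"\x00" + event.encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).digest()


def seal(events, key=KEY):
    """Return (seq, event, tag) records, each tag chained to the one before it."""
    records, prev_tag = [], b""
    for seq, event in enumerate(events, start=1):
        tag = _tag(key, prev_tag, seq, event)
        records.append((seq, event, tag))
        prev_tag = tag
    return records


def verify(records, expected_count, key=KEY):
    """True only if no record has been added, modified or deleted."""
    if len(records) != expected_count:  # catches loss of trailing records
        return False
    prev_tag = b""
    for position, (seq, event, tag) in enumerate(records, start=1):
        if seq != position:  # catches insertion, duplication, deletion
            return False
        if not hmac.compare_digest(tag, _tag(key, prev_tag, seq, event)):
            return False  # catches modification
        prev_tag = tag
    return True
```

Note that the verifier needs to learn the expected record count through some trusted route; without it, deletion of the most recent events would go unnoticed.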
2 Syslog Standards
Syslog was first standardised as IETF RFC 3164 in 2001 [1]. This was a
retrospective attempt to codify the various extant syslog implementations. More
recently, the RFC 5424–6 standards [2, 3, 4] are an attempt to correct the most
glaring problems with RFC 3164. Since RFC 5424 syslog is not backwards-compatible
with RFC 3164 there is little sense in talking about “syslog”: one
should specify the variant (3164 or 5424) being considered. This paper considers
both variants.
RFC 3164 is widely deployed: it is the de facto standard mechanism for network
devices to output log messages. RFC 5424 has seen little traction to date,
although as Linux distributions increasingly migrate to syslog-ng and rsyslog as
their standard syslog implementations, its uptake is likely to increase.
A third standard, RFC 3195 [5], carries syslog messages over BEEP (which runs
over TCP), optionally encapsulating them within XML documents. This paper does not consider RFC 3195
further as it appears to be moribund (which is unfortunate, because it has
reliability benefits that RFC 3164 and RFC 5424–6 lack).
3 Syslog Format
3.1 RFC 3164
This is an RFC 3164 format syslog message:
Jul 11 12:34:27 lemon su: 'su root' failed for alice on /dev/pts/3
This format has several problems:
1. The timestamp has no year: you have no idea when this failed privilege
escalation attempt occurred.
2. The timestamp has no time zone. Even if it had a year, you’d still have
no idea when it happened.
3. Although syslog defines a “facility” (what subsystem generated the
event) and a “severity” (how important the event is), these are not
recorded in the message itself: they travel only in the protocol’s “<n>”
prefix (see section 4.1).
4. Although at first glance syslog messages are single-line records, we
have seen newline characters in syslog messages¹. This means that a
file containing syslog messages can’t be parsed unambiguously. (You
have to make an educated guess: if the start of a line looks like the
start of a new message then assume that the previous message is complete.)
5. The standard restricts the total length of the message to 1024 bytes.
The typical Windows Server 2008 R2 Security log message is longer than this,
and so cannot be converted to syslog format without losing data.
6. The standard only permits plain-text messages. Common log formats,
including the Windows Event Log, support binary data in events and so
cannot in general be converted to syslog format without losing data or
using a binary-to-text conversion mechanism such as Base64 [6].
7. The standard does not define the character set or encoding for the
messages, beyond observing that “seven-bit ASCII” is “most often
used”. Assuria have seen syslog messages encoded in ASCII,
ISO 8859-1, UTF-8, EUC-JP and Shift_JIS. Given the way that syslog
messages are forwarded, it is possible for a single syslog file to contain
messages with different character sets/encodings, and with no reliable
way to disambiguate (i.e. parse reliably) the resulting file.
8. The message is free-form text: there is no structure in the format that
log parsing tools can exploit. This reduces such tools to pattern
matching (e.g. regular expressions), which is vulnerable to
seemingly-insignificant changes to the message. For example, basic logon events
are logged differently (and require different patterns to match) across
currently-supported Enterprise Linux versions and distributions.
(It is worth noting that since the syslog message is unstructured,
converting a more structured event such as a Windows Event Log event
or a Solaris BSM event into a syslog message necessarily discards this
structure. In other words, it’s a lossy conversion.)
9. Although the standard is reasonably prescriptive about the initial part of
the message, many applications and devices that generate syslog
messages disregard the standard. The only consistent parts of the
syslog message that you can assume will exist in general are the
timestamp and hostname.
10. The message has no integrity protection: an attacker can modify it
undetectably.
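The timestamp problems in items 1 and 2 can be seen by trying to parse the example message. The sketch below is illustrative only: the regular expression, the function name `parse_3164` and the `assumed_year` parameter are choices made for this example, and the month name is parsed on the assumption of an English locale.

```python
import re
from datetime import datetime

LINE = "Jul 11 12:34:27 lemon su: 'su root' failed for alice on /dev/pts/3"

# Only the timestamp and hostname are treated as structure; the rest is free text.
PATTERN = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<rest>.*)$"
)


def parse_3164(line, assumed_year):
    """Parse an RFC 3164-style line; the caller has to supply the missing year."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError("not recognisable as an RFC 3164 message")
    # The result is a naive datetime: the message carries no time zone either.
    when = datetime.strptime(f"{assumed_year} {match['ts']}", "%Y %b %d %H:%M:%S")
    return when, match["host"], match["rest"]
```

The same bytes yield a different instant for every year (and time zone) the reader chooses to assume, which is exactly the ambiguity described above.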
Despite RFC 3164’s substantial shortcomings, and the fact that RFC 3164 itself
states that “there are some concerns about the applicability of this protocol in
situations that require robust delivery” (§6), many popular SIEM solutions are
entirely 3164-orientated: they normalise all events into RFC 3164 format and
forward events via the RFC 3164 protocol. This makes things easy for the SIEM
vendor but prohibits non-textual data and events longer than 1024 bytes, and, by
RFC 3164’s own admission, destroys integrity and forensic soundness, as defined
in section 1 above and discussed in section 4.1 below.
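As a rough worked example of how the 1024-byte limit and the plain-text restriction interact, the sketch below computes how much binary data survives Base64 encoding inside one RFC 3164 message. The header string is a made-up example, and the “<n>” prefix is ignored for simplicity, so the real budget is slightly smaller.

```python
import base64

MAX_3164_BYTES = 1024  # total length limit cited above
HEADER = "Jul 11 12:34:27 lemon app: "  # hypothetical header, 27 bytes


def encoded_length(payload_len):
    """Length of padded Base64 output for payload_len input bytes."""
    return 4 * ((payload_len + 2) // 3)


def fits(header, payload):
    """Would header + Base64(payload) fit in one RFC 3164 message?"""
    body = base64.b64encode(payload)
    return len(header.encode("ascii")) + len(body) <= MAX_3164_BYTES
```

With this header, 747 bytes of binary data encode to 996 characters and just fit; 748 bytes encode to 1000 characters and do not. Base64 inflates data by a third, so well under 1 KB of binary content can be carried per message.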
¹ Unlike RFC 5424, a strict reading of RFC 3164 prohibits newline characters in
messages. As noted however, we have seen them in practice. This illustrates the
general laxity and disregard for the standard by applications and devices that
generate events in (notionally) RFC 3164 format.
3.2 RFC 5424
RFC 5424 is Rainer Gerhards’ attempt to remedy RFC 3164’s shortcomings. The
number and nature of those shortcomings mean that the 5424 format bears little
relation to 3164, but is still colloquially referred to as “syslog”. It is essentially a
different protocol, related to RFC 3164 in name only. The following example is
taken from RFC 5424:
1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47
[exampleSDID@32473 iut="3" eventSource="Application"
eventID="1011"] BOMAn application event log entry...
Key features:
1. The initial 1 is a version number. This allows the format to be developed
in backwards-incompatible ways without processing software having to
apply heuristics to determine how to parse a message, as is currently
the case.
2. The timestamp provides a year and a time zone, and hence fixes the
problems with the RFC 3164 timestamp. The only (slightly churlish)
problem with the 5424 timestamp is that the standard is rather flexible,
permitting various different timestamp formats. This makes a parser
more difficult to write (and hence more prone to bugs) and permits
incomplete timestamps. It would have been better to mandate a single,
strict timestamp format, such as one based on ISO 8601 [7]. Still, at
least RFC 5424 allows (and encourages) a decent timestamp, rather
than prohibiting one as RFC 3164 does.
3. The format has explicit fields for hostname, application, process ID and
message ID: there is more structure in the format, which makes parsing
easier.
4. The section with square brackets is Structured Data: this allows
<name, value> pairs to be expressed, rather than these being
embedded within the plain-text message. Although the value is UTF-8,
the name is restricted to seven-bit ASCII.
The standard claims that structured data are “easily parseable”, but
having implemented parsers for over thirty formats of all shapes and
sizes, we beg to differ. The primary complication is that structured data
are embedded in an unstructured textual message: control and data are
mixed, which always leads to problems including, in a broader context
(i.e. software in general), a significant class of security vulnerabilities.
The 5424 format is not inherently insecure in this regard, but requires a
non-trivial parser.
The structured data encoding also disregards typing: values are always
strings, even when they’re logically numbers (for example).
5. The message is unstructured text, as before. UTF-8 is recommended,
but other encodings are permitted.
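The features listed above can be illustrated with a small parser for the example message. This is a simplified sketch rather than a conformant RFC 5424 parser: the regular expressions and the function name `parse_5424` are assumptions made for this example, the leading priority value is not handled, and the BOM marker shown in the example is omitted from the test message for brevity.

```python
import re

MESSAGE = (
    '1 2003-10-11T22:14:15.003Z mymachine.example.com evntslog - ID47 '
    '[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"] '
    'An application event log entry...'
)

HEADER = re.compile(
    r"^(?P<version>\d{1,3}) (?P<timestamp>\S+) (?P<hostname>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)$",
    re.DOTALL,
)
# Values may contain escaped quotes, backslashes and closing brackets, so
# "scan to the next ]" is not a safe way to find the end of an element.
ELEMENT = re.compile(
    r'\[(?P<id>[^ =\]"]+)(?P<params>(?: [^ =\]"]+="(?:\\.|[^"\\])*")*)\]', re.DOTALL
)
PARAM = re.compile(r' (?P<name>[^ =\]"]+)="(?P<value>(?:\\.|[^"\\])*)"', re.DOTALL)
UNESCAPE = re.compile(r'\\(["\\\]])')


def parse_5424(message):
    """Split a message into header fields, structured data and free text."""
    header = HEADER.match(message)
    if header is None:
        raise ValueError("not recognisable as an RFC 5424 message")
    fields = header.groupdict()
    rest = fields.pop("rest")
    structured = {}
    if rest == "-" or rest.startswith("- "):
        rest = rest[2:]  # "-" means "no structured data"
    else:
        pos = 0
        while True:
            element = ELEMENT.match(rest, pos)
            if element is None:
                break
            structured[element["id"]] = {
                p["name"]: UNESCAPE.sub(r"\1", p["value"])
                for p in PARAM.finditer(element["params"])
            }
            pos = element.end()
        if pos == 0:
            raise ValueError("malformed structured data")
        rest = rest[pos + 1:]  # skip the space before the free-text part
    fields["structured_data"] = structured
    fields["msg"] = rest
    return fields
```

Even this sketch needs escape handling to find the end of a value, and every value it returns (such as `eventID`) is a string, which illustrates both the parsing and the typing points made in item 4.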
RFC 5424 is a substantial improvement over RFC 3164. However, it still has some
significant weaknesses:
1. The facility and severity are retained from RFC 3164, but they are again
omitted from the message itself.
2. RFC 3164’s length limitation has been raised appreciably. However,
receivers are explicitly permitted to truncate and/or discard messages:
“If a transport receiver receives a message with a length larger than it
supports, the transport receiver SHOULD truncate the payload.
Alternatively, it MAY discard the message.” This alone is arguably
sufficient to deem the RFC 5424 format unsuitable for a forensically-sound
logging infrastructure. At a minimum, one should confirm that the
particular receiver[s] being used can cope with arbitrarily long messages
(which is of debatable tractability) without such truncation or deletion.
3. Although RFC 5424 resolves many of RFC 3164’s shortcomings, it
doesn’t mandate these improvements. You can use a decent timestamp,
but you don’t have to. You can encode everything in UTF-8, but you can
use EBCDIC if you fancy it. This makes handling RFC 5424 messages
gratuitously complicated: you have to handle numerous timestamp
formats, multiple character sets, messages with incomplete fields etc.
4. Perhaps as a consequence of the lax treatment of character sets,
RFC 5424 “permits a syslog application to reformat control characters
received” (§8.2). Not content with permitting truncation and deletion,
the standard also permits modification of events.
5. In a regression from RFC 3164, the format merely discourages newlines
in messages, rather than prohibiting them: “The syslog application
SHOULD avoid octet values below 32 …. These values are legal” (§6.4).
A file containing 5424-format messages is therefore ambiguous in the
formal grammar sense, which means that there are multiple valid ways
to parse the file. Only one of these ways will yield the original events,
but there is no way to guarantee which. (You can make a guess that’s
almost certainly correct, but you’re still guessing.)
6. The format is still intrinsically textual: you can’t easily represent binary
data.
7. There is still no integrity protection, so the messages can again be
modified undetectably.
RFC 5424 is significantly better than RFC 3164 but also significantly more
complicated. Complexity is an enemy of both correctness and security.
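The newline ambiguity described in item 5 can be demonstrated with a contrived log file. In this sketch the events, the one-event-per-line file layout and the “looks like the start of a message” heuristic are all assumptions made for illustration.

```python
import re

# Two events. The first legitimately contains a newline followed by text
# that happens to look like the start of another event.
EVENTS = [
    "1 2012-03-26T10:00:00Z host app - - - user said:\n"
    "1 2012-03-26T10:00:01Z host app - - - all is well",
    "1 2012-03-26T10:00:02Z host app - - - logout",
]

START = re.compile(r"^\d{1,3} \d{4}-\d{2}-\d{2}T")


def write_log(events):
    """Write events the way a simple receiver might: one event, then a newline."""
    return "".join(event + "\n" for event in events)


def split_by_guessing(text):
    """Recover events by assuming that a line which looks like a header starts one."""
    records = []
    for line in text.splitlines():
        if START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += "\n" + line  # treat as a continuation line
    return records


recovered = split_by_guessing(write_log(EVENTS))
```

Here `recovered` contains three events although only two were written. The heuristic works for innocuous continuation lines, but anyone able to influence message content can manufacture events that the parser cannot distinguish from genuine ones.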
4 Syslog Protocol
The syslog protocol is the means by which syslog messages are transferred
between computers. As discussed, there is no single “syslog protocol”, because
the RFC 3164 and RFC 5424–6 standards are mutually incompatible.
4.1 RFC 3164
RFC 3164’s protocol is as simple as they come: prepend “<n>” to the syslog
message, where n encodes the facility and severity, and send the result to UDP
port 514. While this simplicity has probably contributed to its popularity, if you
want any confidence in the integrity of your log data then it is disastrous:
1. UDP is an unreliable protocol. “Unreliable” in network terms means that
delivery is not guaranteed: the network is within its rights silently to
drop the message on the floor, or to duplicate the message. If you send
one syslog message, expect any number (zero included) to arrive.
2. UDP does not guarantee that messages are received in the order that
they were sent. Since the syslog timestamp granularity is one second,
messages within a second cannot be ordered with respect to time.
3. UDP messages can be trivially spoofed, so there is no protection against
fake messages being injected into the system.
4. UDP’s checksum is weak and optional: there is no protection of any
significance against the message being corrupted in transit (accidentally,
let alone maliciously).
5. UDP is not encrypted: the message can be freely observed in transit.
6. UDP has no flow or congestion control: senders can bombard receivers
with messages. The standard (indeed, only practical) defence against
this is rate-limiting: if you receive too many messages in a certain time
then discard further messages (perhaps from certain senders) for some
further time period.
7. The sender has no means of knowing whether the receiver is online, let
alone whether it’s receiving anything. It’s fire-and-forget (or
fire-and-hope). If the receiver is down then you’ll lose events until it’s reinstated.
The common “solution” to this is highly-available collection
infrastructure: this tends to be expensive since you’re replicating things,
and not as highly-available as you’d like to think. You also gain a
non-trivial deduplication problem.
This means that if you’re using RFC 3164 as your transport mechanism you can
guarantee none of the following:
1. A received syslog message was legitimately sent.
2. A legitimately sent syslog message was received (e.g. by your SIEM).
3. A received syslog message was the same message that was sent.
4. A received syslog message isn’t a duplicate of some other syslog
message.
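The “prepend <n> and send to UDP port 514” mechanism described at the start of this section is small enough to sketch in full. In RFC 3164 the number n is the facility multiplied by 8, plus the severity. The function names and the default destination below are assumptions made for this example; sending to port 514 requires a receiver to be listening there for anything to be recorded.

```python
import socket


def encode_pri(facility, severity):
    """Combine facility (0-23) and severity (0-7) into the <n> value."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity


def decode_pri(pri):
    """Split an <n> value back into (facility, severity)."""
    return divmod(pri, 8)


def build_packet(facility, severity, message):
    """Prepend the priority to an already-formatted RFC 3164 message."""
    return f"<{encode_pri(facility, severity)}>{message}".encode("ascii")


def send_udp(packet, host="127.0.0.1", port=514):
    """Fire and forget: sendto() returns once the datagram is handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))  # no acknowledgement is ever received
```

For example, facility 20 (local4) with severity 5 (notice) gives the prefix `<165>`. Nothing in `send_udp` can tell the caller whether the datagram arrived, arrived twice or arrived intact, which is the substance of the two lists above.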
If the reader is in any doubt about the shortcomings of RFC 3164 then they can
consult the standard itself, which gives an honest, detailed and somewhat
whimsical account of its problems²,³:
• “There are several security consequences of the fundamental simplicity
of syslog and there are some concerns about the applicability of this
protocol in situations that require robust delivery” (§6).
• “messages may be sent accidentally, erroneously or even maliciously”
(§6).
• “The receiver of the packet [i.e. message] will not be able to ascertain
that the message was indeed sent from the reported sender” (§6.2).
• “a misconfigured machine may send syslog messages to a collector
representing itself as another machine” (§6.2.1).
• “Malicious exploits of this [message forgery] behavior have also been
noted” (§6.2.2).
• “the forensics of a network anomaly rely upon reconstructing the
sequence of events … the syslog process and protocol do not ensure
ordered delivery.” (§6.3)
• “messages may be recorded and replayed at a later time” (§6.3.4), i.e.
RFC 3164 syslog is vulnerable to replay attacks.
• “As there is no mechanism within either the syslog process or the
protocol to ensure delivery, and since the underlying transport is UDP,
some messages may be lost … The consequences of [this] cannot be
determined” (§6.4).
• “syslog messages may be damaged in transit, or an attacker may
maliciously modify them” (§6.5).
• “Neither the syslog protocol nor the syslog application have mechanisms
to provide confidentiality of the messages in transit” (§6.6).
² The honesty of RFC 3164 is laudable; the customers of the IT industry would
benefit from such honesty and objective analysis becoming more prevalent.
³ These are not criticisms of the designer of the syslog protocol: forensic integrity
was not a requirement. The problem is with vendors and consultants who
promote RFC 3164 as something that it is not. RFC 3164 is thus deployed
inappropriately, with illusory results.
A few years ago Assuria visited a customer site to investigate communication
problems between Assuria Log Manager (ALM) Agents and the ALM Collector. It
transpired that a misconfigured load-balanced switch was discarding one third of
the packets that traversed it. ALM noticed because it had problems establishing
reliable communications and complained via its own logging/alerting mechanisms
(but didn’t lose any log data). Had the infrastructure been RFC 3164-based then
the customer would have lost, irrevocably and undetectably (unless explicit
end-to-end tests were undertaken), a third of their events.
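The kind of explicit end-to-end test mentioned above can be as simple as sending numbered probe messages and checking for gaps at the receiver. The sketch below simulates this; the function names are hypothetical, and the channel that drops every third packet is a stand-in for the faulty switch rather than a model of real network behaviour.

```python
def lossy_channel(packets, drop_every=3):
    """Simulate a path that silently discards every drop_every-th packet."""
    return [p for i, p in enumerate(packets, start=1) if i % drop_every != 0]


def find_gaps(received, sent_count):
    """Return the probe sequence numbers that never arrived."""
    seen = {seq for seq, _ in received}
    return [seq for seq in range(1, sent_count + 1) if seq not in seen]


probes = [(seq, f"probe {seq}") for seq in range(1, 10)]
arrived = lossy_channel(probes)
missing = find_gaps(arrived, len(probes))
```

The receiver can only compute `missing` because the probes carry sequence numbers and the number sent is known; ordinary RFC 3164 messages carry neither, so the same loss would pass unnoticed.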
If the purpose of your logging infrastructure is basic operational monitoring then
RFC 3164 may be appropriate, but it is inadequate in almost every respect for
any form of security monitoring or auditing. Since RFC 3164 provides a detailed
and honest assessment of its shortcomings from a security and integrity
perspective, the only logical conclusion that can be drawn about someone who
recommends RFC 3164 in environments with such requirements is that they
haven’t read RFC 3164 before recommending it.
4.2 RFC 5424–6
Unlike RFC 3164, which defines the format and transmission protocol, RFC 5424
just defines the format. Other standards define transmission protocols, or
“transport mappings”. At the time of writing, two such mappings are standardised:
• RFC 5425: Syslog over TLS.
• RFC 5426: Syslog over UDP.
There are various implementations of syslog over TCP. This was the subject of an
IETF draft [8], but it is now deemed Historic⁴, with RFC 5425 recommended instead.
RFC 5426 is effectively RFC 3164’s UDP-based approach but carrying
RFC 5424-format messages. It has similarly prohibitive drawbacks and is not considered
here further.
RFC 5425 addresses several of the UDP-based protocols’ problems, but, like the
3164/5424 relationship, falls frustratingly short of providing the necessary levels
of integrity for a forensically-sound audit solution (notwithstanding the problems
inherent in the 5424 format itself):
1. Messages are sent over a reliable channel, where “reliable” is used in
the formal communication protocol sense. Messages won’t be silently
discarded or duplicated in transit. (“Silently” is the key word here: you
can still lose messages in transit, as discussed below. Unlike UDP, you at
least know that a problem has occurred though.)
2. The channel can be mutually-authenticated using standard,
widely-deployed cryptographic algorithms and protocols. Spoofing is therefore
prevented (within normally accepted limits, such as a private key
compromise or a break of the cryptographic algorithms being used).
3. TLS provides a cryptographic checksum, so corruption in transit is
statistically improbable. (It’s physically impossible to prevent such
corruption, regardless of the communication channel [9], but it is
reduced to an acceptable level.)
4. The communication is strongly encrypted, and hence can’t easily be
eavesdropped upon.
5. The protocol has flow and congestion control, and authentication makes
the trivial denial-of-service attack that UDP is vulnerable to impractical.
(You can still launch a DoS attack on a 5425 receiver just as you can
launch an attack on any other TCP-based server, but the protocol is not
inherently vulnerable.)
⁴ http://www.ietf.org/mail-archive/web/ietf-announce/current/msg09934.html :
“The IESG does NOT RECOMMEND implementing or deploying syslog over plain
TCP … Syslog over TLS transport [RFC5425] is recommended”.
Like 5424, 5425 is substantially better than 3164. The main problem, however, is
a lack of application-layer acknowledgements. The standard itself acknowledges
this: “if the TCP connection (or TLS session) is broken for some reason (or closed
by the transport receiver), the syslog transport sender cannot always know what
messages were successfully delivered to the syslog application at the other
end” (§6.3). Rainer Gerhards himself acknowledged this in 2008: “one can not
implement a reliable syslog transport without introducing app-level acks. It is
simply impossible” [10]. Since Gerhards, instrumental in the development of the
5424 family of standards, stated this before RFC 5425 was published, one can
only conclude that RFC 5425 was not designed to be used in situations where
reliable delivery (a prerequisite of integrity and forensic soundness) is important.
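To show what application-layer acknowledgements add, the following toy model has the sender retain each message until the receiver confirms that it has been stored. It is a hypothetical illustration and is not part of RFC 5425 or of any syslog implementation; the class names, the sequence numbering and the `connection_up` callback are assumptions made for this example.

```python
class Receiver:
    """Toy receiver: stores each message, then acknowledges its sequence number."""

    def __init__(self):
        self.stored = {}

    def deliver(self, seq, message):
        self.stored[seq] = message  # a resend overwrites, so no duplicates arise
        return seq  # the application-layer acknowledgement


class Sender:
    """Toy sender: a message is forgotten only once it has been acknowledged."""

    def __init__(self):
        self.next_seq = 1
        self.unacked = {}

    def queue(self, message):
        self.unacked[self.next_seq] = message
        self.next_seq += 1

    def flush(self, receiver, connection_up=lambda seq: True):
        """Send pending messages in order; stop if the connection breaks."""
        for seq in sorted(self.unacked):
            if not connection_up(seq):
                return False  # everything left in self.unacked is retried later
            ack = receiver.deliver(seq, self.unacked[seq])
            del self.unacked[ack]
        return True
```

If the connection fails part-way through a flush, the sender knows exactly which messages remain unconfirmed and can resend them. Without acknowledgements, a sender that discards messages as soon as they are written to the connection has no such record, which is the gap the quotations above describe.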
5 When Worlds Collide
The above sections have shown that from an integrity or forensic perspective,
RFC 3164 syslog is hopeless while RFC 5424–6, though substantially better, still
has prohibitive problems. However, the combination of RFC 3164 and RFC 5424
makes the situation worse. In technical terms (e.g. message formats) RFC 3164
and RFC 5424 are unrelated protocols. However, in marketing terms they’re both
syslog, and they have sufficient overlap to cause problems.
For example, you can send an RFC 3164 format message and an RFC 5424 format
message to the same receiver, and these messages can be written to the same
log file. This log file then contains messages in two mutually incompatible
formats, arbitrarily interleaved with each other. The 5424 parsing problem,
already non-trivial, promptly becomes even harder: we’re essentially parsing two
unrelated formats, and guessing which parser to use on a per-event basis within
the same file (after guessing which character set/encoding to use on a per-event
basis within the same file). “Parsing via guessing” is not likely to be a strategy
that wins many friends in an adversarial setting.
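“Parsing via guessing” looks something like the sketch below, which decides on a per-line basis which parser to try. The patterns and the function name `guess_format` are assumptions made for illustration; a production receiver would need many more cases, and would still be guessing.

```python
import re

# Optional <n> prefix, a version number, then an ISO-style timestamp or "-".
RFC5424_START = re.compile(r"^(<\d{1,3}>)?[1-9]\d{0,2} (\d{4}-\d{2}-\d{2}T\S+|-) ")
# Optional <n> prefix, then "Mmm dd hh:mm:ss ".
RFC3164_START = re.compile(r"^(<\d{1,3}>)?[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")


def guess_format(line):
    """Guess which syslog variant a line belongs to from its first few fields."""
    if RFC5424_START.match(line):
        return "5424"
    if RFC3164_START.match(line):
        return "3164"
    return "unknown"
```

Lines from devices that disregard both standards fall into the “unknown” bucket, and a continuation line of a multi-line message can be misclassified as a new event in either format.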
Unless the entire infrastructure is transitioned to 5424 (which is impractical given
currently-available network devices), the only way to avoid this is to rewrite all
received messages into 5424 format in a single character set/encoding. Although
these modifications might be acceptable from a forensic standpoint, you no longer
have the original event and are therefore in a position of having to justify
yourself, rather than having nothing to justify.
6 Conclusion
This paper has discussed the various mutually-incompatible standards that are
popularly referred to as “syslog”, and assessed them against the requirements for
data integrity and forensic soundness. These properties are essential if the
outputs of a log management or SIEM solution are to be considered credible when
challenged. The original RFC 3164 has major shortcomings and explicitly states
that it does not fulfil such requirements. RFC 5424 addresses many of these
shortcomings but still has prohibitive weaknesses. RFC 5424 is an opportunity
missed, largely through maintaining a pretence that it is “syslog” and by
consequently being overly flexible format-wise and insufficiently strong
protocol-wise. This was perhaps essential in order to gain traction, however.
Assuria’s position is that RFC 3164 syslog will remain widely deployed for the
foreseeable future due to its ubiquity in devices and appliances. Assuria Log
Manager therefore provides a syslog receiver that can receive 3164 and 5424
format messages, furnishes both with a standard, normalised timestamp and
applies robust cryptographic integrity protection (albeit after the unavoidable
syslog network hop). We recommend avoiding syslog if possible; for example:
• Most operating system audit functions (including the Windows Security
Log, Solaris BSM, AIX Audit, HP-UX Trusted Mode Audit) have a rich,
structured native format. Collect and retain such logs in that format⁵.
• Check Point can log via OPSEC.
• Sourcefire appliances can log via eStreamer (push) or vJDBC (pull).
• Cisco IPS devices provide CIDEE.
• SQL Server 2008 Enterprise edition has a fine-grained audit function
that yields structured database result sets. Earlier and non-Enterprise
versions of SQL Server provide C2-compliant auditing in a structured
binary format.
• SharePoint provides a C# API to retrieve audit events as .NET objects.
Where avoiding syslog is not possible, prefer RFC 5424. Where UDP-based syslog
is unavoidable, keep hops as short as possible: ideally the sender and receiver
should be on the same Ethernet segment, because once UDP hits routers it’s
liable to be dropped.
7 References
[1] C. Lonvick: The BSD syslog protocol. IETF Request for Comments 3164,
2001.
[2] R. Gerhards: The Syslog Protocol. IETF Request for Comments 5424,
2009.
[3] F. Miao et al.: Transport Layer Security (TLS) Transport Mapping for
Syslog. IETF Request for Comments 5425, 2009.
[4] A. Okmianski: Transmission of Syslog Messages over UDP. IETF Request
for Comments 5426, 2009.
[5] D. New and M. Rose: Reliable Delivery for syslog. IETF Request for
Comments 3195, 2001.
[6] S. Josefsson: The Base16, Base32, and Base64 Data Encodings. IETF
Request for Comments 4648, 2006.
[7] ISO: Data elements and interchange formats – Information
interchange – Representation of dates and times. ISO 8601, Third
edition, 2004.
[8] R. Gerhards and C. Lonvick: Transmission of Syslog Messages over TCP.
IETF draft-gerhards-syslog-plain-tcp-14.txt, 2012.
[9] C. E. Shannon: A Mathematical Theory of Communication. Bell System
Technical Journal 27, pp379–423, 623–656, 1948.
[10] R. Gerhards: why [sic] you can’t build a reliable TCP protocol without
app-level acks…. http://blog.gerhards.net/2008/05/why-you-cant-build-reliable-tcp.html, 2008.
⁵ The notion of taking a security audit log, typically accredited at least to EAL 4,
lossily-converting it into a less-structured format and then forwarding over an
unreliable transport seems somewhat at odds with the concept of security audit.
Copyright © Assuria Limited. All rights reserved
Reading Enterprise Centre • The University of Reading
Earley Gate • Reading • RG6 6BU • United Kingdom