
Chapter 5:
Statistical Flow Analysis
Tran Song Dat Phuc
SeoulTech 11.2015
Statistical Flow Analysis
Network flow record analysis is analogous to analysis of traffic
patterns on the road in real life, gathering info on computer
Every packet that traverses a network can be logged and
recorded. A flow record captures basic info about a flow,
including source and destination IP address, port, protocol,
date, time, and the amount of data transmitted.
Network flow records were originally generated, captured, and
analyzed to improve network performance, but it has become clear
that the data is valuable for other reasons as well.
Forensic analysts investigating a crime may find that flow records
reveal a detailed portrait of network activity.
Statistical analysis of flow data serves various purposes:
Identifying Compromised Hosts: a compromised host may send out
more traffic than normal, transmit or receive traffic on unusual
ports, or communicate with known malicious systems …
Confirming or Disproving Data Leakage: determine whether an
attacker exported sensitive info across the network perimeter by
analyzing the volume of exported data, and so establish whether a
leak may have occurred.
Individual Profiling: provides info about an individual’s activity
(working hours, lunch, break times, sources of entertainment,
inappropriate activity …), and shows which users communicate
and exchange…; helpful in cases involving HR issues.
Statistical flow analysis is important in the modern world due to
the sheer volume of traffic that organizations produce, manage,
and inspect.
Collecting or recording the full contents of all packets is
possible, but it requires a large amount of storage space, has a
high impact on performance, and is often unnecessary.
Statistical flow analysis helps investigators identify specific
targets for content analysis and further investigation.
Every bit of data that is not collected is lost forever. Too much
useless data can make analysis difficult or impossible.
5.1 Process Overview
Flow Record Processing System
Flow record: A subset of info about a flow, includes source and
destination IP address, port, protocol, date, time, and the
amount of data transmitted in each flow.
A “sensor” is a device that monitors the flows of traffic on a
segment and extracts bits of info about each flow into a flow record.
Flow record data is exported from a sensor to a “collector”, a
server that listens on the network for flow records and stores
them on a hard drive.
An organization may have multiple collectors to capture flow
record data from different network segments, or to provide
Once the flow record data has been exported and stored, it can
be aggregated and analyzed using a wide variety of tools.
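As a rough illustration of what a sensor does, the sketch below groups individual packets into flow records keyed by source and destination IP address, ports, and protocol. The `FlowRecord` class, the `aggregate` function, and the packet tuple layout are hypothetical names chosen for this example, not part of any real sensor.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    """Hypothetical flow record: the summary a sensor keeps per flow."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    start: float          # timestamp of the first packet (seconds)
    end: float            # timestamp of the last packet (seconds)
    packets: int = 0
    byte_count: int = 0


def aggregate(packets):
    """Group (ts, src, dst, sport, dport, proto, size) tuples into flow records."""
    flows = {}
    for ts, src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(src, dst, sport, dport, proto, ts, ts)
        rec.start = min(rec.start, ts)
        rec.end = max(rec.end, ts)
        rec.packets += 1
        rec.byte_count += size
    return list(flows.values())


# Made-up packets: two belong to the same TCP flow, one is a DNS query.
packets = [
    (0.0, "10.0.0.5", "192.0.2.10", 40000, 443, "tcp", 60),
    (0.2, "10.0.0.5", "192.0.2.10", 40000, 443, "tcp", 1500),
    (0.3, "10.0.0.7", "192.0.2.53", 5353, 53, "udp", 80),
]
records = aggregate(packets)
for rec in records:
    print(rec)
```

Note that the packet contents are discarded; only the summary survives, which is what keeps flow data small compared with full packet capture.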
5.2 Sensors
The architecture of the local network environment and the
types of equipment available for use as sensors influence
whether flow record data can be produced, and affect the
resulting output format of the flow export data and subsequent
analysis techniques.
Sensor Types
Sensors can be deployed as standalone appliances, or as software
processes running on network equipment that serves other
Many common types of network equipment are capable of
producing flow record data.
Setting up a standalone appliance may be preferable, because
specific types of network equipment support only limited
output formats for flow record exportation.
Network Equipment
Some switches (Cisco Catalyst) support flow record generation
and exportation.
Current Cisco routers and firewalls can produce flow data and
export it to a collector using the Cisco NetFlow format.
SonicWALL supports the IPFIX flow record export protocol
in addition to NetFlow.
Some vendors base the exported flow record data on packet
sampling, which is not appropriate for network forensics because
it does not provide complete info.
Standalone Appliances
The network administrators or forensic investigators may
choose to deploy a standalone appliance as a sensor for
generating and exporting flow record data.
A software-based sensor can be deployed anywhere on the
network. Organizations can set up port monitoring or a network
tap to send traffic to the standalone sensor, which in turn
processes traffic, generates statistics, and sends flow record
data to the collector.
Deploying a standalone sensor is often preferable because a
standalone server does not need to function as a router or
perform other functions, and there is no need to modify existing network
Sensor Software
Most modern, enterprise-quality routers and switches can be
configured to act as sensors and export flow record data across
the network to collectors (Cisco, Juniper, SonicWALL, …).
Alternatively, standalone sensors can be set up using free,
open-source tools such as Argus, softflowd, yaf, …
Argus, which stands for “Audit Record Generation and Utilization System”,
is a mature, libpcap-based network flow sensing, collection, and analysis
toolkit. It is open-source software, copyrighted by QoSient, LLC and
released under the GNU General Public License, and has been tested on
Mac OS X, Linux, FreeBSD, OpenBSD, NetBSD, and Solaris.
Argus is distributed as two packages: the Argus server, which reads
packets from a network interface or packet capture file; and the Argus
client tools, which are used to collect, distribute, process, and analyze
Argus data.
The Argus server functions as a sensor and can output flow export
data in Argus’ compressed format, as files or over the network via
As a libpcap-based tool, Argus supports BPF filtering, which allows
the user to fine-tune filtering and maximize performance. It also
supports multiple output streams, which can be directed to different
collector processes or hosts.
softflowd is an open-source flow monitoring tool developed by
Damien Miller.
It is designed to passively monitor traffic and export flow record data
in NetFlow format. It is based on libpcap and can be installed on
Linux and OpenBSD operating systems.
The hardware requirements are minimal; you need a network card
capable of capturing your target volume of traffic and a second
network card for exporting flows from the sensor to a collector.
Select disk space based on the volume of flow data you would like to
yaf, which stands for “Yet Another Flowmeter”, is an open-source flow sensor.
Released in 2006 under the GNU GPL, yaf is based on libpcap and
can read packets from live interfaces or packet captures.
It exports flow data in the IPFIX format, over SCTP, TCP, or UDP
transport-layer protocols.
One of the most powerful features of yaf is that it supports BPF
filters for the purposes of filtering incoming traffic. This can allow
investigators and network engineers to dramatically reduce the
required processing capabilities and volume of flow export data.
yaf natively supports encrypted flow export using TLS.
Sensor Placement
Factors to consider when placing sensors include:
• Duplication: minimize duplication of the flows collected. Place
sensors where traffic is unlikely to pass multiple sensors, while
still capturing the targeted data.
• Time synchronization: crucial. If the time is not accurate, it is
difficult to correlate exported flows with data from other devices
or with evidence gathered from other sources.
• Perimeter versus internal traffic: visibility inside the network is
valuable. Local flow data helps identify compromised workstations
seeking to propagate to new targets, both internal and external.
• Resources: every investigator has limited resources. Decide what
is most important, review the local network map, and choose choke
points that maximize collection within the available budget, time …
• Capacity: network devices have limited capacity to monitor and
process traffic. Where needed, employ more filtering, or partition
VLANs across multiple ports, sensors, interfaces …
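When some duplication cannot be avoided, duplicates can be dropped at analysis time. The sketch below is one possible approach, assuming each record carries a flow key and a start time and that sensor clocks agree to within a tolerance; the `deduplicate` function, the record layout, and the 1-second tolerance are all assumptions made for this example. It also shows why time synchronization matters: with badly skewed clocks, the same flow would look like two different flows.

```python
def deduplicate(records, tolerance=1.0):
    """Keep one record per flow when several sensors report the same flow.

    records: dicts with 'sensor', 'key' (5-tuple) and 'start' (seconds).
    Two records with the same key whose start times differ by at most
    `tolerance` seconds are treated as the same flow.
    """
    kept = []
    last_start = {}   # key -> start time of the most recently kept record
    for rec in sorted(records, key=lambda r: r["start"]):
        prev = last_start.get(rec["key"])
        if prev is not None and rec["start"] - prev <= tolerance:
            continue  # already reported by another sensor
        last_start[rec["key"]] = rec["start"]
        kept.append(rec)
    return kept


key = ("10.0.0.5", "192.0.2.10", 40000, 443, "tcp")
records = [
    {"sensor": "A", "key": key, "start": 100.0},
    {"sensor": "B", "key": key, "start": 100.4},   # same flow seen by a second sensor
    {"sensor": "A", "key": key, "start": 200.0},   # a later, separate flow
]
unique = deduplicate(records)
print(len(unique))
```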
Modifying the Environment
To gather additional flow record data by modifying the
environment, a few general options are available:
• Leverage existing equipment: existing network equipment,
including switches, routers, firewalls, and NIDS/NIPS equipment,
is often capable of exporting flow record data after minor
configuration changes. Check that the device has sufficient capacity
and that its flow record format fits the collection system.
• Upgrade network equipment: if existing network equipment does
not support flow record exportation or cannot handle the increased
load, deploy replacement switches or other network devices. This
can be straightforward or may require reconfiguration.
• Deploy additional sensors: if devices do not support flow record
exportation but do support port mirroring, send packets to a
standalone sensor; alternatively, deploy a network tap and send data
to a standalone sensor for flow record collection.
5.3 Flow Record Export Protocols
Flow Record Export Protocols
NetFlow: first developed in 1996 by Cisco Systems, Inc. for
caching and exporting traffic flow info, for the purpose of
improving performance.
Many devices can produce NetFlow data, from routers to
switches to standalone probes.
When a flow is completed or no longer active, the sensor marks
it as “expired” and exports the flow record data in a “NetFlow
Export” packet, transmitted over the network to one or more
collectors.
NetFlow v9, the latest version, supports flexible template-based
specification of flow record fields. This allows users to customize
the data cached and exported by a sensor, giving greater
flexibility for analysis and performance improvements.
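The expiry step can be sketched as follows. This is a simplified illustration, not Cisco's implementation: the `expire_flows` function, the cache layout, and both timeout values are assumptions made for this example. A flow is treated as expired when it has been idle too long or has been in the cache too long.

```python
INACTIVE_TIMEOUT = 15.0    # assumed: seconds without packets before a flow expires
ACTIVE_TIMEOUT = 1800.0    # assumed: maximum time a flow may stay in the cache


def expire_flows(cache, now):
    """Remove expired flows from the cache and return them for export.

    cache: dict mapping a flow key to {'first': ts, 'last': ts, ...}.
    """
    expired = []
    for key in list(cache):
        entry = cache[key]
        idle = now - entry["last"] >= INACTIVE_TIMEOUT
        too_old = now - entry["first"] >= ACTIVE_TIMEOUT
        if idle or too_old:
            expired.append((key, cache.pop(key)))
    return expired


cache = {
    "flow-a": {"first": 0.0, "last": 99.0},    # still active
    "flow-b": {"first": 10.0, "last": 50.0},   # idle for 50 seconds
}
expired = expire_flows(cache, now=100.0)
print([key for key, _ in expired])
```

A real sensor would also expire TCP flows when it sees the connection close; that case is left out here to keep the sketch short.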
IPFIX: is a successor to Cisco’s NetFlow, based on NetFlow
It handles bidirectional flow reporting, reduces data redundancy
when reporting on flows with similar attributes, and provides
better interoperability as an open standard.
The flow record data is extensible through data templates.
The sensor sends templates defining the data to be exported,
then uses these templates to construct the flow data export
packets, which it sends to the collector as the flows expire.
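The idea behind templates is that a data record is just a run of bytes, and the template says how to cut it into fields. The sketch below shows that idea only; the `TEMPLATE` layout and `decode_record` function are hypothetical and do not reproduce the actual IPFIX wire format or its field identifiers.

```python
import socket
import struct

# Hypothetical template: ordered (field name, length in bytes) pairs.
TEMPLATE = [
    ("src_ip", 4),
    ("dst_ip", 4),
    ("src_port", 2),
    ("dst_port", 2),
    ("protocol", 1),
    ("octets", 4),
]


def decode_record(template, data):
    """Cut a raw data record into named fields as described by the template."""
    record, offset = {}, 0
    for name, length in template:
        raw = data[offset:offset + length]
        if name.endswith("_ip"):
            record[name] = socket.inet_ntoa(raw)        # 4 bytes -> dotted quad
        else:
            record[name] = int.from_bytes(raw, "big")   # network byte order
        offset += length
    return record


# Build a sample record the way an exporter might, then decode it.
data = (
    socket.inet_aton("10.0.0.5")
    + socket.inet_aton("192.0.2.10")
    + struct.pack("!HHBI", 40000, 443, 6, 1560)
)
decoded = decode_record(TEMPLATE, data)
print(decoded)
```

Because the layout lives in the template rather than in the decoder, new fields can be added without changing the decoding code.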
sFlow: developed by InMon Corporation, sFlow is a standard for
providing network visibility based on packet sampling.
It is supported by many network device manufacturers (not Cisco).
It conducts statistical packet sampling and does not support
recording and processing info about every packet.
It scales well to large networks with high throughput.
Packets that are not sampled are
not recorded and cannot be
Flow Export: Transport-Layer Protocols
Traditionally, flow export data was transmitted over the network
via UDP, a connectionless, unreliable transport-layer protocol
(dropped packets are never recovered).
NetFlow v9 and IPFIX can use reliable transport-layer protocols
such as Stream Control Transmission Protocol (SCTP) for flow
export data transmission.
SCTP was designed from inception to allow for volume
transfer, including multiple streams over a single session.
It provides reliability for multiple voluminous streams.
5.4 Collection & Aggregation
Collection Placement & Architecture
Factors to consider when placing collectors include:
• Congestion: choose locations where transit times between sensor
and collector are unlikely to be impacted by either routine or
incident-based congestion (for example, caused by a worm).
• Security: to protect confidentiality, place collectors on
segments where the paths between collector and sensor are
access-controlled and well protected. Encrypt the flow
export data with protocols such as IPsec or TLS.
• Reliability: use reliable transport-layer protocols, such as TCP or
• Capacity: decide how many collectors to set up. Each collector
must keep up with the bandwidth of its sensors, and is limited by
network card capacity, processing power, RAM, storage space…
• Strategy for Analysis: consider the architecture of analysis
Collection Systems
SiLK: the System for Internet-Level Knowledge is a suite of
command-line tools for the collection and storage of flow data.
Provide powerful toolset for the filtering and analysis of flow
Produced by the Network Situational Awareness (NetSA) group at
CERT, SiLK can process NetFlow v5 and v9 and IPFIX data, and
produce statistical data in various output formats, configurable to a
fine level of granularity.
It includes two collector-specific tools: flowcap, which listens for
flow records on a network socket, temporarily stores flow data to
disk or RAM, and forwards the compressed stream to a client program;
and rwflowpack, which collects flow record data and exports it to the
compressed SiLK Flow format.
flow-tools: collects NetFlow traffic exported by sensors via UDP.
It only accepts input on UDP ports and does not support TCP or SCTP.
nfdump/NfSen: open source under a BSD license, it reads flow
export data from a UDP network socket or pcap files, and allows for
extensive customization of the data files stored on disk.
Argus: the client tools suite includes a collector. All the client
programs accept input in Argus and NetFlow v1-8 formats from
multiple network or filesystem sources, and can send Argus output
to up to 128 client programs via network or filesystem.
5.5 Analysis
Statistics: - The science which has to do with the collection,
classification, and analysis of facts of a numerical nature
regarding any topic.
Flow Record Analysis Techniques
The precise analysis techniques vary on a case-by-case basis,
depending on investigative goals and resources.
Goals and resources: assess the available time, staff, equipment,
and tools. Strategize and identify the analysis techniques that best
fit the investigative goals and available resources.
Starting indicators: these include the IP address of a system, a
time frame, known ports, and specific flow records.
Analysis Technique
Filtering: narrow down a large pool of flow record data to a
subset or groups of subsets, selecting only the small percentage
needed for detailed analysis and presentation.
Baselining: network administrators build a profile of normal
network activity; forensic investigators can compare flow
record traffic to the baseline profile to identify anomalies.
“Dirty Values”: suspicious values to search for when looking for
relevant evidence, including IP addresses, ports, or protocols.
Activity Pattern Matching: every activity leaves fingerprints
on the network. Patterns can indicate suspicious activities (such
as high volume), and some match the behavior of viruses or worms.
Filtering is fundamental to flow record analysis.
As a forensic investigator, your job is to remove extraneous
details and identify events, hosts, ports, and activities of
Alternatively, you might filter for flows that match particular
patterns of activity indicative of worm behavior, data
exfiltration, port scanning, or other suspicious behavior,
depending on your investigative goals.
Most analysis techniques used for forensic analysis of flow
records are based on filtering.
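A minimal sketch of filtering, assuming flow records have already been loaded as dictionaries. The `filter_flows` function and the field names (`ts`, `src`, `dst`, `dport`, `bytes`) are hypothetical; real tools such as rwfilter or nfdump have their own filter syntax.

```python
def filter_flows(flows, start=None, end=None, **fields):
    """Return the flows inside [start, end] whose fields equal the given values."""
    selected = []
    for flow in flows:
        if start is not None and flow["ts"] < start:
            continue
        if end is not None and flow["ts"] > end:
            continue
        if all(flow.get(name) == value for name, value in fields.items()):
            selected.append(flow)
    return selected


# Made-up flow records for illustration.
flows = [
    {"ts": 100, "src": "10.0.0.5", "dst": "192.0.2.10", "dport": 443, "bytes": 1560},
    {"ts": 160, "src": "10.0.0.5", "dst": "203.0.113.66", "dport": 4444, "bytes": 90000},
    {"ts": 900, "src": "10.0.0.7", "dst": "203.0.113.66", "dport": 4444, "bytes": 120},
]

# Narrow down to one destination within the time frame of interest.
suspect = filter_flows(flows, start=0, end=600, dst="203.0.113.66")
print(suspect)
```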
Network traffic analysts can build, maintain, and reference a baseline
of traffic and identify trends and patterns of activity that are
considered “normal” for the environment.
By aggregating a large volume of flow information, it’s possible to
build a large, detailed set of benchmarks of what should be
considered normal over any span of time and across business cycles.
• Network baselines: By looking at general trends over time for
monitored segments, the traffic seen can be understood even if
specific source and destination IP addresses must be abstracted or
• Host baselines: Likewise, when a particular host becomes of
interest, investigators can build or refer to a historical baseline of
a specific host’s activities in order to identify or investigate
anomalous behavior.
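One simple way to compare new traffic against a host baseline is to measure how many standard deviations a value lies from the historical mean. The sketch below assumes daily outbound totals for one host; the numbers, the `is_anomalous` function, and the threshold of 3 are illustrative assumptions, and real baselines usually need to account for business cycles as well.

```python
from statistics import mean, stdev

# Hypothetical baseline: megabytes sent per day by one host over a week.
baseline_mb = [120, 135, 110, 128, 140, 125, 131]


def is_anomalous(value, history, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


print(is_anomalous(133, baseline_mb))   # close to the usual volume
print(is_anomalous(900, baseline_mb))   # far above the usual volume
```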
Dirty Values
Network forensic analysts can compile a list of “dirty values”
and search flow record data to pick out relevant entries.
When conducting statistical flow analysis, “dirty values” aren’t
usually words, they are more likely to be suspicious IP
addresses, ports, dates, and times.
As you conduct your analysis, you will often find that it is
helpful to maintain an updated list of suspicious values that you
collect as you move forward.
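A dirty-value search can be as simple as checking each flow against sets of suspicious values. In the sketch below the IP address, the ports, the field names, and the `dirty_hits` function are all made up for illustration; in a real case the lists would come from the investigation itself.

```python
# Hypothetical dirty values collected during an investigation.
DIRTY_IPS = {"203.0.113.66"}
DIRTY_PORTS = {4444, 31337}


def dirty_hits(flow):
    """Return the reasons, if any, why a flow matches the dirty-value lists."""
    reasons = []
    for field in ("src", "dst"):
        if flow[field] in DIRTY_IPS:
            reasons.append(f"{field} {flow[field]} is a suspicious IP address")
    if flow["dport"] in DIRTY_PORTS:
        reasons.append(f"destination port {flow['dport']} is suspicious")
    return reasons


flows = [
    {"src": "10.0.0.5", "dst": "192.0.2.10", "dport": 443},
    {"src": "10.0.0.5", "dst": "203.0.113.66", "dport": 4444},
]
hits = [(flow, dirty_hits(flow)) for flow in flows if dirty_hits(flow)]
for flow, reasons in hits:
    print(flow["src"], "->", flow["dst"], reasons)

# New findings are added to the lists as the investigation moves forward.
DIRTY_IPS.add("10.0.0.5")
```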
Activity Pattern Matching
Network flow record data, when aggregated, represents complex
behavior that is often predictable and can be analyzed
We have previously compared network flows to traffic on physical
roads. Like physical-world phenomena, network flows contain patterns
that can be empirically measured, mathematically described, and
logically analyzed.
The biggest challenge for forensic investigators and anyone who
analyzes flow record data is the absence of large publicly available
data sets for research and comparison.
However, within an organization, it is entirely possible to empirically
analyze day-to-day traffic and build statistical models of normal
IP address: the source and destination IP addresses are great clues
that reveal a lot about the cause and purpose of a flow. Consider
whether the addresses are on an internal network or Internet-exposed,
their countries of origin, what companies they are registered to,
and other factors.
Ports: much of the time, port numbers correspond with assigned or
well-known ports linked to specific applications or services. Port
numbers can also indicate whether a system is port scanning or being
scanned, and help you identify malicious activity.
Protocol and Flags: layer 3 and 4 protocols are often tracked in flow
record data. These can indicate whether connections were completed
and help you tell the difference between connection attempts that
were denied by firewalls, successful port scans, successful data
transfers, and more. They can also help you make educated guesses
as to the content and purpose of a flow.
Directionality: the directionality of the flows is crucial; it can
indicate whether proprietary data has been leaked or a malicious
program was downloaded. Taken in aggregate, the directionality of
data transfers can allow you to tell the difference between
web-surfing activity and web-serving activity.
Volume of data transferred: the volume can help indicate the type of
activity and whether or not higher-layer data transfer attempts were
successful (many small TCP packets may be indicative of port
scanning, whereas larger packets can indicate file exportation). The
distribution of the data transferred over time also matters; a large
volume of data transferred in a very short period of time is usually
caused by something different than the same amount of data
transferred over a very long period of time.
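Directionality and volume can be combined by totalling, for each internal host, the bytes sent to and received from external addresses. The sketch below assumes 10.0.0.0/8 is the internal range and uses made-up flows; `INTERNAL`, `is_internal`, and `direction_totals` are hypothetical names for this example.

```python
import ipaddress
from collections import defaultdict

INTERNAL = ipaddress.ip_network("10.0.0.0/8")   # assumed internal address range


def is_internal(ip):
    return ipaddress.ip_address(ip) in INTERNAL


def direction_totals(flows):
    """Sum bytes crossing the perimeter per internal host, split by direction."""
    totals = defaultdict(lambda: {"outbound": 0, "inbound": 0})
    for src, dst, nbytes in flows:
        if is_internal(src) and not is_internal(dst):
            totals[src]["outbound"] += nbytes
        elif is_internal(dst) and not is_internal(src):
            totals[dst]["inbound"] += nbytes
    return dict(totals)


flows = [
    ("10.0.0.5", "192.0.2.10", 500_000),   # internal host sending out
    ("192.0.2.10", "10.0.0.5", 2_000),     # small reply coming back
    ("10.0.0.5", "10.0.0.7", 999),         # internal only, ignored here
]
totals = direction_totals(flows)
print(totals)
```

A workstation that sends far more than it receives looks more like a server, or a source of exported data, than like a web-surfing client.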
Simple Patterns
“Many-to-one” IP addresses: if many IP addresses send large
volumes of traffic to one destination IP address, this may indicate:
• DoS attack
• Syslog server
• “Drop box” data repository
• Email server
“One-to-many” IP addresses: if one IP address sends large volumes
of traffic to many destination IP addresses, this may indicate:
• Web server
• Email server
• Spam bot
• Network port scanning
“Many-to-many” IP addresses: many IP addresses sending
distributed traffic to many destinations can be indicative of:
• Peer-to-peer filesharing traffic
• Widespread virus infection
“One-to-one” IP addresses: if one IP address sends large volumes of
traffic to one destination IP address, this may indicate:
• Targeted attack
• Routine server communications
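These simple patterns can be surfaced by counting distinct peers per address. The sketch below computes fan-out (distinct destinations per source) and fan-in (distinct sources per destination) from made-up (source, destination) pairs; the `fan_counts` function is a hypothetical helper, and a high count is only a starting point, since a web server and a port scanner can produce the same shape.

```python
from collections import defaultdict


def fan_counts(flows):
    """Return (fan_out, fan_in): distinct peers per source and per destination."""
    destinations = defaultdict(set)
    sources = defaultdict(set)
    for src, dst in flows:
        destinations[src].add(dst)
        sources[dst].add(src)
    fan_out = {ip: len(peers) for ip, peers in destinations.items()}
    fan_in = {ip: len(peers) for ip, peers in sources.items()}
    return fan_out, fan_in


# Made-up flows: one host talking to five peers, three hosts talking to one.
flows = [("10.0.0.9", f"10.0.1.{i}") for i in range(1, 6)]      # one-to-many
flows += [(f"10.0.2.{i}", "10.0.0.20") for i in range(1, 4)]    # many-to-one
fan_out, fan_in = fan_counts(flows)

print(max(fan_out, key=fan_out.get), max(fan_out.values()))
print(max(fan_in, key=fan_in.get), max(fan_in.values()))
```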
Complex Patterns
Fingerprinting is the process of matching complex flow record
patterns to specific activities.
When fingerprinting traffic, examine multiple elements and
context, and develop a hypothesis of the cause of the behavior.
A TCP SYN port scan might have these characteristics:
• One source IP address
• One or more destination IP addresses
• Destination port numbers increasing incrementally
• Volume of packets surpassing a specific value within a period of
time
• TCP protocol
• Outbound protocol flags set to “SYN”
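The characteristics listed above can be turned into a rough detector. The sketch below flags sources that sent SYN-only TCP flows to many distinct destination/port pairs; the record layout, the flag encoding (`"S"` for SYN only), the `find_syn_scanners` function, and the threshold of 20 are assumptions for this example. It ignores the time window and the incrementing-port check for brevity.

```python
from collections import defaultdict


def find_syn_scanners(flows, min_targets=20):
    """Return sources with SYN-only TCP flows to at least `min_targets` (dst, port) pairs."""
    targets = defaultdict(set)
    for flow in flows:
        if flow["proto"] == "tcp" and flow["flags"] == "S":
            targets[flow["src"]].add((flow["dst"], flow["dport"]))
    return sorted(src for src, seen in targets.items() if len(seen) >= min_targets)


# Made-up data: one host probing ports 1-50, plus one ordinary HTTPS flow.
flows = [
    {"src": "10.0.0.99", "dst": "10.0.0.5", "dport": port, "proto": "tcp", "flags": "S"}
    for port in range(1, 51)
]
flows.append(
    {"src": "10.0.0.7", "dst": "192.0.2.10", "dport": 443, "proto": "tcp", "flags": "SAF"}
)
scanners = find_syn_scanners(flows)
print(scanners)
```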
Flow Record Analysis Tools
SiLK: The SiLK suite includes powerful flow export data analysis
• rwfilter: designed to extract flows of interest from a particular
data repository, filter them by time and category, and then
“partition” them further by protocol attributes. It supports a rich
syntax that is generally as functional as BPF, though different.
• rwstats, rwcount, rwcut, rwuniq, et al.: SiLK includes an arsenal
of basic flow export data manipulation utilities. rwstats produces
statistical aggregations based on the protocol fields specified.
rwcount counts packets and bytes. rwcut selects the fields that
rwuniq can help you sort on.
• rwidsquery: can be fed either a Snort rule file or an alert file; it
figures out which flows from its input would match the rule or alert,
and writes an rwfilter invocation to produce the flows that match.
rwpmatch: This is essentially a libpcap-based program that reads
in SiLK-formatted flow metadata and an input packet source and
saves out just the packets that match the flow metadata.
Advanced SiLK: In addition to the chainable command-line suite,
a Python interpreter, “PySiLK,” is available that implements the
SiLK functionality by exposing it through a Python API. The
nice folks at NetSA have provided a “Tooltips” wiki so the user
community can share experience and grow better faster.
flow-tools: The flow-tools suite includes a variety of flow export
data collection, storage, processing, and sending tools, including a
few tools that are particularly useful for forensic analysis.
• The “flow-report” tool creates ASCII-readable text reports based
on stored flow data.
• Report contents can be customized by the user through the
configuration file, and then sent as input to graphing or analysis
• The “flow-nfilter” program allows users to filter flow export data
based on “primitives,” which are specific to flow-tools.
• “flow-dscan” is a particularly useful utility, designed to
identify suspicious traffic based on flow export data. It includes
features for identifying port scanning, host scanning, and
denial-of-service attacks.
Argus Client Tools: The Argus suite includes a variety of
specialized utilities with powerful analysis capabilities.
ra: Argus’ basic tool for reading, filtering, and printing Argus data,
allows the user to specify fields for printing, select specific records
for processing, match regular expressions, and more.
racluster: cluster flow export data based on user-specified criteria.
This is very helpful for printing summaries of flow record data.
rasort: sort flow data based on user-specified criteria, such as
source or destination IP address, time, TTL, flow duration, and
ragrep: powerful regular expression and pattern matching, based
on the GNU “grep” utility.
rahisto: generate frequency distribution tables for user-selected
metrics such as flow record duration, source and destination port
number, bytes transferred, packet count, bits per second, and more.
ragraph: create visual plots based on user-specified fields of
interest, such as bytes, packet counts, average duration, IP address,
ports, and more. ragraph includes a variety of tools that allow
users to customize the graph appearance.
FlowTraq: FlowTraq is a commercial flow record analysis tool
developed by ProQueSys.
It supports a very wide variety of input formats, including NetFlow
v9, IPFIX, JFlow, and many others.
It can also sniff traffic directly and generate flow records. Once
collected, FlowTraq allows users to filter, search, sort, and produce
reports based on flow records.
In addition, you can specify patterns to generate alerts.
FlowTraq supports a variety of operating systems (Windows,
Linux, Mac, and Solaris), and is designed and marketed for
forensics and incident response (among other purposes).
nfdump/NfSen: The “nfdump” utility (part of the nfdump suite) is
designed to read flow record data, analyze it, and produce
customized output. It offers users powerful analysis features,
• Aggregate flow records by specific fields.
• Limit by time range.
• Generate statistics about IP addresses, interfaces, ports, and much
more.
• Anonymize IP addresses.
• Customize output format.
• BPF-style filters.
• “NfSen” (“Netflow Sensor”) provides a graphical, web-based
interface for the nfdump suite. It is an open-source tool written in
Perl and PHP, designed to run on Linux and POSIX-based
operating systems.
EtherApe: EtherApe is an open-source, libpcap-based graphical tool
that visually displays network activity in real time. It reads packet
data directly from a network interface or pcap file.
EtherApe does not take flow records as input. We are including it
here because it provides a nice high-level visualization of traffic
patterns, and therefore may be of interest to the reader.
5.6 Conclusion
Statistical flow record analysis is becoming increasingly
important for forensic analysis.
Flow records were originally generated for the purposes of
monitoring and improving network performance, but they are also
excellent sources of network-based forensic evidence.
A variety of sensor, collector, aggregation, and analysis tools
is available, ranging from proprietary to free and open-source.
One of the biggest challenges forensic investigators face is
ensuring that the formats used by the sensors and collectors are
compatible with the analysis tools chosen for the investigation.
Statistical flow record analysis is a powerful field of study that
will grow over the next decades.
Reference: “Network Forensics: Tracking Hackers through Cyberspace”,
Sherri Davidoff, Jonathan Ham; ISBN-10: 0132564718,
ISBN-13: 9780132564717. © 2012, Prentice Hall. Cloth, 576
pages. Published 06/13/2012.