2015/09/14
Michael Hare
UW System Network https://stats.uwsys.net/
• Cisco 4500X, Cisco 1921:
• SNMP/VTY ACLs
• Juniper MX:
• In addition to SNMP/VTY ACLs, we also ACL NTP, RADIUS, DNS and routing protocols [BFD, BGP, IGMP, LDP, MSDP, OSPF, PIM, RSVP, VRRP] as tightly as possible.
[Happy to share specifics offline, just ask]
• The routing engine also has to handle things like ARP, LACP, NDP and PVSTP, so packets that make it past all of the above are subject to compliance policers that sit on the punt path between the forwarding engine and the routing engine. This Juniper feature is called “ddos-protection” and ensures the routing engine doesn’t fall over during abuse [DoS] or an accident [bridge loop].
• Does NOT act on the forwarding path; it applies only to packets destined to be handled by the routing engine.
• Classifies punt path packets into various categories. [ARP, BGP, etc]
• Packet rate policer, burst size, logging and detection [flow, IFL] are tunable per category.
• Operational data is collected over XML every 5 minutes. We use this data to set parameters, detect policed packet events, etc.
• syslog of policed packet event
• r-uwsuperior-hub-2: %DAEMON-4-DDOS_SCFD_FLOW_FOUND: A new flow of protocol ARP:aggregate on irb.157 with source addr -- -- -- is found at 2015-09-14 09:58:55 CDT
• Graph [all stats available in GNMIS]
• https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=102&c=9a401e0b465505c4
• Settings:
• Output of “r-uwsuperior-hub-2-re0> show ddos-protection protocols arp” is too big for the PowerPoint.
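The per-category policer described above behaves like a rate-plus-burst limiter. Here is a minimal sketch of that idea, assuming a classic token bucket. Juniper's actual implementation is not described here, and the category names and `rate_pps`/`burst` values are hypothetical placeholders, not the router's configured settings:

```python
class TokenBucket:
    """Packet-rate policer sketch: refills `rate_pps` tokens per second, holds at most `burst`."""

    def __init__(self, rate_pps: float, burst: int) -> None:
        self.rate_pps = rate_pps
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = None            # timestamp of the previous packet, in seconds
        self.dropped = 0            # policed packets (what would be counted/logged)

    def allow(self, now: float) -> bool:
        if self.last is not None:
            # Refill for the time elapsed since the previous packet, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_pps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        self.dropped += 1
        return False


# Hypothetical per-category settings, one policer per punt category.
POLICERS = {
    "arp": TokenBucket(rate_pps=100, burst=20),
    "bgp": TokenBucket(rate_pps=1000, burst=200),
}


def punt(category: str, now: float) -> bool:
    """Return True if a punted packet of this category would reach the routing engine."""
    return POLICERS[category].allow(now)
```

With these placeholder values, a burst of 50 ARP packets arriving at the same instant would see the first 20 admitted and the remaining 30 policed. Each category keeps its own bucket, so an ARP storm does not consume BGP's allowance.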
• SNMP and ICMP monitoring of availability and usage via FIDO
• XML data collection via screen scraping [separate from FIDO]
• IPFIX flow export [1:256 sampling rate] anycasted with samplicator to nfcapd
• Juniper firewall filter counters [XML collection]
• FIDO is running active/active from Madison and Milwaukee
• FIDO stores time series data on a variety of datapoints
• ICMPv4/ICMPv6, ifTable [discards, errors, octets, packets], CPU
• XML scraping runs active/passive due to Juniper control plane limits; however, the resultant data is stored in both Madison and Milwaukee. We use XML where SNMP fails us [or is too burdensome]
• CoS queue monitoring, firewall filter stats, DOM, BGP route stats, temperature monitoring
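FIDO's internals aren't covered in this deck. In general, though, turning polled ifTable octet counters into a time series means differencing successive samples and allowing for counter wrap. Here is a minimal sketch, assuming 32-bit ifInOctets by default (the 64-bit ifHCInOctets from ifXTable can be handled by passing a larger modulus). The function name is hypothetical:

```python
COUNTER32_MODULUS = 2**32


def octet_rate_bps(prev: int, curr: int, interval_s: float,
                   modulus: int = COUNTER32_MODULUS) -> float:
    """Convert two successive octet-counter samples into bits per second.

    Assumes at most one wrap between polls. A counter reset (e.g. a reboot) is
    indistinguishable from a wrap here, so a real collector would sanity-check
    the result against the interface speed.
    """
    delta = (curr - prev) % modulus  # modular difference handles a single wrap
    return delta * 8 / interval_s
```

For a 5-minute poll the interval is 300 seconds. On fast links a 32-bit octet counter can wrap more than once in 5 minutes, which is one reason to prefer the 64-bit counters there.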
• IPv4/IPv6 only [no family MPLS]
• Collecting ingress on all IP interfaces on MX2010s
• Collecting ingress on all IP customer handoffs on MX104s, since not all IP traffic between UWs passes through an MX2010.
• We can also collect egress on IP customer handoffs on MX104s. We are replaying collected flows [delayed 5 minutes] to at least one UW campus.
• NFDUMP/NFSEN/home-grown software in use for flow analysis and storing data into RRDs
• Lots of categories tracked in time series:
• On-net vs off-net [across the entire system], v4 vs v6, etc.:
• Commodity vs Research vs Peering:
• https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=103&c=8cd1d65a85db5d8f
• Subnet level, Protocol and Port info
• Subnet: https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=104&c=0fd94f6c9e85890a
• Protocol: https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=105&c=8cc1cb05ef6b01b7
• Port: https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=107&c=954532f579e74027
• AS Statistics [but CDNs limit the usefulness]
• External subnets: https://stats.uwsys.net/cgi-bin/shorten.fcgi?i=108&c=79084aed22e51be8
• Domain / host info: utility hampered by the method chosen [PTR lookups]. We would need visibility into all client DNS lookups: -> https://flows.uwsys.net/cgibin/DNSQuery.fcgi
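Export is sampled at 1:256, so the byte counts in each flow record must be scaled up before they are summed into categories. Here is a minimal sketch of that step plus an on-net/off-net and v4/v6 split. It assumes a simple list of (source, destination, sampled bytes) records. The prefixes are documentation ranges standing in for real UW space, and none of this is the actual NFSEN/home-grown code:

```python
import ipaddress
from collections import Counter

SAMPLING_RATE = 256  # 1:256 sampling on export

# Hypothetical on-net prefixes (documentation ranges, not real UW address space).
ON_NET = [ipaddress.ip_network(p) for p in ("192.0.2.0/24", "2001:db8::/32")]


def is_on_net(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    # A v6 address is never "in" a v4 network (and vice versa); `in` simply returns False.
    return any(ip in net for net in ON_NET)


def categorize(flows):
    """Sum estimated bytes per (category, IP version).

    `flows` is an iterable of (src, dst, sampled_bytes) tuples; each sampled byte
    count is multiplied by the sampling rate to estimate the real volume.
    """
    totals = Counter()
    for src, dst, sampled_bytes in flows:
        src_on, dst_on = is_on_net(src), is_on_net(dst)
        if src_on and dst_on:
            category = "on-net to on-net"
        elif src_on or dst_on:
            category = "on-net to off-net"
        else:
            category = "off-net to off-net"
        version = ipaddress.ip_address(src).version
        totals[(category, f"v{version}")] += sampled_bytes * SAMPLING_RATE
    return totals
```

Scaling by the sampling rate gives an estimate, not an exact count. Small flows may be missed entirely at 1:256, so per-host numbers are noisier than the aggregates.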
• Per-interface statistics enabled
• Anything that can be matched in a Juniper ACL term can be counted
• Anything that can be counted can be policed, but only by bps, not pps. The bps-only limit appears to be a user interface limitation, not a hardware limitation
• Time series data is fed into a thresholding engine to do fast but crude anomalous traffic detection
term count-flag-syn {
    from { tcp-flags syn; }
    then { count :count:tcp:flag-syn; next term; }
}
term dns-packetsize-1400-to-9999 {
    from { packet-length 1400-9999; port 53; }
    then { count :count:dns-packetsize-1400-to-9999; next term; }
}
group='fw-inet-protect-re'
:accept:igmp:accepted
:accept:udp:ldp-discover
:accept:igmp-ldp-igmp
:accept:tcp:ldp-unicast
:accept:ospf:accepted
:accept:pim:accepted
:accept:rsvp:accepted
:accept:tldp-discover
:accept:vrrp
:accept:dns
:accept:mpls-traceroute:accepted
:accept:udp:ntp
:accept:radius
:accept:bfd-multihop
:accept:bfd
:accept:tcp:BGP
:discard:udp:ntp
:accept:icmp:accepted
:accept:tcp:MSDP
:accept:udp:traceroute
group='fw-inet-remote-access'
:accept:tcp-ftp
:accept:udp:SNMP
:accept:tcp:SSH
:discard:remote-access
:accept:tcp:established
group='fw-bridge-count-traffic'
:count:arp
:count:broadcast
:count:ipv4
:count:ipv6
:count:multicast-v4
:count:multicast-v6
group='fw-inet-block-application-traffic'
:count:udp:ntp-other
:accept:udp:ntp
:accept:udp:ssdp-dns
:accept:udp:ssdp
group='fw-inet-block-bogons'
:discard:bogons
group='fw-inet-count-cos-traffic-input'
:count:cos-assured-forwarding
:count:cos-best-effort
group='fw-inet-count-traffic'
:count:esp:traffic
:count:fragment
:count:icmp:traffic
:count:tcp:flag-ack
:count:tcp:flag-fin
:count:tcp:flag-psh
:count:tcp:flag-rst
:count:tcp:flag-syn
:count:tcp:flag-urg
:count:tcp:traffic
:count:udp:rpc
:count:udp:traffic
:count:udp:zero
:count:dns-packetsize-0-to-64
:count:dns-packetsize-65-to-576
:count:dns-packetsize-1400-to-9999
:count:packetsize-0-to-64
:count:packetsize-65-to-128
:count:packetsize-129-to-512
:count:packetsize-513-to-1500
:count:packetsize-1501-to-9999
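The thresholding engine itself isn't described beyond "fast but crude". One crude approach flags any sample that exceeds both an absolute floor and a multiple of the trailing average. It is sketched here as an assumption, not as the engine in use. `window`, `factor` and `floor` are hypothetical tuning knobs, and the input could be something like per-5-minute values of `:count:dns-packetsize-1400-to-9999`:

```python
from statistics import mean


def crude_anomalies(series, window=12, factor=4.0, floor=1000.0):
    """Return indices whose value exceeds both `floor` and `factor` times the
    mean of the preceding `window` samples.

    All default parameter values are hypothetical placeholders.
    """
    flagged = []
    for i in range(window, len(series)):
        baseline = mean(series[i - window:i])
        if series[i] > floor and series[i] > factor * baseline:
            flagged.append(i)
    return flagged
```

With 5-minute samples, `window=12` compares each point against the previous hour. The absolute floor keeps a near-idle counter from alarming when it jumps from, say, 1 to 10.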
• Reporting and thresholding of RRD datapoints integrated into FIDO and other generated reports
• https://stats.uwsys.net/cgi-bin/rrd_reports.cgi
• GNMIS: silly name, web 1.0 looks, but lots of data
• https://stats.uwsys.net/cgi-bin/gnmis.fcgi
• Alarm, syslog and report analysis provides feedback for changing/improving Juniper ddos-protection, firewall filters, etc.
• Significant events are shared with uwsysnet@maillist.uwsa.edu [should there be a security-specific list?]
• Volumetric UDP port 53 attacks: three observed so far this semester [UW Eau Claire, UW Madison, UWC Rock Co]
• UDP port 53 traffic in packets larger than 1400 bytes is generally well under 1 Mbps
• TCP is more likely for large EDNS and DNSSEC responses, although fragmented UDP is RFC compliant
• US-CERT TA15-240A: Controlling Outbound DNS Access recommends that only enterprise DNS servers be allowed to send DNS traffic out through the border
• We could configure different port 53 policers based on destination address [i.e., unfettered or very loose access for enterprise DNS, but stricter policers for port 53 traffic to random internal hosts, etc.]
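Selecting a port 53 policer by destination address amounts to a longest-prefix match against a table of policer limits. The limits are expressed in bps, since that is the only unit the filters police on. Here is a minimal sketch with a hypothetical, IPv4-only table in which 192.0.2.53 stands in for an enterprise resolver. On the router this would be filter terms rather than Python:

```python
import ipaddress

# Hypothetical policer table: destination prefix -> allowed bits per second for UDP port 53.
DNS_POLICERS = {
    ipaddress.ip_network("192.0.2.53/32"): 1_000_000_000,  # very loose: enterprise resolver
    ipaddress.ip_network("0.0.0.0/0"): 1_000_000,          # strict: any other destination
}


def policer_bps(dst: str) -> int:
    """Return the limit of the longest (most specific) prefix containing dst."""
    ip = ipaddress.ip_address(dst)
    matches = [net for net in DNS_POLICERS if ip in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return DNS_POLICERS[best]
```

Because the most specific prefix wins, adding another enterprise resolver is a single /32 entry and needs no reordering.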
• I have a PowerPoint on MPLS, but don’t have time to present it. This presentation and others are available at: https://stats.uwsys.net/other/
FIN
• Current CoS queue setup
• network-control: strict 5% of line rate, then competes with best-effort
• assured-forwarding: [IP-based admission of H.323 endpoints]
• 1G links: strict 20% of line rate, then competes with best-effort
• 10G links: strict 10% of line rate, then competes with best-effort
• 100G links: strict 5% of line rate, then competes with best-effort
• expedited-forwarding: 20% of line rate, then competes with best-effort [currently, all MPLS VPNs]
• best-effort: what's left
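The guaranteed bandwidth each queue gets follows directly from the percentages above. Here is a quick worked sketch. It assumes the expedited-forwarding 20% (not labeled "strict" above) behaves as a guarantee like the others, and it computes the best-effort floor when every other queue is using its full share:

```python
# Guaranteed share of line rate per queue, keyed by link speed in Gbps (from the list above).
GUARANTEED_SHARE = {
    "network-control": {1: 0.05, 10: 0.05, 100: 0.05},
    "assured-forwarding": {1: 0.20, 10: 0.10, 100: 0.05},
    "expedited-forwarding": {1: 0.20, 10: 0.20, 100: 0.20},
}


def guaranteed_mbps(queue: str, link_gbps: int) -> int:
    """Guaranteed bandwidth in Mbps for a queue on a link of the given speed."""
    return round(link_gbps * 1000 * GUARANTEED_SHARE[queue][link_gbps])


def best_effort_floor_mbps(link_gbps: int) -> int:
    """Bandwidth left for best-effort if every other queue uses its full share."""
    return link_gbps * 1000 - sum(guaranteed_mbps(q, link_gbps) for q in GUARANTEED_SHARE)
```

So on a 1G link assured-forwarding is guaranteed 200 Mbps and best-effort keeps at least 550 Mbps. On a 10G link the figures are 1,000 Mbps and 6,500 Mbps.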
• Current CoS classification setup
• Core interfaces retain marking across all families [ip, ethernet, mpls]
• Untrusted [customer] IP traffic is classified into best-effort unless it is admitted by prefix-list into assured-forwarding [H.323]
• Untrusted [customer] ethernet traffic [bridging] is classified into best-effort
• l2circuit MPLS traffic is mapped as follows:
• IP ToS equivalent 0, 1 [includes DSCP] = best-effort
• IP ToS equivalent 2-7 [includes DSCP] = expedited-forwarding
• E-VPN will likely be treated like l2circuit from a CoS perspective
• Q: Is this what we want?
• Do we want DC backup to be demoted to best effort? Or less than best effort?
• It's OK to have different CoS policies per VPN
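The l2circuit mapping can be read as an IP precedence check. Here is a minimal sketch, assuming "ToS equivalent" means IP precedence, i.e. the top 3 bits of the ToS byte (which are also the top 3 bits of DSCP). How the router actually extracts these bits from l2circuit frames isn't specified here, and the function name is hypothetical:

```python
def l2circuit_class(tos_byte: int) -> str:
    """Map a ToS byte to the forwarding class used for l2circuit traffic.

    IP precedence is the top 3 bits of the ToS byte; a DSCP value occupies the
    top 6 bits, so its precedence-equivalent is DSCP >> 3 (the same 3 bits).
    """
    if not 0 <= tos_byte <= 255:
        raise ValueError("ToS byte must be 0-255")
    precedence = tos_byte >> 5
    return "best-effort" if precedence <= 1 else "expedited-forwarding"
```

Under this reading, DSCP AF11 (precedence 1) stays best-effort, while DSCP EF (46, precedence 5) and CS6 land in expedited-forwarding.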
• Using BGP more specific advertisements to ride out a DoS attack
• uwsys.net MX104s, as licensed, can handle several hundred ‘more specific’ routes per UW [down to host routes]
• More specific routes stay local to uwsys.net AS and are used for ingress traffic steering
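Picking a more-specific to announce means taking the network that covers the attacked address at the desired prefix length, inside the UW's aggregate, while staying within the route budget. Here is a minimal sketch. `ROUTE_BUDGET = 300` is a hypothetical stand-in for "several hundred", and 192.0.2.0/24 is a documentation prefix:

```python
import ipaddress

ROUTE_BUDGET = 300  # hypothetical stand-in for "several hundred" more-specifics per UW


def steer_prefix(aggregate: str, target: str, prefixlen: int):
    """Return the more-specific of length `prefixlen` inside `aggregate` that covers `target`."""
    agg = ipaddress.ip_network(aggregate)
    ip = ipaddress.ip_address(target)
    if ip not in agg:
        raise ValueError(f"{target} is not inside {aggregate}")
    if not agg.prefixlen < prefixlen <= agg.max_prefixlen:
        raise ValueError("prefixlen must be longer than the aggregate, up to a host route")
    # strict=False masks off the host bits, yielding the covering network.
    return ipaddress.ip_network(f"{target}/{prefixlen}", strict=False)


def within_budget(announced: set, candidate) -> bool:
    """True if announcing `candidate` keeps this UW at or under the route budget."""
    return len(announced | {candidate}) <= ROUTE_BUDGET
```

Splitting a /24 all the way down to host routes would cost 256 routes, so steering usually targets only the attacked host or a small block around it.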