ISS CRITICAL INCIDENT REPORT

INCIDENT DETAILS
Brief description:
No access to staff email, external sites, or WHD on Thursday 12th Sept 15:00 to 15:30; Friday 13th Sept 19:00 to 23:00; and Saturday 20th Sept 04:30 to 21:00

Start date and time: Thursday 12th Sept 15:00, Friday 13th Sept 19:00, Saturday 20th Sept 04:30

End date and time: Thursday 12th Sept 15:30, Friday 13th Sept 23:00, Saturday 20th Sept
21:00

Date of this report: 18 Oct 2013

Incident Number: CIR-1309-0001
INCIDENT SUMMARY
At the specified times, connections to the internet from all staff desktops stopped. In parallel, user access to Exchange 2010 was lost and the service desk application was unresponsive.
INCIDENT ANALYSIS

How and when did ISS find out about the incident?
The issue became evident when email access was lost for all staff using Exchange 2010.

Which service(s) were affected?
Exchange 2010
Service desk application
Staff internet access

What locations were affected?
Campus wide

What was the user impact?
No internet access for any staff, and no email access for staff using Exchange 2010

How was service restored?
The incident on Thursday 12th resolved itself without any intervention.
The incident on Friday 13th required a temporary network configuration change to reroute all internet-bound traffic away from the proxy servers and load balancers. Traffic was later rerouted back through the load balancers, and a repeat of the incident on Saturday 20th provided the time needed to find the root of the issue and effect a resolution.

If service restoration was a workaround, how and when will a permanent solution be
effected?
N/A
What is needed to prevent a recurrence?
ACLs have been implemented to prevent a recurrence of this particular incident.
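For illustration only, an egress filter of the kind described might resemble the Cisco IOS-style extended ACL sketched below. The ACL name, the 10.0.0.0/8 campus address range, and the interface are hypothetical placeholders and are not taken from the actual change.

    ! Permit outbound traffic only if it carries a valid campus source address;
    ! packets with spoofed sources (such as those generated by the infected VC unit)
    ! are dropped before they reach the proxy servers and load balancers.
    ip access-list extended EGRESS-ANTI-SPOOF
     permit ip 10.0.0.0 0.255.255.255 any
     deny   ip any any log
    !
    interface GigabitEthernet0/1
     ip access-group EGRESS-ANTI-SPOOF out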

Are we currently at risk of recurrence or other service disruption following the incident?
In what way? What needs to be done to mitigate the risk?
There is no increased risk of a recurrence of this type of incident over and above the normal risk that an incident of a similar nature can occur. The best mitigation is a robust endpoint patch management approach.
ROOT CAUSE
The root cause of this incident was a Trojan infection on a video conferencing device installed on the network. The video conferencing unit was attempting a denial-of-service attack against a European website by generating large numbers of small fragmented packets with a spoofed source address.
These malformed packets travelled correctly through the network but caused a failure on the load balancers, which sit in the path of the client-to-internet connection. The load balancers dropped all traffic when hit by the large volume of fragmented traffic.
The issue was resolved by removing the VC unit from the network and additionally upgrading the software version on the load balancers.
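As a further illustration, fragmented traffic of this kind could also be filtered explicitly before it reaches the load balancers. The sketch below uses the Cisco IOS "fragments" keyword, which matches non-initial fragments; the ACL name is hypothetical and this is not the configuration actually deployed.

    ! Drop non-initial IP fragments before they reach the load balancers,
    ! while still permitting normal (unfragmented and initial-fragment) traffic.
    ip access-list extended PROTECT-LB
     deny   ip any any fragments
     permit ip any any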
FOLLOW-UP ACTIONS AND RECOMMENDATIONS
See Lessons Learned below.
LESSONS LEARNED
Incident_WSA_lessons_learned.docx