Domain 9: Incident Response

CSA Guidance Version 3
Incident Response (IR) is one of the cornerstones of information security management: even
the most diligent planning, implementation, and execution of preventive security controls
cannot completely eliminate the possibility of an attack on the Confidentiality, Integrity, or
Availability of information assets. One of the central questions for organizations moving into
the cloud must therefore be: what must be done to enable efficient and effective handling of
security incidents that involve resources in the cloud?
Cloud computing does not necessitate a new conceptual framework for Incident Response;
rather it requires that the organization appropriately map its extant IR programs, processes
and tools to the specific operating environment it embraces. This is consistent with the
guidance found throughout this document; a gap analysis of the controls that encompass
your organization’s incident response function should be carried out in a similar fashion.
This domain seeks to identify those gaps pertinent to Incident Response that are created by
the unique characteristics of cloud computing. Security professionals may use this as a
reference when developing response plans and conducting other activities during the
preparation phase of the IR lifecycle.
Overview.
This domain is organized in accordance with the commonly accepted Incident
Response Lifecycle as described in NIST SP 800-61 [1]. After establishing the characteristics of
cloud computing that impact Incident Response most directly, each subsequent section
addresses a phase of the lifecycle and explores the potential considerations for responders.
1. Introduction
1.1 Cloud Computing Characteristics that Impact Incident Response
Although cloud computing brings change on many levels, certain facets of the cloud
ecosystem bear more direct challenges to Incident Response activities than others[2].
First, the on-demand self-service nature of cloud computing environments means that a
cloud customer may find it hard or even impossible to receive the required co-operation from
their cloud service provider (CSP) when handling a security incident. Depending on the
service and deployment models used, interaction with the Incident Response function at the
CSP will vary. Indeed, the extent to which security incident detection, analysis, containment,
Copyright © 2011 Cloud Security Alliance
and recovery capabilities have been engineered into the service offering are key questions for
provider and customer to address.
Second, the resource pooling practiced by cloud services as well as the rapid elasticity offered
by cloud infrastructures may drastically complicate incident handling. Precise identification of
assets affected during an incident and the collection of the telemetry and artifacts associated
with the attack (logging, netflow data, memory, machine images, storage, etc.) without
compromising the privacy of co-tenants is a technical challenge that must be addressed
primarily by the provider, but it is up to the cloud customer to verify that its cloud
service provider has done so and can provide the incident-handling support the customer
requires.
Third, despite not being described as an essential cloud characteristic, cloud computing
frequently leads to data crossing geographic and jurisdictional boundaries, in the worst case
without explicit knowledge of this fact by the cloud customer. The ensuing legal and
regulatory implications affect the incident handling process by placing limitations on what
may be done and/or prescribing what must be done during incident response, in all phases of
the lifecycle. As always, it is advisable that an organization include representatives from its
legal department on the Incident Response team to provide guidance on these issues.
Cloud computing also presents opportunities for information security professionals to deliver
an enhanced response. Virtualization technologies and the elasticity inherent in cloud
computing platforms can allow for more efficient and effective containment and recovery,
with less service interruption than would be expected with more traditional data center
technologies. Also, investigation of incidents may be easier in some respects, as virtual
machines can easily be moved into lab environments where runtime analysis can be
conducted and forensic images taken and examined.
1.2 The Cloud Architecture Security Model as a Reference
To a great extent, deployment and service models dictate the division of labor when it comes
to Incident Response in the cloud ecosystem. Using the architectural framework and security
controls review advocated in Domain 1 (see Cloud Reference Model figure 1.5.2a) can be
valuable in identifying what technical and process components are owned by which
organization, at which level of the “stack.”
Cloud service models (IaaS, PaaS, SaaS) differ appreciably in the amount of visibility and
control a customer has to the underlying IT systems and other infrastructure that deliver the
computing environment. This has implications for all phases of Incident Response as it does
with all other domains in this guidance document.
For instance, in a SaaS solution, response activities will mostly reside with the CSP, whereas in
IaaS, a greater degree of responsibility and capability for detecting and responding to security
incidents will reside with the customer. However, even in IaaS there are significant
dependencies on the CSP. Data from physical hosts, network devices, shared services,
security devices like firewalls and any management backplane systems must be delivered by
the Provider. To be certain, some providers are already provisioning the capability to deliver
this telemetry to their customers, and managed security service providers are advertising
cloud based solutions to receive and process this data.
Given the complexities, the Security Control Model described in Domain 1 (figure 1.5.1c) and
the activities an organization performs to map security controls to its particular cloud
deployment should inform IR planning and vice versa. Traditionally, controls for Incident
Response have concerned themselves more narrowly with higher-level organizational
requirements; however, security professionals must take a more holistic view in order to be
truly effective. Those responsible for IR should be fully integrated into the auditing and
planning of any security control which would directly, or even indirectly, affect response. At
a minimum, this process can help in mapping of roles/responsibilities during each phase of
the IR lifecycle.
Cloud deployment models (public, private, hybrid, community) are also considerations when
reviewing IR capabilities in a cloud deployment; the ease of gaining access to IR data varies
for each deployment model. It should be self-evident that the same continuum of
control/responsibility exists here as well. In this domain, we primarily concern ourselves with
the more public end of the continuum. We assume that the more private your cloud, the
more control you will have to develop the appropriate security controls, or have them
delivered by your provider to your satisfaction.
2. Incident Response Lifecycle Examined
2.1 Preparation
Preparation may be the most important phase in the Incident Response Lifecycle when
information assets are deployed to the cloud. Identifying the challenges (and opportunities)
for Incident Response should be a formal project undertaken by information security
professionals within your organization prior to migration to the cloud. This exercise should
be undertaken during every refresh of the enterprise Incident Response Plan.
In each lifecycle phase discussed below, the questions raised and suggestions provided can
serve to inform the organization’s planning process. Integrating the concepts discussed into
a formally documented plan should serve to drive the right activities to remediate any gaps
and take advantage of any opportunities.
Preparation begins with a clear understanding and full accounting of where the organization's
data resides in motion and at rest. Because the organization's information assets now
traverse organizational, and likely geographic, boundaries, threat modeling is necessary on
both the physical and logical planes. Data flow diagrams that map to physical assets, and
that map organizational, network, and jurisdictional boundaries, should serve to highlight any
dependencies that might arise during a response.
Since multiple organizations are now involved, Service Level Agreements (SLA) and contracts
between the parties now become the primary means of communicating and enforcing
expectations for responsibilities in each phase of the IR lifecycle. It is advisable to share
Incident Response Plans with the other parties and to precisely define any terminology.
Where possible, any ambiguities should be cleared up in advance of an incident.
It is unreasonable to expect CSPs to create separate IR plans for each customer. However,
the existence of some (or all) of the following points in a contract/SLA should give your
organization some confidence that your provider has done some advance planning for
Incident Response:
o Points of Contact, communication channels, and availability of IR teams for each party
o Notification criteria, both SOC to SOC and to any external parties
o Incident declaration criteria and event data available
o Explication of roles/responsibilities during a security incident, including
o The incident data/artifacts that are to be shared during an incident, in what format,
and the means of dissemination
o Identification of any "sanitization" operations that may be undertaken on data
provided
o Description of any IR testing done by the parties to the contract and whether results
will be shared
o Post-mortem activities, including any expectations on final Incident Reports and root
cause analyses
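The contract points above lend themselves to a simple gap analysis. The following is a minimal sketch in Python, assuming a hypothetical review in which each point has been marked as covered or not covered. The key names are illustrative labels for the points listed above, not terms taken from any actual SLA.

```python
# Hypothetical labels for the contract/SLA points listed above.
SLA_IR_POINTS = [
    "points_of_contact",
    "notification_criteria",
    "incident_declaration_criteria",
    "roles_and_responsibilities",
    "shared_incident_data",
    "sanitization_operations",
    "ir_testing",
    "post_mortem_activities",
]


def sla_gaps(reviewed: dict) -> list:
    """Return the points that the reviewed contract does not cover.

    `reviewed` maps a point label to True (covered) or False (not covered);
    a point that is absent from the mapping is treated as not covered.
    """
    return [point for point in SLA_IR_POINTS if not reviewed.get(point, False)]


# Illustrative review result for an imaginary provider contract.
example_review = {
    "points_of_contact": True,
    "notification_criteria": True,
    "incident_declaration_criteria": False,
    "roles_and_responsibilities": True,
    "shared_incident_data": True,
    "post_mortem_activities": True,
}

print(sla_gaps(example_review))
```

Any label returned by `sla_gaps` marks a topic to raise with the provider while still in the preparation phase.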
Once the roles and responsibilities have been determined, your organization can now
properly resource, train, and equip your Incident Responders to handle the tasks that they
will have direct responsibility for. For example, if your application resides in a PaaS model
and your cloud provider has agreed to provide (or allow retrieval of) platform-specific
logging, having the technologies and personnel on staff to receive, process and analyze those
types of logs is an obvious need. For IaaS and PaaS, aptitude with virtualization and the
means to conduct forensics and other investigation on virtual machines will be integral to any
response effort. A decision about whether the particular expertise required is organic to
your organization or is outsourced to a Third Party is something to be determined during the
preparation phase. Note that outsourcing then prompts another set of contracts/NDAs/SLAs
to manage.
It should be noted here that the customer organization will be responsible for any application
layer code that is developed on-premise and deployed to the cloud. Scenarios where the
root cause is determined to be a code flaw in the application put the onus squarely on the
organization to enhance secure coding practices, and to emphasize response activities which
allow the efficient and effective remediation of bugs. It is easy to envision scenarios where an
enterprise application is offline after having been properly contained - waiting for the bug fix
then becomes the roadblock to full recovery. This may represent a shift in emphasis for
in-house IR teams; your organization is presumably outsourcing some (or all) responsibility for
network level incident detection and response to the provider.
The most important part of preparing for an incident is testing the plan. Tests should be
thorough and mobilize all the parties who are likely going to be involved during a true
incident. It is unlikely that your CSP has resources to participate in tests with each of its
customers; consider role-playing as a means to identify what tasking, or requests for
information are likely to be directed at the CSP. Use this information to inform future
discussions with the provider while in the preparation phase. Another possibility is for the
customer to volunteer to participate in any testing the CSP may have planned.
2.2 Detection and Analysis
Timely detection of security incidents and successful subsequent analysis of the incident
(what has happened, how did it happen, which resources are affected, etc.) depend on the
availability of the relevant data and the ability to correctly interpret that data. In both cases,
cloud computing provides challenges: (1) availability of data largely depends on what the
cloud provider supplies to the customer; (2) analysis is complicated by the fact that the
analysis at least partly concerns provider-owned infrastructure, of which the customer
usually has little knowledge. Elasticity and resource pooling complicate both the collection of
relevant data and the interpretation of such data. The following subsections provide guidance
regarding data sources for detection, analysis and the interpretation of data.
Given the distribution of responsibilities discussed in section 1.2 above, Incident
Response becomes heavily dependent on the logistics of collecting and correlating the right
telemetry (logs, forensic artifacts, etc.) so that a coherent picture can be developed in the
detection and analysis phases. Understanding where data resides at every step in the
transaction is required to mobilize the right parties to develop strategies to contain and
eradicate an attack.
It is imperative to enumerate the attack as soon as possible. Doing so dictates all further
strategies and helps to identify who should be passed command of the incident. Ultimate responsibility
2.2.1 Data Sources
As in any hosted IT service integration, the IR team will need to determine the appropriate
logging required to adequately detect anomalous events and identify malicious activity that
would affect their cloud applications. It is imperative for the customer organization to
conduct an assessment of what logs (and other data) are available, how they are collected
and processed and finally how and when they may be delivered by the CSP.
Some incidents can only be detected by the cloud provider, e.g., because they concern the
infrastructure hosted by the cloud provider. The SLAs must therefore require the cloud
provider to inform the customer about these incidents in a timely and reliable manner. For
other incidents, even though detectable by the customer, the cloud provider may be in a
better position for detection. Cloud customers should prefer cloud providers that optimally
assist in the detection of incidents.
For the detection and subsequent analysis of incidents on the customer side, logging
information is required. When collecting required log information, ensure that the following
are taken into consideration:
o Clock Synchronization. All cloud component clocks should be synchronized. Ensure
that the cloud provider's log data sufficiently distinguishes time zones for accurate
forensic interpretation.
o Elasticity Characteristics. As new cloud resources (VMs, etc.) are brought online to
service demand, the log information produced by the new resource instance will need
to be added to the stream of log data when appropriate.
o Virtualization Components. As appropriate, ensure that you can retrieve required
hypervisor log data.
o Audit Logs. Acquisition of audit logs from all required components (e.g. network,
system, application, and cloud administration roles and accesses, backup and restore
activities, maintenance access, change management activity)
o Performance Logs. These logs may help provide indications of notification
o Legal Requirements. All organizations involved in the response must be able to
ensure that any data collected
o Data Formats. Normalization of event and other data is a considerable challenge. The
use of open formats (such as the emerging CEE [4]) may ease processing at the
customer side.
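The clock synchronization and elasticity considerations above can be illustrated with a short Python sketch. It assumes, hypothetically, that each resource instance emits log records as (timestamp, source, message) tuples, that timestamps are ISO 8601 strings carrying an explicit UTC offset, and that each stream is already ordered by time. The function names and sample records are illustrative only.

```python
import heapq
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # Without an offset the time zone is ambiguous, so refuse to guess.
        raise ValueError(f"timestamp lacks a time zone offset: {timestamp!r}")
    return parsed.astimezone(timezone.utc)


def merge_streams(*streams):
    """Merge per-instance log streams into one list ordered by UTC time.

    Each stream is an iterable of (timestamp, source, message) tuples that is
    already sorted by time within the stream.
    """
    normalized = [
        [(to_utc(ts), source, message) for ts, source, message in stream]
        for stream in streams
    ]
    return list(heapq.merge(*normalized))


# Hypothetical records: vm-a logs in UTC+02:00, the newly started vm-b logs in UTC.
vm_a = [
    ("2011-06-01T12:00:05+02:00", "vm-a", "login failure"),
    ("2011-06-01T12:00:09+02:00", "vm-a", "login success"),
]
vm_b = [("2011-06-01T10:00:07+00:00", "vm-b", "new instance started")]

for when, source, message in merge_streams(vm_a, vm_b):
    print(when.isoformat(), source, message)
```

Rejecting timestamps that lack an offset is a deliberate choice in this sketch: a record whose time zone cannot be determined is flagged rather than silently placed on the timeline.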
The amount of data produced from the cloud deployment may be considerable. It may be
necessary to investigate the cloud provider's options for filtering log data within the
cloud service, before it is sent to the customer, to reduce the impact on the network and on
the customer's internal processing. Additional considerations include the level of analysis or correlation
performed by the CSP and the cloud tenant to identify possible incidents prior to forensics. If
analysis is performed at the CSP, the escalation and hand-off points for the incident
investigation must be determined.
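As an illustration of filtering before delivery, the sketch below keeps only events at or above a chosen severity. The severity names, their ranking, and the event layout are assumptions made for this example; an actual cloud service will define its own levels and filtering interface.

```python
# Hypothetical severity ranking; real services define their own levels.
SEVERITY_RANK = {"debug": 0, "info": 1, "warning": 2, "error": 3, "critical": 4}


def filter_events(events, minimum="warning"):
    """Keep only the events whose severity is at or above `minimum`."""
    threshold = SEVERITY_RANK[minimum]
    return [e for e in events if SEVERITY_RANK[e["severity"]] >= threshold]


# Illustrative events as they might look before being forwarded to the customer.
sample_events = [
    {"severity": "debug", "message": "heartbeat"},
    {"severity": "warning", "message": "repeated login failures"},
    {"severity": "critical", "message": "administrative role changed"},
]

for event in filter_events(sample_events):
    print(event["severity"], event["message"])
```

The filtering threshold is a trade-off: a higher threshold reduces volume but may discard low-severity events that later prove relevant to an investigation.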
2.2.2 Forensic and other Investigative Support
Although still immature, efforts are already underway within the forensic community to
develop the tools and protocols to collect and examine forensic images derived from
virtualized environments. It is important that the customer understand their own forensic
requirements, research what the vendor may have for meeting those requirements, and
address any gaps.
The IaaS cloud customer should request that the vendor provide access to virtual images
through such mechanisms as VM snapshots or VM introspection. On the customer side, the
capability to stand up a
To greatly facilitate detailed offline analyses, look for cloud providers with the ability to
deliver snapshots of the customer’s entire virtual environment – firewalls, network
(switches), systems, applications, and data. Also, providers that can use their management
backplane/systems to scope an incident and identify only those nodes that are under attack
can greatly enhance the response.
The organization’s IR team should familiarize themselves with information tools the cloud
vendor provides to assist the operations and IR processes of their customers. Knowledge base
articles, FAQs, incident diagnosis matrices, etc. can help fill the experience gap a cloud
customer will have with regard to the cloud infrastructure and its operating norms. This
information may assist the IR team in discriminating operational issues from true security
events and incidents.
2.2.3 Communications during an Incident
Standards exist to communicate incident information for the purpose of sharing indicators of
compromise or to actively engage another party in an investigation. The standards were
developed in the Internet Engineering Task Force (IETF) and are also incorporated in the
International Telecommunication Union’s (ITU) Cyber Security Exchange (CYBEX) project. The
Incident Object Description Exchange Format (IODEF), defined in RFC 5070, provides a
standard XML schema used to describe an incident. Real-time Inter-network Defense (RID),
defined in RFC 6045 and RFC 6046, describes a standard method to communicate the incident
information between entities, which may include a CSP and its tenant.
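To give a sense of what an IODEF description looks like, the following Python sketch builds a minimal incident document with the standard library. The element and attribute names follow the RFC 5070 data model as the author of this example understands it; the function name, identifiers, and contact details are hypothetical, and the output has not been validated against the official schema.

```python
import xml.etree.ElementTree as ET

# Namespace defined for IODEF version 1.00 in RFC 5070.
IODEF_NS = "urn:ietf:params:xml:ns:iodef-1.0"
ET.register_namespace("", IODEF_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified form of an IODEF element name."""
    return f"{{{IODEF_NS}}}{tag}"


def build_iodef_report(incident_id, id_authority, report_time, impact_type, contact_name):
    """Build a minimal IODEF document (illustrative sketch, not schema-validated)."""
    doc = ET.Element(q("IODEF-Document"), {"version": "1.00", "lang": "en"})
    incident = ET.SubElement(doc, q("Incident"), {"purpose": "reporting"})
    ident = ET.SubElement(incident, q("IncidentID"), {"name": id_authority})
    ident.text = incident_id
    ET.SubElement(incident, q("ReportTime")).text = report_time
    assessment = ET.SubElement(incident, q("Assessment"))
    ET.SubElement(assessment, q("Impact"), {"type": impact_type})
    contact = ET.SubElement(
        incident, q("Contact"), {"role": "creator", "type": "organization"}
    )
    ET.SubElement(contact, q("ContactName")).text = contact_name
    return ET.tostring(doc, encoding="unicode")


# Hypothetical values for a customer reporting a denial-of-service incident.
print(
    build_iodef_report(
        "189493",
        "csirt.example.com",
        "2011-06-01T10:00:07+00:00",
        "dos",
        "Example Customer IR Team",
    )
)
```

A production exchange would add the remaining descriptive classes and validate the result against the published schema before transmission.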
Parties should consider the means by which sensitive information is transmitted between
them, ensuring that out-of-band channels are available and that encryption and authentication
schemes are used to ensure the confidentiality, integrity, and authenticity of information.
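One way to protect the integrity and authenticity of an exchanged message is a keyed message authentication code. The sketch below uses HMAC-SHA256 from the Python standard library and assumes that the two parties have already agreed on a shared secret over an out-of-band channel. This choice of mechanism is an illustration, not something prescribed by the standards named above, and it does not provide confidentiality; the key and message shown are placeholders.

```python
import hashlib
import hmac


def sign_message(key: bytes, message: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_message(key: bytes, message: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means the message or tag was altered."""
    return hmac.compare_digest(sign_message(key, message), tag)


# Placeholder key and message; a real key would be random and exchanged out of band.
shared_key = b"example-key-exchanged-out-of-band"
notification = b"Incident 189493: suspected compromise of tenant VM"

tag = sign_message(shared_key, notification)
print(verify_message(shared_key, notification, tag))
print(verify_message(shared_key, notification + b" (edited)", tag))
```

Where the parties do not share a secret, digital signatures serve the same purpose and additionally let a third party verify the origin of the message.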
2.3 Containment, Eradication, and Recovery
As with the other phases of Incident Response, close coordination with all stakeholders is
required to ensure that strategies developed to contain, eradicate, and recover from an
incident are effective, efficient, and take into consideration all legal and privacy implications.
The options must also be consistent with business goals and seek to minimize disruption to
service. This is considerably more challenging when multiple organizations are at the table.
At the technical level, options for this phase will differ depending upon the deployment and
service model, and also the layer of the stack at which the attack was targeted. There may be
multiple strategies that can be employed, possibly by different entities equipped with
different technological solutions. If at all possible, thought exercises should be conducted in
the preparation phase to anticipate these scenarios and a conflict resolution process
identified. Once the “owner” (or owners) of a particular containment or eradication strategy
is identified, that owner must verify implementation of the strategy. There may be multiple
steps to the strategy that rely on coordinated timing to be successful.
Consumers of IaaS are primarily responsible for the containment of, eradication of, and
recovery from incidents; however, providers may assist with certain categories of attack, such
as a Denial of Service. The extent to which, and the conditions under which, facilities at the
provider will be made available to the customer to assist in responding to an attack should be
identified in the preparation phase.
The situation is more complicated for SaaS and PaaS deployments. Organizations are advised
to investigate the facilities offered by their providers to contain, eradicate and recover from
an incident. Consumers may have little (technical) ability to contain an incident in SaaS and
PaaS services other than closing down user access and inspecting their data as hosted within
the service prior to a later re-opening – as in traditional deployments such a decision must be
based on the business impact of losing the service weighed against the business impact of the
service being corrupted. Furthermore, SaaS and PaaS consumers are reliant upon their CSPs
to provide timely fixes to flaws affecting their code prior to being able to resume service.
Customers must also consider how their provider will handle incidents affecting the provider
itself or affecting other tenants on a shared platform in addition to incidents that are directly
targeted at their own organization.
Cloud deployments may have some benefits in this phase – for example, if there are issues
with a service running on a particular IaaS cloud then the customer may have the option of
moving the service on to another cloud, especially if they have implemented one of the
meta-cloud management solutions. As discussed in the introduction, the relative ease with which
nodes can be shut down and new instances brought up may help to minimize service
interruption when a code fix needs to be deployed. Smaller enterprises may benefit from the
economies of scale which allow for more expensive mitigation technologies, such as DoS
protection, to be extended to their sites.
In a cloud environment there may be many system images with identical or similar
vulnerabilities that can allow an exploit to propagate beyond the initial entry point. There will
need to be a determination that further intrusion didn't take place, and/or the exploit wasn't
used on other instances in the cloud.
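Scoping this determination starts with knowing which instances share an image with the compromised one. The following Python sketch assumes a hypothetical inventory that maps each instance identifier to the base image it was created from; the identifiers are invented for illustration, and a real inventory would come from the provider's management interface or the customer's own records.

```python
def instances_at_risk(inventory: dict, compromised: str) -> list:
    """Return the other instances built from the same base image.

    `inventory` maps an instance identifier to its base image identifier.
    """
    image = inventory[compromised]
    return sorted(
        instance
        for instance, base_image in inventory.items()
        if base_image == image and instance != compromised
    )


# Hypothetical inventory of running instances and their base images.
example_inventory = {
    "web-01": "image-web-v7",
    "web-02": "image-web-v7",
    "web-03": "image-web-v8",
    "db-01": "image-db-v3",
}

print(instances_at_risk(example_inventory, "web-01"))
```

Sharing a base image is only a first approximation of shared exposure; instances built from different images may still carry the same vulnerable component.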
The impacted images must be identified and isolated to prevent propagation and analyzed to
determine how the attack took place. If the hypervisor layer is intact, the cloud environment
has an advantage because of the ability to rapidly create copies for analysis. Network
isolation is often also required to triage the environment, and this is a weaker area for the
cloud because of the challenge of inserting network monitoring between virtual systems.
Once the attack has been identified, the impacted systems can be isolated by network
filtering, altering the impacted code, or pausing the images. For applications with the ability
to perform failover or execute a DR plan, their functions can be moved to unaffected systems.
Simultaneously the historical extent of the incident is determined so that known good copies
of the system and data can be identified for the recovery phase.
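Once the earliest point of compromise has been established, selecting a known good copy reduces to choosing the most recent snapshot taken before that point. The Python sketch below assumes a hypothetical mapping of snapshot names to creation times in UTC; the names and times are illustrative.

```python
from datetime import datetime, timezone
from typing import Optional


def last_known_good(snapshots: dict, earliest_compromise: datetime) -> Optional[str]:
    """Return the newest snapshot taken strictly before the earliest compromise.

    `snapshots` maps a snapshot name to its creation time. Returns None when
    every snapshot was taken at or after the compromise.
    """
    candidates = {
        name: taken for name, taken in snapshots.items() if taken < earliest_compromise
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical nightly snapshots and an incident timeline finding.
example_snapshots = {
    "snap-01": datetime(2011, 5, 30, 2, 0, tzinfo=timezone.utc),
    "snap-02": datetime(2011, 5, 31, 2, 0, tzinfo=timezone.utc),
    "snap-03": datetime(2011, 6, 1, 2, 0, tzinfo=timezone.utc),
}
compromise_time = datetime(2011, 5, 31, 14, 30, tzinfo=timezone.utc)

print(last_known_good(example_snapshots, compromise_time))
```

The result is only as reliable as the estimate of the earliest compromise; if that estimate moves earlier during analysis, the selection must be repeated.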
Recovery efforts should include robust verification that the root cause has been identified
and remediated prior to the application coming back on-line. This is crucial to avoiding a
“race condition” where an un-identified vulnerability allows the attacker to compromise
newly provisioned nodes. This may require recreating a base image or restoring a known
good backup and applying the mitigation. For attacks targeted lower in the stack, the
“owner” of the particular system affected should verify that any configuration errors,
patches, or other remediation efforts have been universally deployed.
Post-recovery, a "Lessons Learned" activity, leading with the Incident Report, takes place. A
detailed Incident Report is generated based on the previous activities, to be shared with
impacted parties. In a cloud environment this includes the cloud provider and related
organizations, in addition to your internal IR team.
The Incident Report should include the timeline of the incident, analysis of the root cause or
vulnerability, actions taken to mitigate problems and restore service, and recommendations
for long-term corrective action.
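The report contents listed above can be captured in a small data structure so that an incomplete report is easy to spot before it is shared. This is a minimal Python sketch; the class and field names are assumptions chosen to mirror the four elements named above, not a prescribed report format.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentReport:
    """Illustrative container for the report elements named above."""

    timeline: list = field(default_factory=list)
    root_cause: str = ""
    actions_taken: list = field(default_factory=list)
    recommendations: list = field(default_factory=list)

    def missing_sections(self) -> list:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical draft in which two sections have not been written yet.
draft = IncidentReport(
    timeline=["10:00 UTC alert raised", "10:20 UTC instance isolated"],
    root_cause="Unpatched application component",
)

print(draft.missing_sections())
```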
Corrective actions are likely to be a blend of customer-specific and provider supported, and
the provider Incident Response team should provide a section with their perspective of the
incident and proposed resolution.
After an initial review of the Incident Report by the customer and service provider, joint
discussions should be held to develop and approve a remediation plan.
3. Recommendations
o Cloud customers must understand how the CSP defines events of interest vs.
security incidents, and which events/incidents the CSP reports to the cloud
customer and in what way. Event/incident reports that are supplied in
an open format (such as the emerging CEE [4], IODEF [5], or IDMEF [6]) can
facilitate the processing of these reports at the customer side.
o Cloud customers must understand the CSP's support for incident analysis,
particularly the nature (content and format) of data the CSP will supply for
analysis purposes and the level of interaction with the CSP's incident response
team. In particular, it must be evaluated whether the available data for incident
analysis satisfies legal requirements on forensic investigations that may be
relevant to the cloud customer.
o Especially in the case of IaaS, cloud customers should favor CSPs that leverage the
opportunities virtualization offers for forensic analysis and incident recovery
such as access/roll-back to snapshots of virtual environments, virtual-machine
introspection, etc.
o Cloud customers must set up proper communication paths with the CSP that
can be utilized in event of an incident.
o For each cloud service, cloud customers should identify the most relevant
incident classes and prepare strategies for incident containment,
eradication, and recovery; it must be assured that each cloud provider
can deliver the necessary assistance to execute those strategies.
4. Requirements
o For each cloud-service provider that is used, the approach to detecting and
handling incidents involving resources hosted at that provider must be planned
and described in the enterprise incident response plan.
o The SLA of each cloud-service provider that is used must guarantee the support
for incident handling required for effective execution of the enterprise incident
response plan for each stage of the incident handling process: detection,
analysis, containment, eradication, and recovery.
o Testing will be conducted at least annually. Customers should seek to integrate
their testing procedures with those of their provider (and other partners) to the
greatest extent possible.
Bibliography
[1] GRANCE, T., KENT, K., and KIM, B., Computer Security Incident Handling Guide. NIST
Special Publication 800-61
[2] GROBAUER, B. and SCHRECK, T., Towards Incident Handling in the Cloud: Challenges and
Approaches. In Proceedings of the 3. ACM Cloud Computing Security Workshop (CCSW),
Chicago, Illinois, October 2010
[3] REED, J., Following Incidents into the Cloud. SANS Reading Room
[4] FITZGERALD, E., et al., Common Event Expression (CEE) Overview. Report of the CEE
Editorial Board, 2010
[5] DANYLIW, R., et al., The Incident Object Description Exchange Format. IETF RFC 5070,
2007
[6] DEBAR, H., et al., The Intrusion Detection Message Exchange Format (IDMEF). IETF
RFC 4765, 2007
Copyright © 2011 Cloud Security Alliance