Chapter 6: Assuring Reliable and Secure IT Services

The inherent reliability of internetworks:
 The inherent reliability of internetworks is due to US Department of Defense research that led
to technologies robust enough to withstand a military attack.
 The key to this inherent reliability is redundancy: the exceptionally large number of potential
paths a message can take between any two points in a network. Messages are routed around
network problems.
However, some components of a firm’s infrastructure are not inherently reliable.
 For example, the reliability of processing systems depends on how they are designed and
managed.
 Reliability through redundancy comes at a price—extra equipment to guard against failures.
How much reliability to build in is a management decision contingent on numerous, mainly
business, factors:
 Tangible factors: direct revenue losses. For example, how costly is a 15-minute failure of the
order management system?
 Less tangible factors: how many customers frustrated by the outage will never return?
 How likely is such an event to happen?
Redundant systems are more complex and difficult to manage than nonredundant systems:
 Charles Perrow: failures are inevitable in tightly coupled complex systems.
 Precautions such as adding redundancy create new categories of accidents by adding
complexity.
Malicious threats to computing infrastructure:
 Hackers range from pranksters to organized criminals to international terrorists.
 It’s an arms race requiring constantly improving defenses against increasingly sophisticated
weaponry.
 Attacks are often automated and systematic, carried out by wrecking routines that probe for
vulnerabilities and inflict damage randomly.
A. Availability Math
 Reliability of computing infrastructure is measured as the availability of a specific information
technology service, expressed as a percentage. Ex: 98% availability is equivalent to roughly half
an hour of downtime per day.
 A business’s tolerance for outages varies by system and situation. Downtime in large chunks is
more serious than short-term outages. Predictable downtime is more manageable than random
downtime.
 Availability for real-time infrastructure is usually expressed as a “number of nines”: 5 nines
means 99.999% availability, which is less than a second of downtime in a 24-hour day, or a little
over a minute in 3 months.
 Service availability is generally lower than the availability of individual components, and it
decreases as components are added in a series.
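The downtime figures above follow directly from the availability percentage. A minimal sketch of the conversion, where the function name and the list of sample percentages are illustrative choices rather than anything from the chapter:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_seconds(availability_pct: float, period_days: float = 1.0) -> float:
    """Expected downtime, in seconds, over period_days at the given availability."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


# Sample availabilities (illustrative): two, three, and five nines plus the 98% example.
for pct in (98.0, 99.0, 99.9, 99.999):
    per_day = downtime_seconds(pct)
    per_year_minutes = downtime_seconds(pct, period_days=365) / 60
    print(f"{pct}% available: {per_day:.1f} s/day, {per_year_minutes:.1f} min/year")
```

At 98% the daily figure works out to 1,728 seconds, a little under half an hour; at five nines it is under one second per day.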
1. The Availability of Components in Series
 IT service availability degrades severely as components are added in a chain.
 By the time 15 devices, each 98% available, are connected in a series, downtime exceeds 25%.
 If only one component in the chain fails, 99.999% availability cannot be maintained.
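The series effect can be checked with a short calculation. This sketch assumes that component failures are independent and that every component is 98% available (the figure used elsewhere in these notes); under those assumptions the availability of a chain is the product of the component availabilities.

```python
import math


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be working."""
    return math.prod(component_availabilities)


# Assumption: identical, independent components that are each 98% available.
for count in (1, 5, 10, 15):
    availability = series_availability([0.98] * count)
    print(f"{count:2d} in series: {availability:.1%} available, "
          f"{1 - availability:.1%} downtime")
```

With 15 components the product falls to roughly 74%, which is where the "downtime exceeds 25%" figure comes from.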
2. The Effect of Redundancy on Availability
 Solution: Components connected in parallel in the provision of an IT service.
 Because any of the individual components can support the service, all of them (five in the
chapter's example) must fail at the same time to render this combination of components a failure.
 Availability increases when components that are 98% available are combined in parallel.
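The parallel case can be sketched the same way. Assuming independent failures, a group of parallel components is down only when every member is down at once, so the unavailabilities multiply instead of the availabilities. The function name is illustrative.

```python
import math


def parallel_availability(component_availabilities):
    """Availability of a group in which any single working component suffices."""
    all_down = math.prod(1.0 - a for a in component_availabilities)
    return 1.0 - all_down


# Assumption: independent components that are each 98% available.
for count in (1, 2, 5):
    availability = parallel_availability([0.98] * count)
    print(f"{count} in parallel: {availability:.8%} available")
```

Two 98% components in parallel already reach 99.96%; independence is the key assumption, and shared causes of failure (a common power feed, for example) would lower the real figure.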
B. High-Availability Facilities
1. Uninterruptible Electric Power Delivery
 Redundant power is provided to each piece of computing equipment housed in the facility, with
multiple power cables for each computer.
 Power distribution inside the facility is fully redundant and includes uninterruptible power
supplies (UPSs) to maintain power even if power delivery to the facility is interrupted. UPSs
can employ batteryless, flywheel-based technologies.
 Connections to outside sources of power are redundant—facilities access 2 utility power grids.
 Diesel generators stand by for backup power generation, and on-site fuel tanks contain fuel for
a day or more of operation.
 Plans are in place for high-priority access to additional fuel in case of a lengthy primary power
outage (e.g., delivery by helicopter).
 High-end data centers may obtain primary power from on-site power plants, with first-level
backup from local utility power grids and second-level backup from diesel generators.
2. Physical Security
 Security guards in bulletproof enclaves protect points of entry and patrol facilities regularly.
 Closed-circuit TV monitors critical infrastructure and provides immediate visibility into any area
of the facility from a security desk.
 Access to internal areas requires photo ID and presence on a prearranged list.
 Entry is through a single-person buffer zone, with integrated metal and explosive detection, that
can be locked down.
 Motion sensors and biometric scanners (retinal, palm, voice recognition) are in place.
 The building that houses the data center is not shared with other businesses.
 Building is “hardened” against external explosions, earthquakes, and other disasters.
3. Climate Control and Fire Suppression
 Redundant heating, ventilating, and air-conditioning equipment that monitors and maintains
suitable temperature conditions.
 Mobile cooling units.
 Integrated fire suppression systems.
4. Network Connectivity
 External connections to Internet backbone providers are redundant, involving at least 2
backbone providers, and enter the building through separate points.
 Agreements are made with backbone providers that permit significant percentages (i.e., 50-90%)
of network traffic to travel from origin to destination across the backbone company’s
private network, avoiding often-congested public Internet junctions.
 A 24X7 network operations center (NOC) is staffed with network engineers who monitor the
connectivity infrastructure of the facility.
 A redundant NOC on another site is capable of delivering services equal in quality to those
provided by the primary NOC.
 24X7 assistance is available to customers. Automated problem-tracking systems are integrated
with similar systems at service delivery partner sites.
5. N + 1 and N + N Redundancy
 N + 1 redundancy (99.9%) is the minimum level that must be maintained for mission-critical
components: for each type of critical component, at least one unit is standing by.
 N + N redundancy (99.999%) means twice as many mission-critical components as are necessary
to run a facility.
 Facilities are categorized according to the level of uptime they support:
o Level 1 data centers: N + 1 redundancy, available 99-99.9% of the time.
o Level 2 and 3 data centers: progressively more redundancy and higher availability.
o Level 4 data centers: N + N redundancy or better, achieving uptimes of 99.999-99.9999%. Downtime
ranges from about 5 minutes down to about 30 seconds per year, unnoticeable to most users.
 High levels of availability are costly.
o Increasing the availability of a single web site from 99% to 99.999% costs millions of
dollars.
o A 99.999% availability data center costs 3 or 4 times more than one capable of 99 to
99.9% availability.
o Management decisions about the design of IT infrastructures involve tradeoffs between
availability and the expense of additional components.
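To see why N + N buys more availability than N + 1, one simple model treats each unit as independently available and asks how likely it is that at least N of the installed units are working. This is only a sketch: the 98% unit availability, the choice of N = 4, and the function name are assumptions, and the percentages quoted for N + 1 and N + N above are not derived from this model.

```python
from math import comb


def k_of_n_availability(required: int, installed: int, unit_availability: float) -> float:
    """Probability that at least `required` of `installed` independent units are working."""
    p = unit_availability
    return sum(
        comb(installed, working) * p**working * (1 - p) ** (installed - working)
        for working in range(required, installed + 1)
    )


REQUIRED_UNITS = 4        # assumed N: units needed to run the facility
UNIT_AVAILABILITY = 0.98  # assumed availability of each unit

for label, installed in (("N", 4), ("N + 1", 5), ("N + N", 8)):
    availability = k_of_n_availability(REQUIRED_UNITS, installed, UNIT_AVAILABILITY)
    print(f"{label:5s} ({installed} units installed): {availability:.6%} available")
```

Each extra standby unit adds cost as well as availability, which is the tradeoff management has to weigh.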
C. Securing Infrastructure against Malicious Threats
 In a 2003 survey, 99% of companies and government agencies had detected security breaches in
the preceding 12 months.
 Who are the hackers? Thrill seekers, those who hold a specific grudge against a company, those
seeking a company’s proprietary data, and terrorists.
1. Classification of Threats
 External Attacks
o Attacks against computing infrastructure that harm it or degrade its services without
actually gaining access to it.
o DoS attacks disable infrastructure devices by flooding them with an overwhelming
number of messages that the computer cannot handle.
o Hackers send packets that originate from multiple locations on the Internet or that appear
to originate from multiple locations.
o Distributed denial of service (DDoS) attacks are carried out by automated routines
secretly deposited on Internet-connected computers whose owners have not secured them
against intrusion (this includes a large % of DSL and cable modem-connected PCs).
o Spoofing occurs when hackers provide packets with false origin addresses that mislead
filtering software at a target site.
o Anyone can hack: DoS attack routines can be downloaded from the Internet and are as
easy to use as email. DDoS and spoofing attacks are a little more complicated, but
require no technical expertise.
o DoS attacks are very difficult to defend against. It is relatively simple for attackers to
vary their patterns of attack and make them look like legitimate ecommerce traffic.
o Even DoS attacks that do not cause outages affect infrastructure performance, waste
company resources, and reduce customer satisfaction.
 Intrusion
o Intrusion attacks gain access to a company’s internal IT infrastructure:
 By obtaining user names and passwords. People tend to use the same user
name/password in multiple systems. Social engineering is used to get people to
divulge privileged information, such as over the phone.
 By acquiring passwords by eavesdropping on network conversations with
“sniffer” software.
 By exploiting vulnerabilities left in software when it was developed to gain access
to systems without first obtaining passwords.
 Computers are “port scanned” within a few minutes of connecting to the Internet.
 Hackers can also use automated routines that systematically scan IP addresses and
report back to their masters which addresses contain exploitable vulnerabilities.
o Once inside, intruders have the same rights of access and control over systems and
resources as legitimate users.
 They can steal information, erase or alter data, deface web sites, pose as company
representatives, and deposit time bombs.
 It’s very difficult to figure out what an intruder may have done.
 Viruses and Worms
o Malicious software programs that replicate themselves to other computers.
o Distinguished by their degree of automation and ability to replicate across networks.
o Viruses require assistance (often inadvertent) from users to replicate and propagate (e.g.,
opening a file attached to an email message or even opening a web page) whereas worms
replicate and move across networks automatically.
o Danger: they can incorporate and automate other types of attacks, such as DoS attacks.
2. Defensive Measures
 Security Policies—company policy should specify what is appropriate and inappropriate:
o What kinds of passwords are to be used, and how often should they be changed?
o Who is allowed to have accounts on company systems?
o What security features must be activated before a computer can connect to a network?
o What services are allowed to operate inside a company’s network?
o What are users allowed to download?
o How is the security policy enforced?
 Firewalls
o A collection of hardware/software designed to prevent unauthorized access to a
company’s internal computer resources.
o Located at points of maximum leverage within a network, typically at the point of
connection between a company’s internal network and the external public network.
o Some work by filtering packets coming from outside the company before passing them
along to computers inside the company’s facilities.
o Others use a sentry computer that relays information between internal and external computers
without allowing external packets direct entry.
o They are excellent points at which to collect data about traffic moving into and out of
networks.
o They can be used to divide an internal network into segments, so that an intruder that
penetrates one part cannot access the rest.
o They conceal internal network configurations from external prying.
o Limitations of firewalls: provide no defense against malicious insiders or against activity
that does not traverse the firewall (ex: traffic that enters a network via an unauthorized
dial-up modem behind the firewall).
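The packet-filtering style of firewall described above can be sketched as a function that inspects each inbound packet before passing it along. Everything specific here is an assumption made for illustration: the `Packet` fields, the 10.0.0.0/8 internal range, and the rule that only web ports are admitted. The check on the source address also shows one simple response to spoofing: a packet arriving from outside that claims an inside origin is dropped.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

INTERNAL_NET = ip_network("10.0.0.0/8")  # assumed internal address range
ALLOWED_INBOUND_PORTS = {80, 443}        # assumed policy: only web traffic is admitted


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of a packet header."""
    src: str
    dst: str
    dst_port: int


def allow_inbound(packet: Packet) -> bool:
    """Decide whether a packet arriving from the external network may pass."""
    if ip_address(packet.src) in INTERNAL_NET:
        return False  # external packet claiming an internal origin: likely spoofed
    if ip_address(packet.dst) not in INTERNAL_NET:
        return False  # not addressed to a machine behind the firewall
    return packet.dst_port in ALLOWED_INBOUND_PORTS


for pkt in (
    Packet("203.0.113.7", "10.1.2.3", 443),  # ordinary web request
    Packet("203.0.113.7", "10.1.2.3", 23),   # telnet attempt
    Packet("10.9.9.9", "10.1.2.3", 443),     # spoofed internal source
):
    print(pkt, "->", "pass" if allow_inbound(pkt) else "drop")
```

As the limitations bullet notes, a filter like this sees only traffic that actually crosses it.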
 Authentication
o The variety of techniques and software used to control who accesses elements of
computing infrastructures.
o Host authentication controls access to specific computers (hosts).
o Network authentication controls access to regions of a network.
o Both types are used together.
o Strong authentication: passwords expire regularly, and the forms passwords may take are
restricted to make them harder to guess. It can also combine user name/password with one other
factor, such as certificate authentication or biometric verification of identity.
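The password side of strong authentication (restricted forms, regular expiry) lends itself to a small policy check. The specific limits below, a 12-character minimum, four character classes, and a 90-day lifetime, are assumptions chosen for the example and not values given in the chapter.

```python
import string
from datetime import date, timedelta

MIN_LENGTH = 12                # assumed minimum length
MAX_AGE = timedelta(days=90)   # assumed expiry interval


def password_problems(password: str, last_changed: date, today: date) -> list[str]:
    """Return the policy rules this password violates (empty list means compliant)."""
    problems = []
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    if not any(c.islower() for c in password):
        problems.append("no lowercase letter")
    if not any(c.isupper() for c in password):
        problems.append("no uppercase letter")
    if not any(c.isdigit() for c in password):
        problems.append("no digit")
    if not any(c in string.punctuation for c in password):
        problems.append("no punctuation")
    if today - last_changed > MAX_AGE:
        problems.append("expired")
    return problems


print(password_problems("Correct-Horse-42", date(2024, 1, 1), date(2024, 2, 1)))
print(password_problems("password", date(2023, 1, 1), date(2024, 2, 1)))
```

A check like this covers only the password factor; the second factor (certificate or biometric) would be verified separately.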
 Encryption
o Renders the contents of electronic transmissions unreadable by anyone who might
intercept them.
o Legitimate recipients can decrypt transmission contents by using a piece of data called a
key.
o Key must be kept secret and protected from social engineering, physical theft, insecure
transmission.
o By setting up encryption at both ends of a connection across public networks, a company
can extend its secure private network, creating a virtual private network.
o Ex: Public-private key encryption: one unique key (the public key) is used to transform a plain
text message into encrypted form, and a different one (the private key) is used to decrypt the
message back into plain text at its destination.
o Limitations of encryption: hackers can still gain useful info from the pattern of
transmission, message lengths, origin, or destination address. Hackers can still intercept
and change data in a transmission.
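The public-private key idea can be illustrated with a toy RSA key pair. This is a teaching sketch only: the primes are tiny and there is no padding, so it offers no real security, and production systems should rely on a vetted cryptography library. The numbers and function names are illustrative assumptions.

```python
# Toy RSA with tiny primes (illustration only, not secure).
p, q = 61, 53
n = p * q                  # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, chosen coprime to phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)
private_key = (d, n)


def encrypt(message: int, key: tuple[int, int]) -> int:
    """Transform a plain-text number into encrypted form with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple[int, int]) -> int:
    """Recover the plain-text number with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


plain = 65
cipher = encrypt(plain, public_key)
print("cipher:", cipher)
print("decrypted:", decrypt(cipher, private_key))
```

Anyone may hold the public key, but only the holder of the private key can reverse the transformation, which is why the private key must be protected as described above.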
 Patching and Change Management
o Keeping track of the variety of systems in a company’s infrastructure, their security
weaknesses, the available patches, and whether they have been applied is very important.
o Best practice calls for keeping detailed records of all files that are supposed to be on
production computers.
o However, in many cases shortcuts are taken around formal change management procedures,
resulting in a gap in formal knowledge about what files and programs should be present
on company systems.
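One way to keep the detailed record of production files described above is a manifest of cryptographic hashes that is compared against what is actually on disk. This is a sketch under stated assumptions: the manifest format (relative path mapped to SHA-256 digest) and the function names are hypothetical, not a procedure taken from the chapter.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed relative to the record."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "modified": sorted(name for name in shared if expected[name] != actual[name]),
    }


# Illustrative manifests with made-up digests standing in for real hashes.
recorded = {"bin/app": "aaa", "etc/app.conf": "bbb"}
observed = {"bin/app": "aaa", "etc/app.conf": "ccc", "tmp/dropper": "ddd"}
print(compare_manifests(recorded, observed))
```

In practice `build_manifest` would be run at each approved change and again during audits; an "unexpected" or "modified" entry with no matching change record is the gap the last bullet warns about.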
 Intrusion Detection and Network Monitoring
o Help network administrators recognize when infrastructure is or has been under attack.
o Network monitoring automatically filters out external attack traffic at company network
boundaries.
o Intrusion detection systems include combinations of hardware probes and software
diagnostic systems that log activity throughout company networks and highlight patterns of
suspicious activity.
o They provide information that can help to reconstruct exactly what an intruder did.
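The "log activity and highlight suspicious patterns" idea can be sketched with a very simple rule: flag any source address that accumulates several failed logins. The event format, the threshold of three, and the function name are assumptions for illustration; real intrusion detection systems apply far richer rules.

```python
from collections import Counter

FAILED_LOGIN_THRESHOLD = 3  # assumed cutoff for calling a source suspicious


def suspicious_sources(events, threshold=FAILED_LOGIN_THRESHOLD):
    """Return source addresses with at least `threshold` failed logins, sorted."""
    failures = Counter(source for source, outcome in events if outcome == "login_failed")
    return sorted(source for source, count in failures.items() if count >= threshold)


# Hypothetical log entries as (source address, outcome) pairs.
log = [
    ("198.51.100.4", "login_failed"),
    ("192.0.2.10", "login_failed"),
    ("198.51.100.4", "login_failed"),
    ("192.0.2.10", "login_ok"),
    ("198.51.100.4", "login_failed"),
]
print(suspicious_sources(log))
```

Because the full log is retained, the same records can later help reconstruct what an intruder did.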
3. A Security Management Framework—principles of security management
 Make deliberate security decisions
 Consider security a moving target
 Practice disciplined change management
 Educate users
 Deploy multilevel technical measures, as many as are affordable
4. Risk Management of Availability and Security
 Risks must be characterized and addressed in proportion to their likelihood and potential
consequences.
 Management actions to mitigate risks must be prioritized according to costs and potential
benefits.
 One method of prioritizing involves computing the expected loss associated with incidents by
multiplying the probability of an incident by its cost if it occurs.
 The logic of risk management can be very complex. Sometimes, managers address high-cost
risks first, even though their likelihood of occurrence is very low.
 Intangible aspects of risk are also hard to define.
 Risk is often also a factor in acquiring new technology, since it can affect security and
availability.
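The expected-loss method described above is a one-line calculation per risk. In this sketch the risk names, probabilities, and costs are made-up illustrative values, and the `Risk` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    annual_probability: float  # chance the incident occurs in a year (illustrative)
    cost_if_occurs: float      # loss in dollars if it does occur (illustrative)

    @property
    def expected_loss(self) -> float:
        return self.annual_probability * self.cost_if_occurs


def prioritize(risks):
    """Order risks from highest to lowest expected loss."""
    return sorted(risks, key=lambda risk: risk.expected_loss, reverse=True)


risks = [
    Risk("data center fire", 0.001, 5_000_000),
    Risk("order system outage", 0.5, 40_000),
    Risk("web site defacement", 0.1, 100_000),
]
for risk in prioritize(risks):
    print(f"{risk.name}: expected loss ${risk.expected_loss:,.0f}")
```

A pure expected-loss ranking puts the rare, very costly fire last, which illustrates why managers sometimes override the calculation and address high-cost risks first.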
5. Incident Management and Disaster Recovery—steps (described in 6., 7., and 8.) that need to be
taken before, during, and after an incident.
6. Managing Incidents before They Occur
 Sound infrastructure design
 Disciplined execution of operating procedures
 Careful documentation
 Established crisis management procedures
 Rehearsing incident response
7. Managing during an Incident
Psychological obstacles humans have to deal with in crises:
 Emotional responses, including confusion, denial, fear, and panic.
 Wishful thinking and groupthink
 Political maneuvering, diving for cover, and ducking responsibility
 Leaping to conclusions and blindness to evidence that contradicts current beliefs
 Public relations inhibition—managers don’t want to admit the seriousness of a problem
8. Managing after an Incident
 After an incident, infrastructure managers may need to rebuild parts of the infrastructure.
 Well-documented configurations and procedures are necessary for recovery.
Questions for Discussion:
1. Why are internetworks inherently reliable? Why are some components of a firm’s infrastructure
not necessarily reliable?
2. Discuss the availability of components in a series and the effect of redundancy on availability.
3. Discuss the steps companies must take to ensure that their facilities are high-availability.
4. Discuss N + 1 and N + N redundancy and the way facilities are categorized according to the level
of uptime they support. Include the costliness of high levels of availability.
5. Discuss the three major categories of threats to computer infrastructure, the dangers of each, and
why they are hard to prevent.
6. Describe how security policies, firewalls, authentication, and encryption are used as security
measures.
7. Describe security measures that should be taken before, during, and after a security breach
incident.