SYSTEM ADMINISTRATION Chapter 15 Network Integrity

Network Integrity
• Network integrity means maintaining the network in a state in which all of its parts function together as a whole, sound and unimpaired.
• The areas that must be included in a plan to
maintain network integrity are:
– Documentation
– Disaster planning/recovery
– Fault tolerance
– System backup
Documentation
• Documentation for the network includes information on
the following:
– LAN/WAN topology
– Hardware inventory
– Software inventory
– Change logs
– Server information
– Router and switch configurations
– User policies and profiles
– Baseline documents
– Mission-critical applications and hardware
– Network service configuration
– Procedures
• Good network documentation aids in
troubleshooting problems that occur within the
network, such as failed connections, failed servers,
hung applications, WAN connection failure, and user
resource access failure.
• Documentation can be formalized using custom forms, or kept informally in inexpensive notebooks that record change and repair events.
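When change and repair events are kept in a formal, structured form instead of a notebook, each entry can be a small record that is easy to search during troubleshooting. The Python sketch below is one minimal way to do this; the class names and fields (`ChangeLogEntry`, `ChangeLog`, `device`, `change`, `performed_by`) are hypothetical and do not represent any standard form.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ChangeLogEntry:
    """One change or repair event; the field names are illustrative only."""
    device: str          # hypothetical device name, e.g. a server or router
    change: str          # what was changed or repaired
    performed_by: str    # who made the change
    timestamp: datetime = field(default_factory=datetime.now)


class ChangeLog:
    def __init__(self):
        self.entries = []

    def record(self, device, change, performed_by):
        entry = ChangeLogEntry(device, change, performed_by)
        self.entries.append(entry)
        return entry

    def history(self, device):
        """Return every entry for one device, in the order it was recorded."""
        return [e for e in self.entries if e.device == device]


log = ChangeLog()
log.record("fileserver1", "Replaced failed NIC", "jsmith")
log.record("router-a", "Updated ACL for new subnet", "akumar")
log.record("fileserver1", "Applied OS patches", "jsmith")
print([e.change for e in log.history("fileserver1")])
# ['Replaced failed NIC', 'Applied OS patches']
```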
Disaster Planning
• A disaster is an event that causes widespread
destruction or distress, or total failure.
• Planning for the worst-case scenario allows the
network administrator, planning team, and
technicians to anticipate the consequences of both
natural and man-made disasters.
• Disaster planning may be as simple as writing a
procedure to back up all data, or as complex as
contracting for remote hot-sites with 100% uptime so
that a network could sustain a disaster without loss
of service.
Disaster Recovery Plan
• A disaster recovery plan follows a set of well-defined
steps:
– Creation of the disaster recovery (DR) team
– Identification of the risks and vulnerabilities that threaten the network
– Business impact assessment
– Definition of needs
– Detailed plan development
– Testing
– Maintenance of the plan
• A disaster recovery plan is a living document that may
take many weeks or months to develop and implement,
and this plan must be consistently updated as changes
are made to the network.
Mirrored Servers (Failover Clustering)
• Mirrored servers can provide 99.9% uptime for mission-critical applications and data.
• To build mirrored servers, both servers must be
configured with identical equipment and software, and
both must be attached to the network.
• The “primary” mirrored server answers requests from the
network, and issues a “heartbeat” to its twin to let the
secondary mirrored server know that the primary is
servicing the network.
• When the secondary server does not hear a heartbeat in
a predetermined time frame, it will begin answering
requests from the network. The window between failure
of the primary server and “cutover” to the secondary is
usually 30 to 45 seconds.
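The heartbeat mechanism can be modeled in a few lines. This Python sketch is a simplified model, not a real clustering product's API: the class name `SecondaryServer`, the 30-second timeout, and the use of explicit timestamps (so the example is deterministic) are all assumptions for illustration.

```python
class SecondaryServer:
    """Simplified model of the standby twin in a mirrored pair (hypothetical)."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds   # the "predetermined time frame"
        self.last_heartbeat = 0.0        # time the last heartbeat was heard
        self.active = False              # True once the secondary has taken over

    def receive_heartbeat(self, now):
        """Called whenever the primary's heartbeat arrives."""
        self.last_heartbeat = now

    def check(self, now):
        """Take over answering requests if no heartbeat arrived within the timeout."""
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True
        return self.active


secondary = SecondaryServer(timeout_seconds=30)
for t in (0, 5, 10):                  # primary is healthy and sends heartbeats
    secondary.receive_heartbeat(t)
print(secondary.check(now=25))        # False: last heartbeat was 15 s ago
print(secondary.check(now=41))        # True: 31 s of silence triggers cutover
```

Once the secondary takes over, it stays active in this sketch; failing back to a repaired primary is outside its scope.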
Clustered Servers
• A server cluster consists of two or more servers configured with identical applications and file structures, all attached to the network and all answering requests from the network. Together, the servers act as one very large server.
• Clustered servers can use replication services. Several servers may be located off-site and take part in replication to ensure that data is identical on all servers in the event of a failure within the network or a disaster.
• Clustering is very expensive and complex to implement. For this reason, small- and medium-sized businesses usually do not choose this option for disaster planning and recovery.
Power Protection
• Power loss is one of the small disasters that an
administrator can mitigate without undue expense or
complex configurations.
• Several types of power protection can be used in
the network. The choice will depend on the nature of
the operations and the stability of the geographical
location of the business.
Surge Protectors
• Surge protectors are designed to minimize the effects of power spikes and surges.
• Surge protectors do not protect equipment from “dirty power,” noise on the line, brownouts, or power failure.
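• Over time, the surge-suppression components in a surge protector (typically metal oxide varistors) wear out as they absorb surges. The protector loses its sensitivity to power fluctuations and can let large variations in power pass through to the attached components, gradually weakening them.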
• Surge protectors should be replaced at least yearly
on equipment to reduce the likelihood of component
damage.
Online Uninterruptible Power Supply (UPS)
• The purpose of a UPS is to provide enough power, for enough time, to allow a server or other critical machine to shut down gracefully.
• An online UPS provides protection for equipment by
conditioning the power before it reaches the
equipment.
• Power from the wall outlet charges a battery inside the UPS, and the equipment is supplied from the UPS rather than directly from the outlet. Noise and fluctuations are minimized, so the power used by the server is “clean” again, and the battery continues to supply power if utility power is lost.
• The size of the UPS depends on the wattage of the attached equipment. As a rule of thumb, 1 watt ≈ 1.4 VA. Calculate the total wattage of the equipment and multiply it by 1.4 to obtain the required VA rating. Then determine how long the shutdown process, and any other routines that must run while the machine is still powered, will take.
• Most UPSs provide 15 to 20 minutes of power by default. If more time is needed, multiply the total wattage by the additional time (beyond those 15 to 20 minutes) to determine the size of the UPS.
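The sizing rule can be expressed as a short calculation. This Python sketch uses the 1.4 VA-per-watt rule of thumb from the text; the device wattages are hypothetical, the function names are illustrative, and reporting the extra energy in watt-hours is only a readability choice of this sketch.

```python
VA_PER_WATT = 1.4  # rule of thumb from the text: 1 W is about 1.4 VA


def ups_va_rating(device_watts):
    """Minimum VA rating for the UPS: total load in watts times 1.4."""
    return sum(device_watts) * VA_PER_WATT


def extra_watt_hours(device_watts, extra_minutes):
    """Energy needed beyond the default runtime: total watts times extra time.

    Converted to watt-hours only for readability (an assumption of this sketch).
    """
    return sum(device_watts) * extra_minutes / 60


load = [400, 50, 30]   # hypothetical server, switch, and monitor, in watts
print(round(ups_va_rating(load)))    # 672
print(extra_watt_hours(load, 10))    # 80.0
```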
Standby UPS
• A standby UPS allows power to go directly to the
equipment while charging a battery in the UPS.
When a power failure occurs, the UPS detects a
reduction in power and cuts over to battery power.
• Some devices, such as servers, may reboot or shut
down during a short gap between loss of power and
cutover to battery backup.
Fault Tolerance
• Fault tolerance is the system’s capacity to continue
functioning given a “fault” or malfunction of one or
more components.
Disk Fault Tolerance
• Disk fault tolerance provides the network with the
ability to recover from loss of function of a hard disk
storage device, and to prevent loss of data stored
on that device.
• One of several disk fault tolerance strategies can be
implemented in the servers to protect the data. The
most common is some form of redundant array of
inexpensive disks (RAID).
RAID Level 0
• RAID level 0 is commonly called disk striping without
parity.
• This form of RAID allows data to be written across
multiple disks, but does not provide any fault
tolerance.
• RAID 0 requires at least two hard disks to
implement.
• With RAID 0, both read and write performance will
improve over single disk usage.
• RAID 0 uses all available disk space for storage.
• This form of RAID is useful for noncritical data that is
routinely backed up.
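Striping can be illustrated by assigning data blocks to disks in round-robin order. In this minimal Python sketch the block labels and the `stripe` function are hypothetical; real controllers stripe in fixed-size chunks rather than named blocks.

```python
def stripe(blocks, num_disks):
    """Distribute data blocks round-robin across disks, as in RAID 0 (no parity)."""
    if num_disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)   # block i lands on disk i mod n
    return disks


layout = stripe(["A1", "A2", "A3", "A4", "A5"], num_disks=2)
print(layout)  # [['A1', 'A3', 'A5'], ['A2', 'A4']]
```

Because every file that spans both disks has blocks on each of them, losing either disk loses data, which is why RAID 0 provides no fault tolerance.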
RAID Level 1
• RAID level 1 is commonly referred to as disk
mirroring (or disk duplexing when two controllers are
used).
• With RAID 1, data is written to both disks at the
same time. Should one disk fail, the other disk takes
over servicing requests from the network.
• RAID level 1 requires two disks to implement.
• Mirroring/duplexing will provide good read and write
access to data on the disk.
• Only 50% of the total disk space can be used for
storage.
• This form of RAID is used where fault tolerance is needed but cost is a concern.
RAID Level 2
• RAID level 2 is known as bit-level striping with
Hamming code ECC.
• This level of RAID is not used in modern systems.
RAID Level 3
• RAID level 3 uses byte-level striping with dedicated
parity.
• Data is striped across multiple drives, and parity information is written to a dedicated hard disk so that lost data can be recovered.
• Read performance with RAID 3 is good, but write
performance is only poor to fair.
• This type of RAID is costly to implement and is not
as efficient as other implementations.
RAID Level 4
• RAID level 4 uses a method called block-level
striping with dedicated parity.
• The difference between RAID 3 and RAID 4 is that RAID 4 stripes data in blocks of a size determined by the administrator, whereas RAID 3 stripes at the byte level.
• Read performance is good and write performance is
fair.
• This type of RAID is a middle ground between RAID 3 and RAID 5, but it is not frequently implemented.
RAID Level 5
• RAID level 5 is commonly known as striping with
parity.
• This form of RAID requires at least 3 disks. Data is striped across the disks, and parity information is distributed across all of them rather than stored on a dedicated parity disk.
• Read performance is very good, while write
performance is fair.
• When figuring available storage space, add the
amount of disk space on all drives and subtract the
amount of space on one drive.
• RAID 5 is often considered the best balance of fault tolerance and performance.
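Parity-based RAID levels commonly compute parity with a bitwise XOR, which lets any single lost block be rebuilt by XOR-ing the surviving blocks with the parity block. This Python sketch shows one stripe of a hypothetical three-disk RAID 5 set; in a real array the parity block's position rotates from stripe to stripe, which the sketch does not model.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: two data blocks (hypothetical contents) plus their parity block.
data = [b"\x0f\xf0", b"\x33\x55"]
parity = xor_blocks(data)

# Simulate losing the first data block, then rebuild it from survivor + parity.
rebuilt = xor_blocks([data[1], parity])
print(rebuilt == data[0])  # True
```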
RAID Level 6
• RAID level 6 uses block-level striping with dual
distributed parity.
• This form of RAID requires a minimum of 4 disks to implement. The equivalent of two disks is lost to parity.
• Read performance is good, but write performance is poor to fair because two sets of parity must be written to the drives on every write.
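The capacity rules for RAID 0, 1, 5, and 6 can be collected into one function. This Python sketch assumes identical disks with sizes in whole gigabytes, covers only those four levels, and treats RAID 1 as a two-disk mirror; the name `usable_capacity` is purely illustrative.

```python
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4}   # minimum disk counts stated for each level


def usable_capacity(level, disk_count, disk_size_gb):
    """Usable space, in GB, for an array of identical disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if disk_count < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 1 and disk_count != 2:
        raise ValueError("this sketch models RAID 1 as a two-disk mirror")
    total = disk_count * disk_size_gb
    if level == 0:
        return total                     # all space holds data
    if level == 1:
        return total // 2                # half holds the mirror copy
    if level == 5:
        return total - disk_size_gb      # one disk's worth of parity
    return total - 2 * disk_size_gb      # RAID 6: two disks' worth of parity


print(usable_capacity(0, 4, 1000))  # 4000
print(usable_capacity(1, 2, 1000))  # 1000
print(usable_capacity(5, 4, 1000))  # 3000
print(usable_capacity(6, 4, 1000))  # 2000
```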
RAID Level 7
• RAID Level 7 is a proprietary form of RAID that uses
an asynchronous cached striping mechanism with
dedicated parity storage.
• Although RAID 7 is a defined RAID level, it is proprietary; consult the vendor for more information.
Backups
• When determining a backup strategy, the first two
considerations are how you want to accomplish the
backup (the hardware) and what software you will use to
complete this task.
• Some of the options for backup include:
• Small- and large-capacity removable disks
• Optical discs
• Magnetic tape (the most commonly used)
• Once the medium is identified, the administrator will
determine a schedule of backups using one or more of
the following methods:
• Full backups
• Incremental backups
• Differential backups
Full Backups
• A full backup takes all data and commits it to tape.
• During a full backup, the archive bit (attribute) is
reset to “off” to notify the backup software that the
file has been saved to tape.
• Full backups done on a daily basis allow quick restores, because only one tape is needed to complete the restore.
Incremental Backups
• Only files that have changed since the last full or incremental backup (that is, files whose archive bit is on) are committed to tape.
• This method of backup is used in conjunction with
weekly full backups.
• Incremental backups reduce the amount of time it
takes to complete the backup process because of
the limited selection of files that are backed up.
• When restoring, use the last full backup and all
subsequent incremental backups.
• Incremental backups reset the archive bit to off.
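On systems that use an archive attribute, such as Windows, the file system turns a file's archive bit on when the file is created or modified, and full and incremental backups clear it after saving the file. This Python sketch models that selection logic; the file names and the `run_backup` function are hypothetical.

```python
def run_backup(archive_bits, kind):
    """Simulate one backup job driven by archive bits.

    archive_bits maps file name -> True if the archive bit is on (file changed).
    Returns the sorted list of files written to tape.
    """
    if kind == "full":
        selected = sorted(archive_bits)                                      # every file
    elif kind == "incremental":
        selected = sorted(name for name, bit in archive_bits.items() if bit)  # flagged only
    else:
        raise ValueError("this sketch models only full and incremental backups")
    for name in selected:
        archive_bits[name] = False      # both job types reset the archive bit to off
    return selected


files = {"budget.xls": True, "memo.doc": True, "logo.png": True}
print(run_backup(files, "full"))         # ['budget.xls', 'logo.png', 'memo.doc']
files["memo.doc"] = True                 # memo.doc is edited on Monday
print(run_backup(files, "incremental"))  # ['memo.doc']
files["budget.xls"] = True               # budget.xls is edited on Tuesday
print(run_backup(files, "incremental"))  # ['budget.xls']
```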
Differential Backups
• A differential backup saves all files that have
changed since the last full backup.
• To restore, only the tapes from the last full backup
and the last (most recent) differential backup will be
used.
• This method of backing up data is used in
conjunction with weekly full backups.
• A weekly full backup and daily differential backups
are considered the most efficient and safest strategy
for maintaining data integrity.
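The restore rules for the two strategies can be compared directly. This Python sketch takes a hypothetical week of backup jobs and returns the tapes needed for a restore; the function name and day labels are illustrative, and it assumes each schedule uses either incrementals or differentials after the full backup, not a mix of both.

```python
def tapes_to_restore(history):
    """Return the tape labels needed for a restore, oldest first.

    history is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential".
    """
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("a restore needs at least one full backup")
    last_full = full_positions[-1]
    after = history[last_full + 1:]
    kinds = {kind for _, kind in after}
    if {"incremental", "differential"} <= kinds:
        raise ValueError("mixed schedules are not modeled in this sketch")
    needed = [history[last_full][0]]                 # always start from the last full
    if "differential" in kinds:
        needed.append(after[-1][0])                  # only the most recent differential
    else:
        needed.extend(label for label, _ in after)   # every incremental since the full
    return needed


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]
print(tapes_to_restore(incremental_week))   # ['Sun', 'Mon', 'Tue', 'Wed']
print(tapes_to_restore(differential_week))  # ['Sun', 'Wed']
```

The differential schedule needs only two tapes however late in the week the failure occurs, at the cost of each differential growing larger as the week goes on.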
Other Considerations for Network Integrity
• Tape rotation patterns are determined when the backup strategy is designed.
– The choices are:
• Daily rotation
• Weekly rotation
• Monthly rotation
• Yearly rotation
• With each option, the administrator must consider
what archive of past data must be maintained for the
business, and whether the cost of maintaining a
large archive of tapes outweighs the protection of
the data.
• Most businesses use either a weekly rotation or a
monthly rotation to manage archived data.
• Tape storage is important to consider as well.
• Magnetic tape is susceptible to damage from natural
elements including heat, sun, water, and humidity.
• Proper storage for disaster recovery is necessary.
• Tapes should be stored in climate-controlled rooms
that are physically protected or at an off-site storage
facility.
• The best option for disaster recovery is to contract
with a third party to maintain the tape archive at a
remote location.
– This method allows the tapes to remain safe
should there be a disaster at the location of the
business.
– Restoration of the data can then take place at the
new location should the old one be rendered
unusable.
Network Attached Storage (NAS)
• NAS attaches large-capacity storage directly to the network and does not require a server to manage it.
• Access to NAS is controlled through file system
permissions.
• NAS can serve files over multiple file-sharing protocols, such as CIFS and NFS.
• The NAS device is dedicated to storage and does not spend resources providing other services to the network.
• NAS devices can be brought down for maintenance
without causing outages on the network.