Saves

advertisement
IT2204:
Systems Administration I
7. Device
Management
Managing desktops
 Managing many desktops (workstations)
 Best done by automation for supported platforms,
thus automating the process of managing many PCs.
− Main duties for workstations
□
□
□
−
Loading of system and software
Updating system software and applications
Configuring network parameters
All three must be right
□
□
□
Initial load must be consistent across all machines
Quick updates
2
Network configuration to be managed centrally
Managing many workstations
− Best done by automation for supported platforms
3
Machine life cycle
 Five states and several transitions exist. There is
need to plan for the different states and transitions
4
Machine life cycle
• Machine states
 New
−
A new machine
 Clean
− OS installed, but not yet configured for
environment.
 Configured
− Set up (configured) correctly for the operating
environment.
 Unknown
− Misconfigured, broken, newly discovered,
outdated configuration, etc.
 Off
− Retired computer
5
Machine life cycle
• State transitions
 Build
− Transition from new to clean states.
− Set up hardware and install OS.
 Initialize
− Configure for environment; often part of
build.
 Update
− Install new software.
− Patch old software.
− Change configurations.
6
Machine life cycle
• State transitions
 Entropy
−
Undisciplined/ unmanageable changes to
configurations
Major environment changes
Unexplained problems
−
−
 Debug
− Going back to correct configured state
 Rebuild
−
Machine rebuilding
□ Possibly due to a major OS revision,
□ Drastic changes to be made such that
simple updates make no sense
7
Automated Installations
• Advantages
 Saves time/money
− Boot the computer, then go do something else.
 Ensures consistency
− No chance of entering wrong input during
install.
− Avoids user requests due to mistakes in
configuration
− What works on one desktop, works on all.
 Allows for fast system recovery
− Rebuild system with auto-install vs. slow tapes.8
Automated Installations
 Full automation better than partial one
−
−
Eliminates prompts in installation scripts
Can include complete notification when
complete.
 Partial automation (better than no
none)
−
Needs proper documentation for
consistency
9
Vendor Installations
• Weaknesses
 Need to always reload the OS on new machines
 (you may have your OS preference)
−
−
−
You need to configure the host for your
environment
Eventually you’ll reload the OS on a desktop,
leaving you with two platforms to support: the
vendor OS install and your OS install.
Vendors change their OS images from time to
time, so systems bought today have a different
OS from systems bought a few months ago. 10
Your own Installations
 Little trust to vendor' s pre-installed OS
−
−
−
Makes adding new apps to your clean install easier
as you are installing for your own environment.
No need to contact the vendor for installation
help.
There may be need for special apps or add-ons
You may eventually need a re-install
□ This may be different from that of the
vendor
□ Need to make sure required drivers,
11
software are available
System and Application
Updates
 At times the software we install may get bugs and




•
security loop holes.
New applications are also launched now and again.
We may need to update or upgrade our software.
Updates should be updated too
Automation systems include:
− Solaris autopatch
− Windows
− Linux package updaters eg yum, apt
Updates can be retrieved from the online software
12
update centers for different OS vendors.
System and Application
Updates
• A software update provides bug fixes for
features that aren't working right, minor
software enhancements and sometimes include
new drivers.
• Some software updates are free, and are
sometimes called a patch because the update is
installed over software you're already using and
it isn't a full software installation.
13
System and Application
Updates
• A software upgrade requires a purchase of a
new version of your software, usually at a
lower price than you would pay if you bought
the software for the first time.
• Some companies will offer a free update to
the latest version if you just recently bought
your software, so be sure to register the
software when you install it so you know if
14 you qualify for free upgrades.
System and Application
Updates
• A patch is a software update comprised of code
inserted (or patched) into the code of an executable
program. Typically, a patch is installed into an existing
software program. Patches are often temporary fixes
between full releases of a software package.
• Patches may do any of the following:
15
–
–
–
–
–
Fix a software bug
Install new drivers
Address new security vulnerabilities
Address software stability issues
Upgrade the software
• A patch is designed to update a computer
program or its supporting data, to fix or
improve it. This includes fixing security
vulnerabilities and other bugs, and improving
the usability or performance.
• Though meant to fix problems, poorly
designed patches can sometimes introduce
new problems.
16
Network configuration
 Done over the network, typically using
DHCP
−
−
−
Eliminates time wastage and manual error
More secure – only authorized systems have
access
Centralized control makes updates and
changes a lot easier (e.g. new DNS server)
17
Managing Servers
 Different from workstation
− Serves many users
− Requires reliability and high uptime (a server should
always be on)
− Requires tight security
− Often expected to last longer
− More expensive
− Typically has a different OS configuration from
desktops
− Deployed within a data center
− Often has maintenance contracts
− Has backup systems
18
− Has better remote access
Server HW
 Buy server HW for servers
−
−
−
−
specialized hardware
offers more internal space
offers more CPU performance
Offer high performance I/O (both disk and
network)
− Has more upgrade options
− Can be rack mountable
 For reliability, use known vendors
19
Vendor server product
lines
• The typical vendor has three product
lines:
1. Home
2. Require more internal space
3.
(Business)
Offer high performance I/O (Server)
20
Vendor server product lines
 Home
−
−
−
Absolute cheapest purchase price
Original equipment manufacturer (OEM)
components change often
focuses on being the lowest cost at the
outset. consumers make purchasing
decisions based on price. (Any add-on
features can be considered later and
purchased for a premium cost.)
21
Vendor server product lines
 Require more internal space (Business)
−
−
−
−
Longer life, reduced total cost of ownership
Fewer component changes
concentrate on the total cost of
ownership. Businesses tend to keep their
computers for a longer time than the average
home consumer. Therefore, the manufacturers
have to keep a large pool of spare parts for
maintenance of these computers.
Usually higher quality components than that
of the home line computers.
22
Vendor server product lines
 Offer high performance I/O (Server)
−
−
−
Lowest cost performance metric
Easier to service components and design
The server line tends to focus on the lowest
total cost of ownership. For example, a
server is designed with higher quality
components that will last a great deal of time
longer than the home or business line of
computers. A server is also designed to be
able to process a higher workload than a
home line of computers.
23
Maintenance contracts
 Vendors have variety of service contracts
−
Customer-purchased spare parts get replaced
when they get used up
 How to select a maintenance contract?
− Think of needs
□
□
□
Non-critical hosts: next-day or two-day
response time is likely reasonable, or perhaps
no contract
Large groups of similar hosts: use spares
approach
Controlled model: only use a small set of
distinct technologies so that few spare part kits
24
needed.
Maintenance contracts
 How to select a maintenance contract?
−
Think of needs
□ Critical host: stock failure-prone and
interchangeable parts (power supplies,
hard drives); get same-day contract
□
Large variety of models from same
vendor: sufficiently large sites may opt
for a contract with an on-site technician
25
Data backups
 Servers often have critical data that must be
backed up
−
−
Client data often backed up on server
Think of separate administrative
network
□ Keep bandwidth-hungry backup jobs off of
the production network
□ Provide alternate access during network
problems
□ May require additional NICs, cabling,
switches etc
26
Servers in the data center
 Servers should be located in data centers
−
Data centers provide
□ Proper power (enough power,
conditioned, UPS, maybe generator)
□ Fire protection/suppression
□ Networking
□ Sufficient air conditioning (climate
controlled)
□ Physical security
27
Remote administration
 Data centers are expensive and may be distant from
admin office
 Servers should not require physical presence at a
console
−
−
−
Typical solution is a console server
□
Eliminate need for keyboard and screen
□
Can see booting, can send special keystrokes
□
Access to console server can be remote (e.g.,
ssh, rdesktop)
Power cycling provided by remote-access power-strips
28
Media insertion & hardware servicing are still problems
RAID
Redundant Array of Independent
Disks
RAID
 Redundant array of independent disks RAID
− A system whereby two or more disks
are physically linked together to form a
single logical, large capacity storage
device that offers a number of
advantages over conventional hard disk
storage devices
− Makes many smaller disks appear as one
30
large disk to a server
Why RAID?
 Superior performance and system reliability
 Improved resilience through increased
redundancy
 Lower costs
31
Why RAID? Performance
 The parallelism or ability to access multiple disks in
the same time allows for the data to be written or
read from an array in a faster way than what would be
possible in a simple single drive.
−
Typically used in large file servers, transaction or
application servers, where data accessibility is critical,
and fault tolerance is required.
−
Today, RAID is also being used in desktop
environments for CAD, multimedia editing and
playback where higher transfer rates are needed.
 Performance is increased because the server has
more "spindles" to read from or write to when data is
32
accessed from a drive
Why RAID? More resilience
 This provided allowance for a backup of data
in the storage array during failure.
 The failure of any array can be prevented by
swapping out a new drive without turning
the system off.
 The RAID performance depends on the
number of drives used in the array.
33
Why RAID? More resilience
 Redundancy is achieved by either writing the
same data to multiple drives (mirroring), or
collecting data (parity data) across the array,
such that the failure of one or more disks in
the array does not result in data loss.
 A failed disk may be replaced by a new one,
and the lost data reconstructed from the
remaining data and the parity data.
34
Why RAID? Cheaper
 Since the main principle involved in the RAID is
to provide greater or the same storage capacity
to a system in comparison with that for a single
drive, there is a high price difference.
 When self repairing configurations are used that
do not need humans to replace drives, storage
could become cheaper and more liable.
35
RAID Techniques
• Two principal techniques employed:
1. Mirroring
− The first implementation of RAID, typically
requiring two individual drives of similar
capacity. One drive is the active drive and
the secondary drive is the mirror
−
The technique provides a simple form of
redundancy for data by automatically writing
data to the mirror drive when it is written
36
to the active drive.
RAID Techniques
• Two principal techniques employed:
2. Striping
− This technique provides increased
performance. It is a method of mapping data
across the physical drives in an array to
create a large virtual drive.
−
The data is subdivided into consecutive
segments or stripes that are written
sequentially across the drives in the array.
37
RAID levels
Several levels of RAID are used
 RAID 0 – “Disk Striping”
− It is technically not a RAID level since it provides
no fault tolerance. Data is written in blocks
across multiple drives, so one drive can be
writing or reading a block while the next is
seeking the next block.
38
RAID 0: Striping
39
RAID levels
−
−
−
Advantages: higher access rate, and full
utilization of the array capacity, easy to
implement.
Disadvantage: there is no fault tolerance - if
one drive fails, the entire contents of the array
become inaccessible.
Ideal for non-critical systems.
40
RAID levels: RAID 1
 RAID 1 – “Disk Mirroring”
−
−
Provides redundancy by storing data twice.
Data is written to both the data disk (active)
and a mirror disk(s).
If one fails, the controller uses either the
data drive or the mirror drive for data
recovery and continues operation.
41
RAID 1: Mirroring
42
RAID levels: RAID 1
−
−
−
Advantage: it provides the best protection of
data since the array management software will
simply direct all application requests to surviving
disk when one disk fails.
Disadvantages: no improvement in data access
speed, and higher cost, since twice the number of
drives is required.
Ideal for critical mission
systems
43
RAID levels: RAID 3
 RAID 3
− Data blocks are subdivided (striped) and
written in parallel on two or more drives. An
additional drive stores parity information for
error correction/recovery . A minimum of 3
disks is needed for a RAID 3 array.
− Since parity is used, a RAID 3 stripe set can
withstand a single disk failure without losing
data or access to data.
44
RAID 3
45
RAID levels: RAID 3
− Advantages: it provides high throughput (both read
and write) for large data transfers; and disk failures
do not significantly slow down throughput.
− Disadvantages: this technology is fairly complex
and too resource intensive to be done in software;
performance is slower for random, small I/O
operations.
46
RAID levels: RAID 5
 RAID 5 – “Data striping with parity”
− The most common secure RAID level.
− Similar to RAID-3 except that data are transferred to
disks by independent read and write operations (not
in parallel).
− The written data chunks are also larger.
− Instead of a dedicated parity disk, parity information is
spread across all the drives. A minimum of 3 disks is
needed for a RAID 5 array.
47
RAID 5 – “Data striping with
parity”
48
RAID levels: RAID 5
 RAID 5 – “Data striping with parity”
−
A RAID 5 array can withstand a single disk
failure without losing data or access to data.
Although RAID 5 can be achieved in software, a
hardware controller is recommended. Often
extra cache memory is used on these
controllers to improve the write performance.
49
RAID levels: RAID 5
−
−
−
−
Advantages: read data transactions are very
fast while write data transaction are somewhat
slower (due to the parity that has to be
calculated).
Disadvantages: Disk failures have an effect on
throughput, although this is still acceptable; like
RAID 3, this is complex technology.
RAID 5 is a good all-round system that
combines efficient storage with excellent
security and decent performance.
It is ideal for file and application servers.
50
RAID levels
•
RAID 0 and RAID 1 combinations

Combines the advantages (and
disadvantages) of RAID 0 and RAID 1 in
one single system.
−
It provides security by mirroring all data on
a secondary set of disks (disk 3 and 4 in the
drawing) while using striping across each set
of disks to speed up data transfers.
51
52
RAID levels
•
RAID 0 and RAID 1 combinations

RAID 1+0 (or 10) is a mirrored data set (RAID 1)
which is then striped (RAID 0), hence the "1+0" name. A
RAID 10 array requires a minimum of two drives, but is
more commonly implemented with 4 drives to take
advantage of speed benefits.

RAID 0+1 (or 01) is a striped data set (RAID 0) which
is then mirrored (RAID 1). A RAID 0+1 array requires a
minimum of four drives: two to hold the striped data,
plus another two to mirror the first pair.
53
RAID Implementations
 Data distribution across multiple drives can
be managed either by dedicated hardware or
by software.
 When done in software the software may be
part of the OS or it may be part of the
firmware and drivers supplied with the card.
55
RAID Implementations
• Operating system based (Software RAID)
• Software implementations are now provided
by many OSs.
• A software layer sits above the disk device
drivers and provides an abstraction layer
between the logical drives (RAIDs) and
physical drives.
• Most common levels are RAID 0 and RAID
1, followed by RAID 1+0, RAID 0+1, and
RAID 5 are supported.
56
RAID Implementations
• Software RAID
 Apple's Mac OS X Server supports RAID 0, RAID 1,
RAID 5 and RAID 1+0.
 FreeBSD supports RAID 0, RAID 1, RAID 3, and
RAID 5 and all layerings of the above via GEOM
(main storage framework for FreeBSD OS) modules,
as well as supporting RAID 0, RAID 1, RAID-Z, and
RAID-Z2 (similar to RAID 5 and RAID 6
respectively), plus nested combinations of those via
ZFS a Suns file system and logical volume manager .
 Linux supports RAID 0, RAID 1, RAID 4, RAID 5,
RAID 6 and all layerings of the above.
57
RAID Implementations
• Software RAID
 Microsoft's server OSs support 3 RAID levels; RAID 0,
RAID 1, and RAID 5. Some of the Microsoft desktop OSs
support RAID such as Windows XP Professional which
supports RAID level 0 in addition to spanning multiple
disks but only if using dynamic disks and volumes.
Windows XP supports RAID 0, 1, and 5 with a simple file
patch. RAID functionality in Windows is slower than
hardware RAID, but allows a RAID array to be moved to
another machine with no compatibility issues.
 NetBSD supports RAID 0, RAID 1, RAID 4 and RAID 5
(and any nested combination of those like 1+0) via its
software implementation, named RAIDframe.
58
RAID Implementations
• Software RAID
 OpenBSD aims to support RAID 0, RAID 1, RAID 4 and
RAID 5 via its software implementation softraid.
 OpenSolaris and Solaris 10 supports RAID 0, RAID 1, RAID
5 (or the similar “RAID Z” found only on ZFS), and RAID 6
(and any nested combination of those like 1+0) via ZFS and
now has the ability to boot from a ZFS volume on both x86
and UltraSPARC a Suns microprocessor.
− Through SVM, Solaris 10 and earlier versions
support RAID 1 for the boot filesystem, and
adds RAID 0 and RAID 5 support (and various
59
nested combinations) for data drives.
RAID Implementations
• Hardware RAID
 Hardware RAID controllers use different, proprietary
disk layouts, so it is not usually possible to span
controllers from different manufacturers.
 They do not require processor resources, the BIOS
can boot from them, and tighter integration with the
device driver may offer better error handling.
60
RAID Implementations
• Hardware RAID
 A hardware implementation of RAID requires at least
a special-purpose RAID controller. On a desktop
system this may be a PCI expansion card, PCI-e
expansion card or built into the motherboard.
Controllers supporting most types of drive may be
used – IDE/ATA, SATA, SCSI, SSA, Fibre Channel,
sometimes even a combination.
 The controller and disks may be in a stand-alone disk
enclosure, rather than inside a computer.
 The enclosure may be directly attached to a
computer, or connected via SAN.
61
RAID Implementations
• Hardware RAID
 Most hardware implementations provide a read/write
cache, which, depending on the I/O workload, will
improve performance. In most systems the write
cache is non-volatile (battery-protected), so pending
writes are not lost on a power failure.
 Hardware implementations provide guaranteed
performance, add no overhead to the local CPU
complex and can support many operating systems, as
the controller simply presents a logical disk to the
operating system.
62
RAID Implementations
• Reading Assignment
 What are the advantages and disadvantages of SW
RAID?
 What are the advantages and disadvantages of HW
RAID?
63
RAID and Backup!
•

RAID is no substitute for back-up!
All RAID levels except RAID 0 offer protection from
a single drive failure.
 A RAID 6 system even survives 2 disks dying
simultaneously. For complete security there is need to
back-up the data from a RAID system.
− A back-up comes in handy if all drives fail
simultaneously because of a power spike.
− It is a safeguard if the storage system gets stolen.
− Back-ups can be kept off-site at a different location.
This can come in handy if a natural disaster or fire
64
destroys your workplace.
RAID and Backup!
• RAID is no substitute for back-up!
−
The most important reason to back-up
multiple generations of data is user error. If
someone accidentally deletes some important
data and this goes unnoticed for several hours,
days or weeks, a good set of back-ups ensure
you can still retrieve those files.
−
**Read about existing backup strategies
65
Redundant Power supplies
• Power supplies are the 2nd most failureprone part
• Ideally, servers should have RPSs
–
–
–
The server will still operate if one power
supply fails
Should have separate power cords
Should draw power from different sources
(e.g., separate UPS)
66
Hot-swap components
• Redundant components should be hotswappable
New components can be added without
downtime
–
Failed components can be replaced without
outage
• Hot-swap components increases cost
–
–
But consider cost of downtime
• Always check
–
–
–
Does OS fully support hot-swapping
components?
What parts are not hot-swappable?
67
How long/severe is the service interruption?
But Servers are
expensive !
• Is there an alternative?
• Server appliances
–
–
Dedicated-purpose, already optimized
Examples: file servers, web servers, email,
DNS, routers, etc.
• Many inexpensive workstations
–
–
–
Common approach for web services
□
Google, Hotmail, Yahoo, etc.
Use full redundancy to counter unreliability
Can be useful (but need to consider total costs, e.g.,
68
support and maintenance, not just purchase price)
Managing Services
• Services distinguish a structured computing
environment from a bunch of standalone computers
• Larger groups are typically linked by shared services
that ease communication and optimize resources
• Typical environments have many services
–
–
DNS, email, authentication, networking,
printing
Remote access, license servers, DHCP,
software repositories, backup services,
Internet access, file service
69
Managing Services
• Providing a service means
–
–
–
–
Not just putting together hardware and
software
Making service reliable
Scaling the service
Monitoring, maintaining, and supporting the
service
70
Provide Good Solid Services
• Get customer requirements
–
Reason for service
□
□
□
□
–
Define a service level agreement (SLA)
□
□
□
–
How service will be used
Features needed vs. desired
Level of reliability required
Justifies budget level
Enumerates services
Defines level of support provided
Response time commitments for various kinds of
problems
Estimate satisfaction from demos or
small usability trials
71
Provide Good Solid Services
• Get operational requirements
– What other services does it depend on?
□
Only services/systems built to same standards or higher
□
Integration with existing authentication or directory
services?
–
How will the service be administered?
–
Will the service scale for growth in usage or data?
–
How is it upgraded? Will it require touching each desktop?
–
Consider high-availability or redundant hardware
–
Consider network impact and performance for remote users
• Revisit budget after considering operational concerns72
Provide Good Solid Services
• Consider an open architecture
– e.g. open protocols and open file formats
– Proprietary protocols and formats can be changed,
may cause dependent systems/vendors to become
incompatible
– Beware of vendors who “embrace and extend” so
that claims can be made for standards support,
while not providing customer interoperability
– Open protocols allow different parties to select
client vs. server portions separately
– Open protocols change slowly, typically in upward
compatible ways, giving maximum product choice
– No need for protocol gateways (another
73
system/service)
Provide Good Solid Services
• Favor simplicity over complexity
KISS
Keep it Simple Stupid
–
Simple systems are more reliable, easier to maintain, and
less expensive
–
Typically a features vs. reliability trade-off
Take advantage of vendor relationships
–
–
–
–
Provide recommendations for standard services
Let multiple vendors compete for your business
Understand where the product is going
Attempt to favor vendors who develop natively on your
platform (not port to it)
74
Provide Good Solid Services
•
Machine independence
–
–
–
•
Clients should access services using generic names
□
e.g., www, calendar, pop, imap, etc.
Moving services to different machines becomes
invisible to users
Consider (at the start) what it will take to move the
service to a new machine
Supportive environment
–
–
Data center provides power, AC, security,
networking
Only rely on systems/services also found in data
center (within protected environment)
75
Provide Good Solid
Services
•
Reliability
–
–
Build on reliable hardware
Exploit redundancy when available
□
–
Plug redundant power supply into different
UPS on different circuit
Components of service should be tightly
coupled
□
Reduce single points of failure
– e.g., all on same power circuit, network
switch, etc.
□
Includes dependent services
– e.g., authentication, authorization, DNS, etc. 76
Provide Good Solid
Services
•
Reliability
–
–
Make service as simple as possible
Independent services on separate machines,
when possible like having a DNS service
independent of a Mail service.
□ But put multiple parts of single service
together
77
Provide Good Solid Services
•
Restrict access
– Customers should not need physical access to
servers
□
□
•
Fewer people
Eliminate any unnecessary services on server (security)
Centralization and standards
– Building a service = centralizing management of
service
– May be desirable to standardize the service and
centralize within the organization as well
□
□
Makes support easier, reducing training costs
Eliminates redundant resources
78
Provide Good Solid Services
•
Performance
–
–
If a complicated service is deployed, but slow, it
is unsuccessful
Need to build in ability to scale
□
□
–
–
–
Can't afford to build servers for service every year
Need to understand how the service can be split
across multiple machines if needed
Estimate capacity required for production (and
get room for growth)
First impression of user base is very difficult to
correct
When choosing hardware, consider whether
79
service is Disk I/O, memory, or network bound
Provide Good Solid Services
•
Monitoring
–
–
–
•
Helpdesk or front-line support must be
automatically alerted to problems
Customers that notice major problems before
sysadmins are getting poor service
Need to monitor for capacity planning as well
Service roll-out
–
First impressions
□
□
□
Have all documentation available
Helpdesk fully trained
80
Use slow roll-out (helps clients adjust to service)
Q&A
Download