Sun Solaris OS
Glenn Barney
gb2174@columbia.edu
COMS E6998.002 : Advanced Computer Design
Metrics
• Sun focused on 5 major design areas:
  - Performance
  - Security
    - Prevent
    - Detect
    - Respond
  - Availability
  - Utilization
  - Platform Choice
    - Hardware Compatibility List: 716 x86/x64 systems, 75 SPARC systems
• The major metric successes are Security and Availability. Performance and Utilization are a bit more questionable, but still very good, as we'll see.
History of Solaris
• It's a Unix OS that is an amalgam of earlier Unix-based OSs, mainly Sun's first OS, SunOS (based on BSD), and AT&T's Unix System V.
• The general timeline:
–1970 to 1979 – Unix is first written in assembly and then in C by Thompson and Ritchie.
–1982 – Bill Joy leaves Berkeley, co-founds Sun and develops SunOS, based on BSD.
–1984 to 1987 – AT&T develops and releases new versions of System V, which competes with BSD until the mid-90s.
–1988 – AT&T purchases a large stake in Sun.
–1993 – Sun announces the first version of Solaris, which is no longer based on BSD but mainly on System V Release 4, a mix of other Unix distributions. The competing Unix standards group, OSF, begins a GUI war with Sun, promoting its own Motif/X against Sun's OPEN LOOK.
–1994 – Sun creates the Common Desktop Environment to support both Motif and OPEN LOOK; by Solaris 5 it's officially supported.
The Solaris Gestalt
• Pulled in from BSD
– Virtual memory system
– Fast file system with symbolic links
– TCP/IP networking system with Kerberos, Telnet, FTP, sendmail
– Alternate shells to the Bourne shell (C shell)
– Vendor products like NFS (from Sun), and symmetric multiprocessing support, thread management and shared libraries
• Pulled in from System V
– Interprocess Communication
– Bourne shell enhancements
– STREAMS and TLI networking libraries
– Remote File Sharing
– Improved memory paging
– Application Binary Interface
Created by Sun for SunOS
•SunOS 4.x
–NFS
–OpenWindows 2.0 GUI
–OpenBoot monitor
–DeskSet Utilities
–Multiprocessing Support
•SunOS 5.x (i.e., Solaris)
–SMP for more than 100 processors in a single server
–CDE (Motif, PostScript, Open Look)
–Gnome 2.0 to support Linux integration
–Network Information Service (NIS)
–Clustering
–Java
–Ever growing list of new features
Some General Solaris Tidbits
• Solaris 10 does not support old Sun hardware. Supported chips include UltraSPARC II, III, IV and newer, 32-bit Intel x86, and 64-bit AMD Opteron.
• Of course, old 32-bit SPARC programs are still supported.
• Sun does support JCL-style batch jobs: Sun MBM preserves batch step constructs on Sun systems.
• Load balancing seems to require a third-party application.
• Sun Network Cache and Accelerator (SNCA), available since Solaris 8, helps cache and serve web pages, but doesn't do load balancing per se.
Solaris Overview
• Processor/platform-specific code – less than 5% of the kernel, developed to adapt to different hardware platforms
• Device drivers – dynamically loaded, and use a common published interface
• File system and volume management – treats a large number of disks as a single volume. The Virtual File System supports unlimited file system extensions: UFS, NFS, Sun StorEdge file systems, PC file systems, etc. New: the Zettabyte File System (ZFS).
• Unified TCP/IP stack
• The Linux system call handler is in-kernel: it catches Linux system calls and dispatches the equivalent Solaris kernel functions.
• DTrace, new in Solaris 10, is a clean, modular, pre-deployed, system-wide debugging facility with minimal runtime cost.
Solaris Modular Kernel
•Seven types of loadable modules:
–Scheduling classes
–File systems
–Loadable system calls
–Loaders for executable file formats
–STREAMS modules
–Bus or device drivers
–Miscellaneous modules
Solaris Kernel
• Kernel Thread – core unit of execution that is scheduled and executed on a processor
– has an execution state and context that includes a global priority and scheduling class
– the unit that gets scheduled, executed, and context-switched on and off processors
• User Thread – user-level thread state maintained within a user process
• Process – the executable form of a program
• Lightweight Process (LWP) – the kernel-visible execution context for a user thread
• Solaris 2 to 8 had a “two-level threads model”, in which many user threads could be multiplexed onto a smaller group of LWPs.
However, the two-level model was replaced with a 1:1 model, the default since Solaris 9. Why? Basically, it was too complicated. The 1:1 model brings:
•Improved performance, scalability, and reliability
•Reliable signal behavior
•Improved adaptive mutex lock implementation
•User-level sleep queues for synchronization objects
Kernel Thread Scheduling
• The dispatcher uses a priority model to select which kernel thread to execute next.
• Supports preemption, and the kernel itself is preemptible.
• 170 global priorities, partitioned by scheduling class.
• The three main classes are TS, SYS, and RT.
– Timeshare (TS) – default for all processes and the kernel threads in them.
– Interactive (IA) – enhanced TS used by the windowing system to boost threads in the window that has focus.
– Fair Share Scheduling (FSS) – share-based, not priority-based.
– Fixed Priority (FX) – fixed-priority scheduling; priorities are not adjusted by the scheduler.
– System (SYS) – used for kernel threads; they are bound and run until they block or complete.
– Real Time (RT) – fixed-priority, fixed-time-quantum scheduling.
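To make the "170 global priorities partitioned by scheduling class" idea concrete, here is a minimal sketch in C. It assumes the commonly documented Solaris layout (TS and the classes that share its band at 0-59, SYS at 60-99, RT at 100-159, interrupt threads at 160-169). Treating SYS priorities as class-relative offsets is an assumption for illustration, and `global_pri`, `pick_next` and `sketch_thread_t` are hypothetical names; the real dispatcher uses per-CPU dispatch queues rather than a linear scan.

```c
#include <stddef.h>

/* Three of the scheduling classes described above (IA, FSS and FX share the TS band). */
typedef enum { CLS_TS, CLS_SYS, CLS_RT } sched_class_t;

/*
 * Map a class-relative priority onto the 170 global priorities (0-169).
 * Bands: TS 0-59, SYS 60-99, RT 100-159; 160-169 is left for interrupt
 * threads. Returns -1 if the class-relative priority is out of range.
 */
int global_pri(sched_class_t cls, int pri)
{
    switch (cls) {
    case CLS_TS:  return (pri >= 0 && pri <= 59) ? pri : -1;
    case CLS_SYS: return (pri >= 0 && pri <= 39) ? 60 + pri : -1;
    case CLS_RT:  return (pri >= 0 && pri <= 59) ? 100 + pri : -1;
    }
    return -1;
}

/* Hypothetical, simplified view of a kernel thread for this sketch. */
typedef struct {
    int id;
    int gpri;      /* global priority */
    int runnable;  /* nonzero if the thread is waiting on a dispatch queue */
} sketch_thread_t;

/*
 * Dispatcher decision: return the index of the runnable thread with the
 * highest global priority (the earliest entry wins ties), or -1 if none.
 */
int pick_next(const sketch_thread_t *t, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (t[i].runnable && (best < 0 || t[i].gpri > t[best].gpri))
            best = (int)i;
    }
    return best;
}
```

Because an RT thread's global priority is always above every SYS and TS thread, a runnable RT thread is always chosen first, which is what makes preemption by real-time work possible.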
Interprocess Communication and Signals
• Traditional Unix IPC
– Pipes – channel data directly between related processes through a file-like object
– Named pipes – FIFO pipes that are implemented as files in the file system namespace
– Sockets – can communicate over a network or locally (Unix domain sockets)
• System V IPC
– Shared Memory – processes create a segment of memory that they share with one another
– Message Queues – each message carries a type value (a long integer) and a data payload
– Semaphores – processes can sleep on them; used for synchronization, but unlike a lock, any process can increment one
• Solaris doors – a door server contains a thread that sleeps waiting for a client. The client calls the server through a door, and scheduling control is handed directly from the calling client thread to a server thread, giving very low-latency turnaround.
• Signals – can interrupt a process after an event occurs. Signals can be ignored, caught and handled, or given the default action.
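As a minimal sketch of the first traditional mechanism (a pipe between related processes), the POSIX C below forks a child that writes a message into a pipe while the parent reads it back. `pipe_roundtrip` is a hypothetical helper name; the calls it uses (`pipe`, `fork`, `read`, `write`, `waitpid`) are standard POSIX and behave this way on Solaris.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child writes msg into a pipe; the parent reads it into out
 * (NUL-terminated, at most outlen - 1 bytes). Returns the number of
 * bytes read, or -1 on error or if the child failed.
 */
ssize_t pipe_roundtrip(const char *msg, char *out, size_t outlen)
{
    int fds[2];                     /* fds[0]: read end, fds[1]: write end */
    if (outlen == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child: writer */
        close(fds[0]);
        size_t len = strlen(msg);
        ssize_t w = write(fds[1], msg, len);
        close(fds[1]);              /* closing the write end gives the reader EOF */
        _exit(w == (ssize_t)len ? 0 : 1);
    }

    close(fds[1]);                  /* parent: reader */
    size_t total = 0;
    ssize_t r;
    while (total < outlen - 1 &&
           (r = read(fds[0], out + total, outlen - 1 - total)) > 0)
        total += (size_t)r;
    close(fds[0]);
    out[total] = '\0';

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)total;
}
```

The pipe only works here because both ends were inherited across `fork()`; unrelated processes would need a named pipe (FIFO) or one of the other mechanisms above.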
Memory
• 64-bit kernel and process address space
• Optimizes memory use by sharing program binaries and application data among processes
• The VM system manages most objects related to I/O and memory: kernel and user applications, shared libraries, and file systems
– Manages the virtual-to-physical mapping of memory.
– Manages swapping memory between primary and secondary storage to optimize performance.
– Handles requests for shared images between multiple users and processes.
– Acts as an integrated file cache.
• Newer features in the VM implementation include:
– For I/O, uses the 64-bit address space to create a permanent mapping of all physical pages in SEGKPM, eliminating the need to map/unmap for each I/O.
– Variable page sizes; the largest available now is 256 MB
– Generic framework: Multiple Page Size Support (MPSS) for various page sizes
– Support for non-uniform memory access (NUMA) architectures
– Dynamic reconfiguration – memory can be added to or removed from the free list on the fly, while kernel memory stays confined to a safe “kernel cage”
– Modern memory allocators support slabs
Virtual Memory
• Pages can vary in size; the common size is 8 Kbytes.
• The Solaris kernel uses a combined demand-paged and swapping model.
• Abstract memory objects called segments, vnodes, and pages:
– Physical memory, in chunks called pages
– A virtual file object, called a vnode
– The file system is a hierarchy of vnodes
– Process and kernel address spaces are segments of mapped vnodes
– Mapped hardware devices (e.g., frame buffers) are segments of hardware-mapped pages
• Physical memory management is done by the Hardware Address Translation (HAT) layer
– Presents a machine-independent interface on top of a machine-specific implementation
Virtual Memory Continued
• A process's virtual address space skeleton is created by the kernel when the fork() system call creates the process.
• Memory is allocated on the heap; malloc() doesn't immediately create physical memory, because pages are backed only when they are first touched.
• The heap can be allocated in 32- or 64-bit mode, and is much larger in 64-bit mode.
• The picture on the right shows how memory mapping can share data among processes.
• Several options govern how a file is shared when it is mapped between processes:
– MAP_SHARED with PROT_READ|PROT_WRITE: stores go to the file and are seen by every process mapping it
– MAP_PRIVATE with PROT_READ|PROT_WRITE: stores go to private copy-on-write pages, leaving the file unchanged
• Each segment has a protection mode: Read, Write, and/or Executable.
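A minimal sketch of the MAP_SHARED versus MAP_PRIVATE difference, using only standard POSIX calls: it maps a one-page temporary file whose first byte is 'A', stores 'X' through the mapping, and then reads the file back with `pread()`. The helper name and the `/tmp` path template are assumptions for illustration.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a one-page temporary file whose first byte is 'A', map it with
 * map_flags (MAP_SHARED or MAP_PRIVATE) and PROT_READ|PROT_WRITE, store
 * 'X' through the mapping, and return the first byte as seen by pread()
 * on the file. Returns -1 on any error.
 */
int store_through_mapping(int map_flags)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                      /* file is removed once fd is closed */

    int result = -1;
    long pg = sysconf(_SC_PAGESIZE);
    if (pg > 0 && write(fd, "A", 1) == 1 && ftruncate(fd, (off_t)pg) == 0) {
        char *p = mmap(NULL, (size_t)pg, PROT_READ | PROT_WRITE, map_flags, fd, 0);
        if (p != MAP_FAILED) {
            p[0] = 'X';                /* shared: dirties the file's page; private: copy-on-write */
            if (map_flags == MAP_SHARED)
                msync(p, (size_t)pg, MS_SYNC);
            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
            munmap(p, (size_t)pg);
        }
    }
    close(fd);
    return result;
}
```

Calling it with MAP_SHARED should return 'X' (the store reached the file's page), while MAP_PRIVATE should return 'A' (the store landed on a private copy).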
Page Faults and Anonymous Memory
• A major page fault occurs when the physical page does not exist in memory and must be read in
• A minor page fault occurs when the page is in physical memory but no MMU translation exists; the fault handler just attaches it
• A protection fault occurs when an access violates memory permissions
There can also be anonymous memory: pages that are not associated with a vnode. They are used for new heap space, and are allocated by a zero-fill-on-demand (ZFOD) operation.
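The three fault cases and ZFOD behavior can be sketched in C as follows. `classify_fault` is a hypothetical function that simply encodes the rules listed above; it is not kernel code. `anon_pages_are_zero` obtains anonymous memory the classic Solaris way, by privately mapping `/dev/zero`, and checks that every page arrives zero-filled on first touch.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

typedef enum { FAULT_NONE, FAULT_MAJOR, FAULT_MINOR, FAULT_PROTECTION } fault_kind_t;

/* Hypothetical classifier that follows the three cases described above. */
fault_kind_t classify_fault(int access_allowed, int page_in_memory, int has_mmu_translation)
{
    if (!access_allowed)
        return FAULT_PROTECTION;   /* access violates the segment's permissions */
    if (!page_in_memory)
        return FAULT_MAJOR;        /* page must be brought in from backing store */
    if (!has_mmu_translation)
        return FAULT_MINOR;        /* page is resident; only the translation is missing */
    return FAULT_NONE;
}

/*
 * Map len bytes of anonymous memory via a private /dev/zero mapping and
 * check that every byte reads as zero. Returns 1 if all zero, 0 if not,
 * -1 on error.
 */
int anon_pages_are_zero(size_t len)
{
    int fd = open("/dev/zero", O_RDWR);
    if (fd == -1)
        return -1;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;

    int zero = 1;
    for (size_t i = 0; i < len; i++) {   /* first touch of each page is a ZFOD fault */
        if (p[i] != 0) {
            zero = 0;
            break;
        }
    }
    munmap(p, len);
    return zero;
}
```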
Intimate Shared Memory
• System V shared memory (IPC) option
• Shared Memory optimization:
– Additionally share low-level
kernel data
– Reduce redundant mapping
info (V-to-P)
• Shared Memory is locked,
never paged
– No swap space is allocated
• Use SHM_SHARE_MMU
flag in shmat()
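A minimal sketch of attaching a System V segment as ISM. On Solaris, `SHM_SHARE_MMU` is defined in `<sys/shm.h>` and passed to `shmat()`. The `#ifdef` fallback to an ordinary attach is an assumption added so the sketch also builds on other systems, and `ism_roundtrip` is a hypothetical helper.

```c
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifdef SHM_SHARE_MMU
#define ATTACH_FLAGS SHM_SHARE_MMU   /* Solaris: request Intimate Shared Memory */
#else
#define ATTACH_FLAGS 0               /* elsewhere: ordinary System V attach */
#endif

/*
 * Create a private segment of `size` bytes (at least 4), attach it,
 * write and read back a string, then detach and remove the segment.
 * Returns 0 on success, -1 on failure.
 */
int ism_roundtrip(size_t size)
{
    if (size < 4)
        return -1;
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int rc = -1;
    char *p = shmat(id, NULL, ATTACH_FLAGS);
    if (p != (char *)-1) {
        strcpy(p, "ism");
        rc = (strcmp(p, "ism") == 0) ? 0 : -1;
        shmdt(p);
    }
    shmctl(id, IPC_RMID, NULL);      /* mark the segment for removal */
    return rc;
}
```

With ISM, every process that attaches the segment shares the same low-level translation structures, which is where the savings in redundant V-to-P mapping information come from.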
Physical Memory
• Memory is managed by the page scanner daemon (except kernel memory).
• When the system is booted, memory is placed on the free list in page-size chunks.
• Anonymous memory is used for most of a process's memory allocation (heap and stack).
• When data is read into memory, pages are taken from the free list; they then reside in the segmap cache, a process's address space, or the cache list.
• page_create_va() allocates pages, taking the virtual address into account to calculate page coloring.
• The page scanner uses global page replacement.
• Two bits are kept per page to indicate whether the page has been referenced or modified since the bits were last cleared.
Page scanning uses a “two-handed clock” algorithm
• In addition to this page-out process, the swapper can swap out entire processes to conserve memory; it does this rarely, only in extreme circumstances.
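The two-handed clock can be simulated in a few lines of C. This is a sketch under simplifying assumptions: pages live in a circular array, the front hand clears each page's reference bit, and the back hand, a fixed distance behind, frees any page whose bit is still clear because nothing touched it in between. A real scanner would first write back (page out) pages whose modified bit is set; the sketch omits that. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool referenced;   /* set when the page is touched */
    bool freed;        /* set when the back hand puts the page on the free list */
} sim_page_t;

/*
 * Advance both hands `steps` positions over a circular array of n pages.
 * The front hand clears the reference bit; the back hand, `spread` pages
 * behind it (spread < n), frees pages whose bit is still clear.
 * Returns the number of pages freed during this call.
 */
size_t two_handed_clock(sim_page_t *pages, size_t n, size_t spread,
                        size_t *front, size_t steps)
{
    size_t freed = 0;
    for (size_t s = 0; s < steps; s++) {
        size_t back = (*front + n - spread) % n;
        pages[*front].referenced = false;                      /* front hand */
        if (!pages[back].referenced && !pages[back].freed) {  /* back hand */
            pages[back].freed = true;
            freed++;
        }
        *front = (*front + 1) % n;
    }
    return freed;
}
```

The spread between the hands sets how long a page has to prove it is still in use: a wider spread gives pages more time to be re-referenced before the back hand reaches them.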
Slab and HAT
• Solaris has a general-purpose memory allocator known as the slab allocator. It is used for memory requests that are:
– Smaller than a page
– Not an even multiple of the page size
– Frequently allocated and freed, which would otherwise cause fragmentation
• It solves fragmentation issues by grouping different-sized memory objects into separate caches, where each object cache has its own object size and characteristics.
• The HAT layer programs the TLB with entries describing the relationship between virtual and physical addresses.
If the TLB lookup fails, the UltraSPARC falls back to a translation storage buffer (TSB), while most other architectures use a hardware page table. The big difference is that the TSB is searched in software, while a page table is walked by hardware; Solaris supports both.
Take a look at the slide titled “Virtual Memory” to see a picture of the HAT layer; it is on the right.
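The object-cache idea behind the slab allocator can be sketched in C as follows: each cache serves objects of one size, carves page-sized slabs into equal pieces, and keeps free objects on a list threaded through the objects themselves. This is a simplified illustration, not the Solaris implementation (whose kernel interfaces are `kmem_cache_create()`, `kmem_cache_alloc()` and `kmem_cache_free()`). The names here are hypothetical, the 8 KB slab size is an assumption, and per-CPU magazines, constructors, and returning empty slabs are all omitted.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLAB_BYTES 8192   /* assumption: one 8 KB page per slab */

typedef struct {
    size_t obj_size;   /* object size, rounded up to pointer alignment */
    void  *free_list;  /* free objects, linked through their first word */
    size_t slabs;      /* number of slabs carved so far (never returned here) */
} obj_cache_t;

void cache_init(obj_cache_t *c, size_t size)
{
    size_t a = sizeof(void *);
    if (size < a)
        size = a;
    c->obj_size = (size + a - 1) / a * a;
    c->free_list = NULL;
    c->slabs = 0;
}

/* Carve a new slab into equal-sized objects and push them onto the free list. */
static int cache_grow(obj_cache_t *c)
{
    if (c->obj_size > SLAB_BYTES)
        return -1;
    char *slab = malloc(SLAB_BYTES);
    if (slab == NULL)
        return -1;
    for (size_t off = 0; off + c->obj_size <= SLAB_BYTES; off += c->obj_size) {
        void **obj = (void **)(slab + off);
        *obj = c->free_list;
        c->free_list = obj;
    }
    c->slabs++;
    return 0;
}

void *cache_alloc(obj_cache_t *c)
{
    if (c->free_list == NULL && cache_grow(c) != 0)
        return NULL;
    void **obj = c->free_list;
    c->free_list = *obj;          /* pop the head of the free list */
    return obj;
}

void cache_free(obj_cache_t *c, void *obj)
{
    *(void **)obj = c->free_list; /* push back onto the free list */
    c->free_list = obj;
}
```

Because every object in a cache has the same size, a freed object can be reused exactly as is, so frequent allocate/free cycles do not fragment memory.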
Virtual File System VFS
• Created to abstract away file systems so that NFS and UFS could coexist
• Made of the vnode (virtual node) interface, which implements file-related functions, and the vfs (virtual file system) interface, which directs file-system-level functions to specific file systems
• A process's file descriptors index a per-process file list whose entries point into the file table. Each file table entry references a vnode, which eventually points to a file-system-specific node (for example a UFS inode), depending on the file system implementation.
• New in Solaris 10: Zettabyte File System (ZFS)
– Endian neutral: move files between SPARC and x86 based systems
– ZFS protects all data with 256-bit checksums
– 128-bit file system!
– Built on top of virtual storage pools
– All operations are transactional and copy-on-write
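The vnode abstraction boils down to a per-file-system table of function pointers, with generic code dispatching through it. The C sketch below illustrates that idea with two toy file systems. `mini_vnode_t`, `mini_vnodeops_t`, `memfs` and `zerofs` are hypothetical; the real Solaris structures are `vnode_t` and its operations vector, reached through macros such as `VOP_READ`.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct mini_vnode;

/* Per-file-system operations table, in the spirit of the vnode interface. */
typedef struct {
    const char *fs_name;
    ssize_t (*read)(struct mini_vnode *vp, char *buf, size_t len, size_t off);
} mini_vnodeops_t;

typedef struct mini_vnode {
    const mini_vnodeops_t *ops;  /* which file system implements this node */
    void *data;                  /* file-system-private node, e.g. an inode */
} mini_vnode_t;

/* Generic layer: callers never know which file system they are talking to. */
ssize_t vop_read(mini_vnode_t *vp, char *buf, size_t len, size_t off)
{
    return vp->ops->read(vp, buf, len, off);
}

/* Toy "memory" file system: the private node is a C string. */
static ssize_t memfs_read(mini_vnode_t *vp, char *buf, size_t len, size_t off)
{
    const char *s = vp->data;
    size_t n = strlen(s);
    if (off >= n)
        return 0;                /* reading past end of file */
    if (len > n - off)
        len = n - off;
    memcpy(buf, s + off, len);
    return (ssize_t)len;
}

/* Toy "zero" file system: every read returns zero bytes, like /dev/zero. */
static ssize_t zerofs_read(mini_vnode_t *vp, char *buf, size_t len, size_t off)
{
    (void)vp;
    (void)off;
    memset(buf, 0, len);
    return (ssize_t)len;
}

const mini_vnodeops_t memfs_ops  = { "memfs",  memfs_read };
const mini_vnodeops_t zerofs_ops = { "zerofs", zerofs_read };
```

Adding a new file system then means supplying a new operations table, without touching the generic `vop_read` path. That is how NFS and UFS were able to coexist.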
Unix File System (UFS)
• UFS, the file system we know and love: the default file system for Solaris, in development for over 20 years.
• Based around disk geometry: the number of sectors in a track, the location of the head, and the number of tracks.
• Supports hard and soft links.
• The inode (index node) is the internal descriptor for a file.
• Access scheme: user (owner), group, world.
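The user/group/world access scheme can be written as a small C function. This is a minimal sketch of the classic rule: exactly one class applies (owner first, then group, then world), with no fall-through to a more permissive class. `ufs_may_access` is a hypothetical name, and superuser privileges and ACLs are deliberately ignored.

```c
#include <sys/types.h>

/*
 * want is a bitmask of 4 (read), 2 (write), 1 (execute).
 * Returns 1 if the requested access is allowed, 0 otherwise.
 */
int ufs_may_access(mode_t mode, uid_t owner, gid_t group,
                   uid_t uid, gid_t gid, int want)
{
    int bits;
    if (uid == owner)
        bits = (int)((mode >> 6) & 7);   /* user (owner) class */
    else if (gid == group)
        bits = (int)((mode >> 3) & 7);   /* group class */
    else
        bits = (int)(mode & 7);          /* world class */
    return (bits & want) == want;
}
```

For example, with mode 0604 a group member is denied read access even though the world may read the file, because only the group bits are consulted for that user.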
I/O
• Two distinct methods perform file system I/O:
– read(), write(), and related
system calls
– Memory-mapping of a
file into the process's
address space
• Both are in the picture here
to the right.
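Both I/O paths can be shown side by side in a short C sketch that sums a file's bytes once with `read()` into a user buffer and once by mapping the file with `mmap()` and touching the pages directly. The helper names are hypothetical; the calls are standard POSIX.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: copy the data into a user buffer with read(). Returns -1 on error. */
long sum_with_read(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    close(fd);
    return n < 0 ? -1 : sum;
}

/* Path 2: map the file into the address space and access it as memory. */
long sum_with_mmap(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {           /* a zero-length mapping is not allowed */
        close(fd);
        return 0;
    }
    const unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];                 /* page faults bring the file pages in */
    munmap((void *)p, (size_t)st.st_size);
    return sum;
}
```

The `read()` path copies the data out of the kernel's file cache into a user buffer, while the `mmap()` path lets the VM system map those cached pages straight into the process.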
Performance: NUMA systems
• Non-Uniform Memory Access (NUMA) machines – machines in which some memory is closer to some CPUs than to others
• Addressed by the Memory Placement Optimization (MPO) framework
– Locality awareness
– Balancing
– Dynamic topology support
• Latency groups (lgroups) – sets of CPUs and equidistant memory, defined in the kernel.
• A home lgroup is chosen for each thread upon creation, and the thread prefers this lgroup.
• For memory allocation, the home lgroup is preferred, but for multithreaded code whose threads are spread across the machine, random placement may be better.
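The "prefer the home lgroup" policy can be sketched as a choice over a latency table: allocate from the thread's home lgroup if it has free memory, otherwise from the nearest lgroup that does. The three-lgroup latency table and `choose_lgrp` are hypothetical simulations, not the Solaris liblgrp interface.

```c
#include <stddef.h>

#define NLGRP 3

/* Hypothetical latency table between lgroups (smaller is closer). */
static const int latency[NLGRP][NLGRP] = {
    { 10, 20, 30 },
    { 20, 10, 20 },
    { 30, 20, 10 },
};

/*
 * Choose the lgroup to allocate from: the home lgroup if it has free
 * pages (it is always the closest), otherwise the closest lgroup that
 * still has free pages. Returns -1 if every lgroup is exhausted.
 */
int choose_lgrp(int home, const size_t free_pages[NLGRP])
{
    int best = -1;
    for (int g = 0; g < NLGRP; g++) {
        if (free_pages[g] == 0)
            continue;
        if (best == -1 || latency[home][g] < latency[home][best])
            best = g;
    }
    return best;
}
```

The "random placement" alternative for widely spread multithreaded code would replace this nearest-first choice with one that spreads pages across all lgroups.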
CMT support and Parallel System architectures
• Chip Multithreading (CMT) CPUs share various processor components and caches among their hardware threads.
• The three different parallel architectures:
– SMP: symmetric multiprocessor with a shared memory model; single kernel image
– MPP: message-based model; multiple kernel images
– NUMA/ccNUMA: shared memory model; single kernel image
So the Solaris kernel has several semaphores and mutex locks to help manage concurrent thread access to memory. SMP (like Intel and AMD chips) and CMT (the UltraSPARC T1) are a lot more complicated than plain NUMA systems, and much research goes on in this field. Sun's attitude is to try to make things as simple as possible while still providing the necessary synchronization.
Networking : The TCP/IP Stack
• Was two STREAMS layers, with packet queueing and locks between the layers and one processor thread per connection
• Now merges the TCP and IP layers and allocates a single worker thread per CPU.
– Streamlined to process a packet through both layers
– Binds each connection to a CPU for its entire life
• Uses a per-CPU vertical perimeter mechanism to protect the connection. It is implemented with an IP classifier, a serialization queue, and a worker thread, so that only one CPU processes the packets of a given connection.
• Integrated support for TCP offload engines – let the hardware do the work
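The key property of the vertical perimeter is that a given connection always lands on the same CPU's serialization queue, so its packets are processed in order by one worker without per-layer locks. The C sketch below illustrates that property with a hash of the connection's addresses and ports. The hash, `conn_key_t` and `conn_to_cpu` are assumptions standing in for the real IP classifier, which chooses and records a connection's queue differently.

```c
#include <stdint.h>

/* Hypothetical connection key: IPv4 addresses and ports. */
typedef struct {
    uint32_t laddr, faddr;   /* local and foreign address */
    uint16_t lport, fport;   /* local and foreign port */
} conn_key_t;

/*
 * Map a connection to one CPU's serialization queue. The mapping is a
 * pure function of the key, so every packet of the connection goes to
 * the same CPU for the connection's whole life.
 */
unsigned conn_to_cpu(const conn_key_t *k, unsigned ncpus)
{
    uint32_t h = k->laddr ^ (k->faddr * 2654435761u);
    h ^= ((uint32_t)k->lport << 16) | k->fport;
    h ^= h >> 15;            /* mix high bits into low bits */
    h *= 2246822519u;
    h ^= h >> 13;
    return h % ncpus;
}
```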
Security
• For user permissions:
– UFS and file system permissions
– Role-Based Access Control (RBAC), since Solaris 8
– New in Solaris 10: the least privilege model
– Access Control Lists (ACLs) let you set fine-grained, per-user and per-group permissions
• Kernel-level permissions: the privileged kernel threads and modules run the whole system and control Solaris containers.
• Automated Patch Tool
• Solaris Cryptographic Framework
• Full network traffic control, for example TCP packet monitoring, disabling packet redirection, and disabling responses to pings.
Solaris Containers/Zones
•Containers provide the complete virtualized environment; zones are the component that provides the isolation between environments.
•Up to 8192 virtualized environments per Solaris OS instance.
•Provides a secure sandbox with its own root, users, and file systems. Network interfaces, devices, hardware, and I/O are also virtualized.
•The kernel makes sure that the zones are isolated.
•If a zone fails, it can reboot in a few seconds.
Process rights management
• The Solaris 10 OS least privilege model includes nearly 50 fine-grained privileges as well as the basic privilege set.
– Evolved from Trusted Solaris.
– The basic privilege set includes all privileges given to unprivileged processes in the traditional security model.
• Each process has four privilege sets in its kernel credentials:
– The Inheritable set (I): the privileges inherited on exec.
– The Permitted set (P): the maximum set of privileges for the process.
– The Effective set (E): the privileges currently in effect, a subset of P.
– The Limit set (L): the upper bound of the privileges a process and its children may obtain.
• Once launched, a process uses privilege manipulation functions to add or remove privileges from its privilege sets.
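The relationship between the Permitted and Effective sets can be modeled with bitmasks, as in the C sketch below. The model assumes two rules from the description above: E must stay a subset of P, and dropping a privilege from P removes it for good, taking it out of E as well. The real interfaces are `setppriv(2)` and `priv_set(3C)`, with privileges named by strings such as `net_privaddr`. The bit values, the `SK_` names and the helper functions below are hypothetical.

```c
#include <stdint.h>

typedef uint64_t privset_t;               /* hypothetical: one bit per privilege */

#define SK_FILE_READ  ((privset_t)1 << 0) /* illustrative privilege bits only */
#define SK_NET_BIND   ((privset_t)1 << 1)
#define SK_PROC_FORK  ((privset_t)1 << 2)

typedef struct { privset_t I, P, E, L; } proc_privs_t;

/* Turn privileges on in E; only allowed for privileges already in P. */
int priv_raise(proc_privs_t *pp, privset_t s)
{
    if ((s & ~pp->P) != 0)
        return -1;                        /* E must stay a subset of P */
    pp->E |= s;
    return 0;
}

/* Turn privileges off in E only; they can be raised again later. */
void priv_lower(proc_privs_t *pp, privset_t s)
{
    pp->E &= ~s;
}

/* Remove privileges from P permanently; they leave E as well. */
void priv_relinquish(proc_privs_t *pp, privset_t s)
{
    pp->P &= ~s;
    pp->E &= pp->P;
}
```

A typical least-privilege server raises a privilege only around the call that needs it (for example, binding a reserved port), lowers it afterwards, and relinquishes it from P once it is no longer needed at all.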
Cryptography
Two basic types:
•User-level framework
–Exists outside the kernel
–Uses the PKCS #11 interface
–Applications use it
•Kernel-level framework
–Operating system modules use it
–Can interface with hardware and software plug-ins
Neither provides actual encryption algorithms; plug-ins do all the work!
Plug-ins for both are verified by the Module Verification Daemon.
Cryptography Continued
• Each plug-in must be verified (signed) by the Module Verification Daemon, which:
– First sets up a thread pool that lives in the kernel Cryptographic Framework (kCF) to service requests
– Second, answers requests to verify the signatures of user-level and kernel-level providers
User-level crypto algorithms supported
Kernel-level crypto algorithms supported
• The cryptoadm(1M) tool is provided for administration of the uCF and kCF.
• /dev/crypto drivers allow communication between user-level and kernel-level plug-ins.
• /dev/cryptoadm runs the Module Verification Daemon.
• For the user level, digest(1) and mac(1) calculate the digest and MAC of files.
• encrypt(1) and decrypt(1) encrypt and decrypt files.
Solaris IPsec/IKE and Kerberos, user-level and kernel-level, have been ported to use the Solaris Cryptographic Framework in the Solaris 10 OS.
DTrace Debugging System
• Dynamically records data at points of interest (probes) in user and kernel space.
• Records stack traces, timestamps, and arguments.
• Kernel modules called providers know how to activate probes.
• Has its own D language: a compiler looks up the probes and providers named in a script, using the provider information to find which probes should be logged when fired.
• DTrace won the top prize in the Wall Street Journal's 2006 Technology Innovation Awards competition.
30,000 published probes within the Solaris kernel
Recovery – Predictive self healing
• The self-diagnosing system is constantly gathering data. Error reports are encoded as a set of name-value pairs and form an error event. Diagnosis engines run in the background, consuming error events.
• Diagnosis engines output a fault event, which is broadcast to all agents that can respond.
• Enter the Solaris Fault Manager, which:
– Manages the diagnosis engines and agents
– Provides a programming model for clients
– Compiles logs
– Manages multiplexing of events between producers and consumers
• A Sun message identifier associates an error message with an online knowledge base article or link.
• Each diagnosis has a universal identifier so that solutions can be cross-referenced.
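A diagnosis engine consumes error events and decides when they add up to a fault. The C sketch below shows one plausible rule under stated assumptions: declare a fault when N error events of one class arrive within T seconds, using a sliding window loosely modeled on the SERD (soft error rate discrimination) engines used by Solaris diagnosis modules. The event struct reduces the name-value pairs to two fields, and every name, including the `ereport.mem.ce` class string, is hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define SERD_MAX 16

/* Hypothetical error event: two of its name-value pairs as fields. */
typedef struct {
    const char *class;   /* event class, e.g. "ereport.mem.ce" (illustrative) */
    long time;           /* seconds */
} ereport_t;

/* Toy diagnosis engine: fault if n events of watch_class arrive within t seconds. */
typedef struct {
    const char *watch_class;  /* event class this engine consumes */
    size_t n;                 /* threshold count, 1..SERD_MAX */
    long t;                   /* window length in seconds */
    long times[SERD_MAX];     /* ring buffer of recent event times */
    size_t count, head;       /* entries stored; next slot to overwrite */
} serd_t;

/*
 * Feed one error event to the engine. Returns 1 if this event pushes the
 * engine over threshold (a fault event would then be broadcast), else 0.
 */
int serd_record(serd_t *s, const ereport_t *ev)
{
    if (strcmp(ev->class, s->watch_class) != 0)
        return 0;                         /* not an event this engine consumes */
    s->times[s->head] = ev->time;
    s->head = (s->head + 1) % s->n;
    if (s->count < s->n)
        s->count++;
    if (s->count < s->n)
        return 0;                         /* not enough events yet */
    long oldest = s->times[s->head];      /* buffer full: oldest sits at head */
    return ev->time - oldest <= s->t;
}
```

With n = 3 and t = 60, events at 0, 100 and 200 seconds do not trigger a fault, but once three events fall within a single 60-second window, the engine reports one.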
Why Solaris beats Linux
• Solaris is more secure: it has ACLs, RBAC, PRM (process rights management), and containers, vs. ACLs and Xen in Linux.
• Solaris is more stable: Linux has rapid change and multiple centers of control, while Sun has a predictable lifecycle and the Solaris Application Guarantee.
• Solaris has better price/performance: SPECjAppServer2002 results
Category | Hardware | Operating System | Price-performance | Improvement over prior record holder
Dual Node | Sun Fire V20z | Red Hat Enterprise Linux AS Release 3 | $101.10/TOPS | 15%
Multiple Node | Application servers: Sun Fire V20z; database server: Sun Fire V40z | Solaris 9, x86 Platform Edition | $82.74/TOPS | 40%
• Solaris has a lower support cost at the highest support levels.
Why Linux Beats Solaris
• Novell points out Solaris's higher cost for multiple-CPU machines.
• Novell points out Solaris's poor performance.
But Sun has put out a lot of technology to fight these criticisms, like ZFS to address big-endian/little-endian compatibility between SPARC and x86, and the Linux binary API to increase software options on Solaris.
Where Solaris is Headed
• Once the most popular Unix-based OS in the world, Solaris has lost a lot of market share.
– Microsoft Windows took the low-end market away from most Unix systems
– Linux came in to pull away the remainder
– Solaris was left with the high-end space, with sales based on its stability, performance, and support
• Now, with Solaris 10 and OpenSolaris, Sun is trying to regain the low-end market.
• Trying to work with AMD/Linux, not against it:
– Linux Application Environment
– Specific designs for AMD multiprocessor systems
– Free OS with competitive support options
• Trusted Solaris features in Solaris 10 are a huge selling point.
References
• Solaris 10: In a Class By Itself – http://www.sun.com/software/whitepapers/solaris10/classbyitself.pdf
• Solaris and Linux: SealRock research comparison whitepaper – http://www.sun.com/software/whitepapers/solaris10/sealrock.pdf
• Solaris 10: The Complete Reference – http://books.mcgrawhill.com/downloads/products/0072229985/0072229985_ch01.pdf
• Solaris 8 Administrator Certification Training Guide, Appendix C – http://unixed.com/Resources/history_of_solaris.pdf
• Solaris™ Internals: Core Kernel Components – http://www.phptr.com/content/images/0130224960/samplechapter/0130224960.pdf
• Solaris™ Internals: Solaris 10 and OpenSolaris Kernel Architecture – http://www.sun.com/books/catalog/solaris_internals.xml
• The Solaris Cryptographic Framework – http://www.sun.com/bigadmin/features/articles/crypt_framework.pdf
• The least privilege model in the Solaris OS – http://www.sun.com/bigadmin/features/articles/least_privilege.html
• Solaris and Linux: Seal Rock Research Paper – http://www.novell.com/collateral/4621445/4621445.pdf
• SUSE® Linux Enterprise Server 9 and Solaris 10 on x86 – http://www.novell.com/collateral/4621445/4621445.pdf