Gozal Technion's Condor enhancements High availability, Resource Security and Manageability

advertisement
Gozal
Technion's Condor enhancements
High availability, Resource Security and
Manageability
Mark Silberstein, Gabriel Kliot, Assaf Schuster
Technion – Israel Institute of Technology
October 2004
©Mark Silberstein, Gabriel Kliot, Technion
1
NeSC Condor week 2004, Edinburgh
Introduction
●
●
●
Gozal (in Hebrew) - nestling
Started in 2002 as Condor deployment
support project in the Distributed
Systems Lab , headed by Prof Assaf
Schuster
Today: projects for undergraduates
and graduate students, staff members,
growing Condor pool
©Mark Silberstein, Gabriel Kliot, Technion
2
NeSC Condor week 2004, Edinburgh
Our Goals
●
●
●
Adding functionality to Condor
Using Condor for teaching real
distributed systems
Using Condor for research
©Mark Silberstein, Gabriel Kliot, Technion
3
NeSC Condor week 2004, Edinburgh
Projects
●
●
●
●
Highly Available Condor Matchmaker
Resource protection - “Resource Body
Guard”
Centralized configuration and management
framework
Completed
●
●
Centralized policy enforcement
Efficient invocation of short data intensive jobs
©Mark Silberstein, Gabriel Kliot, Technion
4
NeSC Condor week 2004, Edinburgh
Highly Available
Matchmaker
●
Matchmaker is a single-point-of-failure
–
–
●
Negotiator’s failure - no additional matches
will be possible
Collector’s failure – negotiator is out of job,
tools querying collector won’t work, etc.
Our goal
–
Negligible throughput degradation in
case of failure
©Mark Silberstein, Gabriel Kliot, Technion
5
NeSC Condor week 2004, Edinburgh
Highly Available
Matchmaker
●
Our solution - Highly Available
Matchmaker
–
–
–
–
Automatic failure detection
Transparent failover to backup matchmaker
(no global configuration change for the pool
entities)
State replication between primary and
backups
“Split brain” reconciliation after network
partitions
©Mark Silberstein, Gabriel Kliot, Technion
6
NeSC Condor week 2004, Edinburgh
How it works – basic
scenario
Negotiator
I’m alive
I’m alive
HAD
Leader
HAD
Collector
Collector
Collector
Backup
Server
Active
Server
Backup
Server
Workstation – Startd and Schedd
©Mark Silberstein, Gabriel Kliot, Technion
Workstation – Startd and Schedd
7
NeSC Condor week 2004, Edinburgh
Status and roadmap
●
First prototype - April 2004
●
Finalizing implementation and testing
●
●
Planned to be released as a part of the
next 6.7.X release
More information:
http://dsl.cs.technion.ac.il/projects/gozal/project_pages/ha/ha.html
©Mark Silberstein, Gabriel Kliot, Technion
8
NeSC Condor week 2004, Edinburgh
Resource protection “Resource Body Guard”
●
●
The problem we want to solve
–
Malicious job can render a resource
unusable or obtain private information
–
Condor can be exploited for distributed
attacks on other systems
Solution – resource protection sandbox
–
Auditing of file system and network
access, OS resources
©Mark Silberstein, Gabriel Kliot, Technion
9
NeSC Condor week 2004, Edinburgh
MS Windows FS RBG
design overview
●
●
●
●
●
Implemented as standard FS driver
Intercepts all low level accesses to the hard
drive [very low overhead]
Normally lets all accesses through
When Condor invokes a job – reliably loads
“firewall” and denies unauthorized accesses
to FS performed by the job
When Condor job completes – reliably
unloads the “firewall”
©Mark Silberstein, Gabriel Kliot, Technion
10
NeSC Condor week 2004, Edinburgh
Status and road map
●
●
●
Today
–
FS driver coding done
–
integration with Condor and extensive testing
pending
Expected release of beta – Spring, 2005
(check our website)
Future directions
–
Maximum throughput restrictions and quotas
–
Network RBG
©Mark Silberstein, Gabriel Kliot, Technion
11
NeSC Condor week 2004, Edinburgh
Centralized
configuration framework
●
●
“The flipside of Condor flexibility is
complexity”
Lack of centralized pool configuration
Editing multiple copies of configuration
files is error-prone
System administrator should learn Condor
internals to configure Condor pool
●
●
●
Owners require simple set-and-forget
©Mark Silberstein, Gabriel Kliot, Technion
12
NeSC Condor week 2004, Edinburgh
Centralized configuration
framework overview
●
Centralized DB for pool configurations
–
●
●
Resources pull the updates
Unified GUI for pool management
Configuration elements abstraction (define once,
use many)
–
Functional change to Condor configuration, i.e.
“Allow_User_On_Resource(UserName)”
●
Resource configuration groups
–
Resource can be a member of one or more groups
–
Resource inherits configuration from the groups of
which it is a member
©Mark Silberstein, Gabriel Kliot, Technion
13
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
14
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
15
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
16
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
17
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
18
NeSC Condor week 2004, Edinburgh
©Mark Silberstein, Gabriel Kliot, Technion
19
NeSC Condor week 2004, Edinburgh
Status and road map
●
Today
–
●
Hierarchical groups management and GUI
coding done
Expected beta release
–
Winter 2005
©Mark Silberstein, Gabriel Kliot, Technion
20
NeSC Condor week 2004, Edinburgh
Collaboration with Condor
team
●
Generous support from the Condor
team
–
We visited UW
–
Prof. Livni visited Technion
–
Source code made available, access to
CVS granted
–
Access to UW pool granted
–
Very open and friendly atmosphere
©Mark Silberstein, Gabriel Kliot, Technion
21
NeSC Condor week 2004, Edinburgh
Summary
●
●
●
●
We are developing addons for Condor
We work in tight collaboration with the
Condor team
Some of the projects already have the
deliverables available in beta
The work is in progress by staff
members to make releases stable
©Mark Silberstein, Gabriel Kliot, Technion
22
NeSC Condor week 2004, Edinburgh
Do you want to try it?
●
More information at
http://dsl.cs.technion.ac.il/projects/gozal
●
This work would not be possible
without
–
Prof Schuster, Eran Issler, Noam Palatin and all the
undergraduate students who worked on these projects
and of course the Condor
team and Prof Miron Livni
●
Contact: marks@tx.technion.ac.il
THANK YOU
©Mark Silberstein, Gabriel Kliot, Technion
23
NeSC Condor week 2004, Edinburgh
Download