GT4 GridFTP for Admins:
The New GridFTP Server
William (Bill) Allcock
Argonne National Laboratory
04-05 April, 2005
Outline

Quick Class Survey

Basic Definitions

GridFTP Overview

Configuring GSI

Server Configuration

Running the Server as a user
Quick Class Survey

By show of hands, how many…



Know what GridFTP is?
Can describe the difference between a client and
a server (for GridFTP)?
Know the difference between a control channel
and a data channel?

Have used globus-url-copy before?

Know what a bandwidth delay product is?

Install their own software on Linux? (duh)

For my info

Have good tools for monitoring log files?
Basic Definitions

Command-Response Protocol
GridFTP and FTP fall into this category
Client
Sends commands and receives responses
A client can only send one command and
then must wait for a “Finished response”
before sending another
Server
Receives commands and sends responses
Implies it is listening on a port somewhere
(control channel)
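To make the “Finished response” rule concrete, here is a minimal Python sketch using the reply rules of plain FTP (RFC 959), which GridFTP inherits: a 1xx reply is preliminary, so the client keeps waiting; any 2xx-5xx reply ends the command. The function names and the simulated server lines are illustrative only, not output captured from a real server.

```python
def read_reply(lines):
    """Read one reply (possibly multi-line) from an iterator of text lines.

    RFC 959: a single-line reply is "NNN text". A multi-line reply starts
    with "NNN-text" and ends at the first line that starts with "NNN ".
    Returns (code, list_of_text_lines).
    """
    first = next(lines)
    code = first[:3]
    text = [first[4:]]
    if first[3:4] == "-":
        while True:
            line = next(lines)
            if line.startswith(code + " "):
                text.append(line[4:])
                break
            text.append(line)  # intermediate lines are free-form
    return code, text


def is_final(code):
    """1xx is a preliminary reply: the command is still in progress.
    Anything else (2xx-5xx) is the "finished" reply for the command."""
    return not code.startswith("1")


def wait_for_finish(lines):
    """Collect every reply to ONE command, stopping at the final one.
    Only after this returns may the client send its next command."""
    replies = []
    while True:
        code, text = read_reply(lines)
        replies.append((code, text))
        if is_final(code):
            return replies


# Simulated server output for a transfer: a preliminary 150, then a final 226.
server_output = iter([
    "150 Opening BINARY mode data connection.",
    "226-Transfer complete.",
    " extra detail",
    "226 End",
])
replies = wait_for_finish(server_output)
print([code for code, _ in replies])  # ['150', '226']
```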
Basic Definitions

Control Channel



Communication link (TCP) over which
commands and responses flow
Low bandwidth; encrypted and integrity
protected by default
Data Channel


Communication link(s) over which the
actual data of interest flows
High Bandwidth; authenticated by default;
encryption and integrity protection optional
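The data channel's address is itself negotiated over the control channel. In plain FTP (RFC 959) the client sends PASV and the server answers with a 227 reply carrying four address bytes and two port bytes; GridFTP adds its own variants (such as a striped passive mode) that are not covered here. A minimal Python sketch of decoding the plain 227 reply; the function name and the address in the example are made up:

```python
import re

# The six comma-separated numbers in a 227 reply:
# h1..h4 are the IPv4 octets, p1 and p2 are the port's high and low bytes.
_PASV_RE = re.compile(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv_reply(reply):
    """Return (host, port) of the data channel advertised on the control channel."""
    if not reply.startswith("227"):
        raise ValueError("not a 227 Entering Passive Mode reply: %r" % reply)
    m = _PASV_RE.search(reply)
    if m is None:
        raise ValueError("no host/port tuple in reply: %r" % reply)
    h1, h2, h3, h4, p1, p2 = (int(g) for g in m.groups())
    host = "%d.%d.%d.%d" % (h1, h2, h3, h4)
    port = p1 * 256 + p2  # high byte first, then low byte
    return host, port


print(parse_pasv_reply("227 Entering Passive Mode (192,168,1,10,8,36)."))
# ('192.168.1.10', 2084)
```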
Basic Definitions

Network Endpoint
Something that is addressable over the network (i.e.
IP:Port). Generally a NIC
multi-homed hosts
multiple stripes on a single host (testing)
Parallelism
Multiple TCP streams between two network
endpoints
Striping
Multiple pairs of network endpoints participating in a
single logical transfer (i.e. only one control channel
connection)
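The difference between parallelism and striping can be sketched as a block-assignment plan. This is a simplified round-robin scheme chosen for illustration, not the layout the server actually uses, and the function names are hypothetical. With one stripe, every block goes to the same endpoint pair over several TCP streams (parallelism); with several stripes, blocks are spread across endpoint pairs, each of which may also use parallel streams.

```python
def assign_blocks(file_size, block_size, stripes, streams_per_stripe):
    """Return a list of (offset, stripe, stream) for every block of a file.

    Blocks rotate across stripes first (endpoint pairs), then across the
    TCP streams within each stripe. Simplified round-robin, for illustration.
    """
    plan = []
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    for i in range(n_blocks):
        stripe = i % stripes
        stream = (i // stripes) % streams_per_stripe
        plan.append((i * block_size, stripe, stream))
    return plan


def tcp_connections(stripes, streams_per_stripe):
    """Data-channel TCP connections in this simplified model."""
    return stripes * streams_per_stripe


# Parallelism only: one endpoint pair, four streams.
print(assign_blocks(10, 2, stripes=1, streams_per_stripe=4))
# Striping plus parallelism: two endpoint pairs, two streams each.
print(assign_blocks(10, 2, stripes=2, streams_per_stripe=2))
```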
Parallelism vs Striping
New Server Architecture

GridFTP (and normal FTP) use (at least) two
separate socket connections:




A control channel for carrying the commands
and responses
A Data Channel for actually moving the data
Control Channel and Data Channel can be
(optionally) completely separate processes.
A single Control Channel can have multiple
data channels behind it.


This is how a striped server works.
In the future we would like to have a load
balancing proxy server work with this.
New Server Architecture

Architecturally, the Data Transport Process (Data Channel)
consists of 3 distinct pieces:




The protocol handler. This part talks to the network and
understands the data channel protocol
The Data Storage Interface (DSI). A well defined API that
may be re-implemented to access things other than POSIX
filesystems
ERET/ESTO processing. Ability to manipulate the data
prior to transmission.

currently handled via the DSI

In V4.2 we want to support XIO drivers as modules and chaining
Working with several groups on custom DSIs:

LANL / IBM for HPSS

UWis / Condor for NeST

SDSC for SRB
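The point of the DSI is that the protocol handler moves blocks without knowing what is behind them. The real DSI is a C interface inside the server; the Python classes and method names below are hypothetical and only illustrate that separation, with an in-memory store standing in for a non-POSIX backend such as an archive.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageInterface(ABC):
    """Hypothetical DSI-like contract: the protocol handler only calls these."""

    @abstractmethod
    def size(self, path):
        """Return the size of the object in bytes."""

    @abstractmethod
    def read(self, path, offset, length):
        """Return up to `length` bytes starting at `offset`."""

    @abstractmethod
    def write(self, path, offset, data):
        """Store `data` starting at `offset`."""


class PosixStorage(StorageInterface):
    """Backend for an ordinary POSIX filesystem."""

    def size(self, path):
        return os.stat(path).st_size

    def read(self, path, offset, length):
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

    def write(self, path, offset, data):
        # Open without truncating so blocks can land at any offset.
        mode = "r+b" if os.path.exists(path) else "wb"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(data)


class MemoryStorage(StorageInterface):
    """Stand-in for a non-POSIX store, keyed by path."""

    def __init__(self):
        self.objects = {}

    def size(self, path):
        return len(self.objects.get(path, b""))

    def read(self, path, offset, length):
        return self.objects[path][offset:offset + length]

    def write(self, path, offset, data):
        buf = bytearray(self.objects.get(path, b""))
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # zero-fill any gap
        buf[offset:offset + len(data)] = data
        self.objects[path] = bytes(buf)


def copy(src_store, src_path, dst_store, dst_path, block_size=4):
    """Protocol-handler side: moves blocks without knowing either backend.
    The tiny default block size just makes the example use several blocks."""
    total = src_store.size(src_path)
    for offset in range(0, total, block_size):
        dst_store.write(dst_path, offset, src_store.read(src_path, offset, block_size))


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "in.dat")
    with open(src, "wb") as f:
        f.write(b"hello gridftp")
    mem = MemoryStorage()
    copy(PosixStorage(), src, mem, "/archive/in.dat")
    print(mem.objects["/archive/in.dat"])  # b'hello gridftp'
```

Adding a new backend means writing one more subclass; `copy` does not change.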
Deployment Scenario under
Consideration



All deployments are striped, i.e. separate
processes for the control and data channels.
Control channel runs as a user who can only read
and execute the executable, config files, etc. It can write
delegated credentials.
Data channel is a root setuid process
Outside user never connects to it.
If anything other than a valid authentication occurs, it
drops the connection.
It can be locked down to only accept connections
from the control channel machine's IP.
First action after successful authentication is setuid.
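The data-channel policy above can be sketched as two steps: an admission check, then dropping root. `os.setgroups`, `os.setgid` and `os.setuid` are standard Python calls; the allowlist contents, the function names and the `authenticated` flag (which in the real server would come from GSI authentication) are hypothetical.

```python
import ipaddress
import os

# Hypothetical allowlist: only the control-channel (front end) machine.
ALLOWED_PEERS = {ipaddress.ip_address("127.0.0.1")}


def admit(peer_ip, authenticated):
    """Decide whether the data-channel process keeps a connection.

    Reject anything not coming from the control channel machine, and drop
    anything that is not a valid authentication.
    """
    if ipaddress.ip_address(peer_ip) not in ALLOWED_PEERS:
        return False
    return bool(authenticated)


def become_user(uid, gid):
    """First action after successful authentication: give up root.

    Groups must be changed while still root; after setuid the process
    can no longer change them, so setuid comes last.
    """
    os.setgroups([gid])
    os.setgid(gid)
    os.setuid(uid)


print(admit("127.0.0.1", authenticated=True))   # True
print(admit("10.0.0.5", authenticated=True))    # False: wrong peer
print(admit("127.0.0.1", authenticated=False))  # False: bad auth
```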
Possible Configurations
[Figure: four panels, each showing Control and Data components: Typical Installation; Separate Processes; Striped Server; Striped Server (future)]
Third Party Transfer
[Figure: an RFT Client exchanges SOAP Messages and (optional) Notifications with the RFT Service. Components shown: Control Channel Processes (globus user), each with a Protocol Interpreter and Master DSI; IPC Links to Data Channel Processes (Root Setuid), each with an IPC Receiver and Slave DSI; Data Channels.]
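In a third-party transfer the client (here RFT) holds two control channels and the data flows directly between the servers. In plain FTP (RFC 959) this is done by putting the destination into passive mode and pointing the source at the address it advertises. The Python sketch below, with hypothetical function and server labels, builds that command sequence from the destination's PASV reply; GridFTP's own extensions (striping, extended block mode) are not shown.

```python
import re


def third_party_commands(dest_pasv_reply, src_path, dst_path):
    """Commands a client sends after PASV has been answered by the destination.

    Returns a list of (server, command) pairs, in the order they are sent.
    """
    m = re.search(r"\((\d+(?:,\d+){5})\)", dest_pasv_reply)
    if m is None:
        raise ValueError("no host/port tuple in %r" % dest_pasv_reply)
    hostport = m.group(1)  # h1,h2,h3,h4,p1,p2 as advertised by the destination
    return [
        ("source", "PORT " + hostport),  # source will connect to the destination
        ("dest", "STOR " + dst_path),    # destination waits for incoming data
        ("source", "RETR " + src_path),  # source starts sending
    ]


for server, cmd in third_party_commands(
        "227 Entering Passive Mode (10,0,0,2,8,36).", "/data/a", "/data/b"):
    print(server, cmd)
```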
Server Configuration


We will take this from the web:
http://www-unix.globus.org/toolkit/docs/development/4.0-rafts/data/gridftp/GridFTP_Public_Interfaces.html#config
Configuration for Striping


You need -i if you are running from xinetd
In reality, there is one configuration that
makes something a front end (PI)

-r or remote_nodes

This causes the Master (or Remote) DSI to be loaded
It won't actually move things; it will just talk to the client
and make IPC calls
And there is one config that makes a back
end (DTP)

-dn or data_node

Causes it to start listening for IPC connections.
Needs -p if running as a daemon; otherwise the port is set up
via xinetd
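The role-defining switches and the -p/-i rule for a back end can be captured in a small checker. This is only a sketch: the dict keyed by the slide's switches is a hypothetical representation (not the server's config-file syntax), and the host:port value shown for -r is an assumed contact-string format.

```python
def check_config(flags):
    """Classify a server instance and list problems.

    `flags` maps the switches named on the slide (-r, -dn, -p, -i) to values.
    """
    if "-r" in flags:
        role = "front end (PI)"   # loads the Master/Remote DSI, relays over IPC
    elif "-dn" in flags:
        role = "back end (DTP)"   # listens for IPC connections
    else:
        role = "single process"
    problems = []
    # A data node started as a daemon must be told its port;
    # under xinetd (-i) the port comes from the xinetd service entry.
    if role == "back end (DTP)" and "-i" not in flags and "-p" not in flags:
        problems.append("data node running as a daemon needs -p")
    return role, problems


print(check_config({"-p": "2812", "-r": "localhost:2912"}))
print(check_config({"-dn": True}))
print(check_config({"-dn": True, "-p": "2912"}))
```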
Configuring the logging



log_module accepts either stdio or syslog
-Z or log_transfers writes one entry per
transfer, logging all the run parameters (src,
dest, user, buffer size, streams, time, etc.)
log_level: you have to play with that one; I
always use all

You most likely do NOT want to use all in
production. It logs the entire protocol
exchange and your logs will get very big, very
fast.
Let's look at some configurations
On port 2811 (the standard port) is a “normal”
install: one process, root setuid.
On port 2812 we have a front end running
as user globus
Note auth-level 0 (no lookup or setuid)
On port 2912 we have a back end
associated with the 2812 front end. It is
root setuid and will only accept
connections from the local machine.
Running the Server as a User





This works fine and many people do this.
Typically, users build the server in their home
area and put it on an ephemeral port.
Authentication is done using the user's proxy.
The gridmap-file defaults to
~/.globus/gridmap-file if not running as root.
Note that in most cases only the user can run
transfers with this server, unless he maps
other people's DNs to his account, which is a
NO-NO at many sites.
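Before running a personal server it is worth checking which DNs the gridmap-file actually maps onto the account. A minimal Python sketch, assuming the usual gridmap format (a quoted DN followed by one or more comma-separated local account names) and the usual system-wide location /etc/grid-security/grid-mapfile when running as root; the function names and the sample DNs are made up.

```python
import os
import shlex


def default_gridmap_path():
    """~/.globus/gridmap-file when not root; the usual system file when root."""
    if os.geteuid() == 0:
        return "/etc/grid-security/grid-mapfile"
    return os.path.expanduser("~/.globus/gridmap-file")


def parse_gridmap(text):
    """Map each certificate DN to its list of local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)        # keeps the quoted DN (with spaces) whole
        dn, users = parts[0], "".join(parts[1:])
        mapping[dn] = users.split(",")
    return mapping


def dns_mapped_to(mapping, account):
    """All DNs that would be allowed to run transfers as `account`."""
    return sorted(dn for dn, users in mapping.items() if account in users)


sample = '''
# DN                                      local account(s)
"/O=Grid/OU=Example/CN=Jane Doe"          jdoe
"/O=Grid/OU=Example/CN=Jane Doe Laptop"   jdoe
"/O=Grid/OU=Example/CN=Rob Roe"           rroe,jdoe
'''
m = parse_gridmap(sample)
for dn in dns_mapped_to(m, "jdoe"):
    print(dn)   # the Rob Roe entry is the NO-NO case
```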