Tutorial Slides at Booz Allen Hamilton

Sector & Sphere
Tutorial
Yunhong Gu
Univ. of Illinois at Chicago
@Booz Allen Hamilton, Aug 6, 2009
Outline
Installation
Sector File System
Sphere Programming

Installation: System Requirements
Linux (Debian recommended, XFS recommended)
gcc 3.4 or above
OpenSSL development library
FUSE development library (optional)

System Architecture
[Figure: system architecture. The Security Server (security_node.key, ./users, slave_acl.conf, master_acl.conf) and the Masters (security_node.cert, master_node.key, master.conf, topology.conf, slaves.list) communicate over SSL; Clients (master_node.cert, client.conf) connect to the Masters over SSL and exchange data with the slaves (master_node.cert, slave.conf).]
ls ./codeblue2
Makefile  client  conf  gmp  master  slave
common  doc  lib  security  udt
Configure Security Server
For a testing system, you can use the default configurations
Otherwise, update the slave ACL, master ACL, and user accounts

Access Control List (ACL)

Format
IP1
IP2
IP3/Mask

Example:
10.0.0.1
192.168.0.0/24
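The matching rule behind such a list can be sketched as follows. This is a minimal illustration, not Sector's own code: it assumes that "Mask" is a CIDR prefix length (as the /24 example suggests) and that a bare IP matches only that host. The names parseIPv4, matchEntry and allowed are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse dotted-quad IPv4 text into a host-order 32-bit integer.
// Returns false if the text is not exactly four octets in the range 0-255.
bool parseIPv4(const std::string& text, std::uint32_t& out)
{
    std::istringstream in(text);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
    {
        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255)
            return false;
        value = (value << 8) | static_cast<std::uint32_t>(octet);
        if (i < 3 && in.get() != '.')
            return false;
    }
    out = value;
    // Reject trailing characters after the fourth octet.
    return in.peek() == std::char_traits<char>::eof();
}

// ASSUMPTION: an entry is "IP" (exact host) or "IP/Mask" with Mask being a
// prefix length between 0 and 32.
bool matchEntry(const std::string& entry, const std::string& address)
{
    std::string network = entry;
    int prefix = 32;
    const std::string::size_type slash = entry.find('/');
    if (slash != std::string::npos)
    {
        network = entry.substr(0, slash);
        std::istringstream bits(entry.substr(slash + 1));
        if (!(bits >> prefix) || prefix < 0 || prefix > 32)
            return false;
    }
    std::uint32_t net = 0, addr = 0;
    if (!parseIPv4(network, net) || !parseIPv4(address, addr))
        return false;
    assert(prefix >= 0 && prefix <= 32);
    // A /0 prefix matches everything; avoid shifting a 32-bit value by 32.
    const std::uint32_t mask = (prefix == 0) ? 0u : (0xFFFFFFFFu << (32 - prefix));
    return (net & mask) == (addr & mask);
}

// An address is allowed if any entry of the list matches it.
bool allowed(const std::vector<std::string>& acl, const std::string& address)
{
    for (const std::string& entry : acl)
        if (matchEntry(entry, address))
            return true;
    return false;
}
```

With the example list above, 192.168.0.77 would be accepted by the second entry, while 192.168.1.5 would be rejected.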
User Account
All accounts are stored in ./conf/users
One account per file
Example: ./conf/users/test is the configuration file for account “test”

PASSWORD
xxx
READ_PERMISSION
/
WRITE_PERMISSION
/test
/angle
EXEC_PERMISSION
TRUE
ACL
0.0.0.0/0
QUOTA
1000000
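A file in this key/value-list layout can be read with a few lines of code. The sketch below is illustrative only and is not Sector's parser. It assumes that each key sits on an unindented line and that every value below it is indented by a blank or tab (the indentation is not visible on the slide). The names Config, trim and parseConfig are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

typedef std::map<std::string, std::vector<std::string> > Config;

// Strip leading and trailing whitespace.
std::string trim(const std::string& s)
{
    const char* ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// ASSUMPTION: an unindented line is a key, and each indented line below it
// is one value of that key. Blank lines are ignored.
Config parseConfig(const std::string& text)
{
    Config conf;
    std::istringstream in(text);
    std::string line, key;
    while (std::getline(in, line))
    {
        const std::string item = trim(line);
        if (item.empty())
            continue;
        assert(!line.empty());
        const bool indented = (line[0] == ' ' || line[0] == '\t');
        if (!indented)
        {
            key = item;
            conf[key];                  // create the key even if it has no values
        }
        else if (!key.empty())
        {
            conf[key].push_back(item);  // one more value for the current key
        }
    }
    return conf;
}
```

Keys such as WRITE_PERMISSION then map to a list of values ("/test", "/angle"), while single-valued keys such as QUOTA map to a list of one.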
Start the Security Server
./sserver <port>
Default port is 5000

Configure the Master Server

./conf/master.conf
SECTOR_PORT
6000
SECURITY_SERVER
ncdm161.lac.uic.edu:5000
REPLICA_NUM
2
DATA_DIRECTORY
/home/u2/yunhong/work/data/
Configure the Slaves

./conf/slave.conf
MASTER_ADDRESS
ncdm161.lac.uic.edu:6000
DATA_DIRECTORY
/raid/sector/data/
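Values such as SECURITY_SERVER and MASTER_ADDRESS use a host:port form. A small helper for splitting such a value might look like the sketch below; Endpoint and parseEndpoint are hypothetical names, and the 1-65535 port range check is an assumption rather than something the slides state.

```cpp
#include <cstdlib>
#include <string>

struct Endpoint
{
    std::string host;
    int port;
};

// Split "host:port" at the last colon. Returns false when the colon is
// missing, the host or port is empty, or the port is not a number in 1-65535.
bool parseEndpoint(const std::string& text, Endpoint& out)
{
    const std::string::size_type colon = text.rfind(':');
    if (colon == std::string::npos || colon == 0 || colon + 1 == text.size())
        return false;
    const std::string portText = text.substr(colon + 1);
    char* end = 0;
    const long port = std::strtol(portText.c_str(), &end, 10);
    if (*end != '\0' || port < 1 || port > 65535)
        return false;
    out.host = text.substr(0, colon);
    out.port = static_cast<int>(port);
    return true;
}
```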
Start masters and slaves
./start_master
./start_slave
./start_all, ./stop_all: require password-free SSH and ./conf/slaves.list
./conf/slaves.list
gu@192.168.136.1 /home/gu/codeblue2/slave/
gu@192.168.136.2 /home/gu/codeblue2/slave/
gu@192.168.136.3 /home/gu/codeblue2/slave/
Format: username@slave_ip <blank or tab> slave_path
slave_path is NOT the slave data directory path!
Sector will automatically restart an offline slave if its address is on this list.
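The line format of slaves.list is simple enough to parse by hand. The following sketch assumes one entry per line and a path without embedded blanks; SlaveEntry and parseSlaveLine are hypothetical names, not part of Sector.

```cpp
#include <sstream>
#include <string>

struct SlaveEntry
{
    std::string user;
    std::string ip;
    std::string path;
};

// Parse one "username@slave_ip <blank or tab> slave_path" line.
// ASSUMPTION: the path contains no whitespace.
bool parseSlaveLine(const std::string& line, SlaveEntry& out)
{
    std::istringstream in(line);
    std::string login, path;
    if (!(in >> login >> path))
        return false;                       // fewer than two fields
    const std::string::size_type at = login.find('@');
    if (at == std::string::npos || at == 0 || at + 1 == login.size())
        return false;                       // missing user or address
    out.user = login.substr(0, at);
    out.ip = login.substr(at + 1);
    out.path = path;
    return true;
}
```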
Configure the Client


./conf/client.conf
Optional, but useful for client tools and examples
MASTER_ADDRESS
ncdm161.lac.uic.edu:6000
USERNAME
test
PASSWORD
xxx
CERTIFICATE
/home/gu/codeblue2/conf/master_node.cert
Check System Status
$ cd client
$ cd tools
$ ./sysinfo
Displays system information: list of masters, slaves, available disk space, etc.
Master log: ./master/sector.log
Accessing Sector FS

Tools: ./client/tools
ls, mkdir, stat, rm, download, upload, cp, mv
FUSE: ./client/fuse
make
mount: ./sector-fuse <local dir>
unmount: fusermount -u <local dir>
Programming with Sector
#include <fsclient.h>
Sector::init(master_ip, master_port);
Sector::login(username, password, cert);
Sector::logout();
Sector::close();
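The four calls pair up: init with close, login with logout. One way to keep them balanced is a small scope guard, sketched below. Because the real header is not available here, the guard is a template over any client type with static init/login/logout/close functions. The argument types and the "negative return value means failure" convention are assumptions, and FakeClient is a stand-in used only to show the call order.

```cpp
#include <string>

// Calls init and login on construction, logout and close on destruction.
template <class Client>
class Session
{
public:
    Session(const std::string& ip, int port, const std::string& user,
            const std::string& password, const char* cert)
        : m_initialized(false), m_loggedIn(false)
    {
        // ASSUMPTION: a negative return value signals failure.
        m_initialized = (Client::init(ip, port) >= 0);
        if (m_initialized)
            m_loggedIn = (Client::login(user, password, cert) >= 0);
    }

    ~Session()
    {
        if (m_loggedIn)
            Client::logout();
        if (m_initialized)
            Client::close();
    }

    bool ok() const { return m_loggedIn; }

private:
    Session(const Session&);             // non-copyable
    Session& operator=(const Session&);

    bool m_initialized;
    bool m_loggedIn;
};

// Hypothetical stand-in for the real client; it only records the call order.
struct FakeClient
{
    static std::string& log() { static std::string calls; return calls; }
    static int init(const std::string&, int) { log() += "init "; return 0; }
    static int login(const std::string&, const std::string&, const char*)
    { log() += "login "; return 0; }
    static int logout() { log() += "logout "; return 0; }
    static int close() { log() += "close "; return 0; }
};
```

A Session<FakeClient> created and destroyed inside one scope records the calls in the order init, login, logout, close.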

Sector::list(path, vector<SNode>& attr)
Sector::stat(path, SNode& attr)
Sector::mkdir(path)
Sector::move(src, dst)
Sector::remove(path)
Sector::copy(src, dst)
Sector::utime(path, ts)

SNode
std::string m_strName;
bool m_bIsDir;
std::set<Address, AddrComp> m_sLocation;
int64_t m_llTimeStamp;
int64_t m_llSize;

Sector Files
SectorFile handle;
handle.open(path, mode);
handle.read(buf, size);
handle.write(buf, size);
handle.close();
Other methods: seekp, seekg, tellp, tellg, upload, download
Sphere Programming
Application:
for each file F in (SDSS datasets)
  for each image I in F
    findBrownDwarf(I, …);
Sphere Client:
SphereStream sdss;
sdss.init("sdss files");
SphereProcess myproc;
myproc.run(sdss, "findBrownDwarf", …);
myproc.read(result);
UDF:
findBrownDwarf(char* image, int isize, char* result, int rsize);
[Figure: the Sphere client splits the input stream into segments (n, n+1, n+2, n+3, ..., n+m), locates and schedules SPEs to process them, and collects the results into the output stream (n-k, ..., n, n+1, n+2, n+3).]
Record Offset Index

Data
Text1 text1 text1 text1
Text2 text2
Text3 text3 text3

Index
0 23 34 51

The index is a binary file of 64-bit integers, named by appending the postfix “.idx” to the data file name
Example: user.dat / user.dat.idx
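Building such an index amounts to accumulating record sizes. The sketch below is one possible way to do it, not Sector's own tool. It assumes records are stored back to back without delimiters and that the index holds N + 1 offsets for N records (the slide shows four offsets for three records). Host byte order is also an assumption. buildIndex and writeWithIndex are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Entry i is the offset where record i starts; the final entry is the total
// data size, so N records give N + 1 offsets.
std::vector<std::int64_t> buildIndex(const std::vector<std::string>& records)
{
    std::vector<std::int64_t> index(1, 0);
    for (std::size_t i = 0; i < records.size(); ++i)
        index.push_back(index.back() + static_cast<std::int64_t>(records[i].size()));
    return index;
}

// Write the records into dataPath and the offsets, as raw 64-bit integers in
// host byte order, into dataPath + ".idx".
bool writeWithIndex(const std::string& dataPath,
                    const std::vector<std::string>& records)
{
    std::ofstream data(dataPath.c_str(), std::ios::binary);
    std::ofstream idx((dataPath + ".idx").c_str(), std::ios::binary);
    if (!data || !idx)
        return false;
    for (std::size_t i = 0; i < records.size(); ++i)
        data.write(records[i].data(),
                   static_cast<std::streamsize>(records[i].size()));
    const std::vector<std::int64_t> index = buildIndex(records);
    assert(!index.empty());               // always holds the leading 0
    idx.write(reinterpret_cast<const char*>(&index[0]),
              static_cast<std::streamsize>(index.size() * sizeof(std::int64_t)));
    data.close();
    idx.close();
    return !data.fail() && !idx.fail();
}
```

Record i then occupies the byte range from index[i] up to, but not including, index[i + 1].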
Hashing and Bucket Files
Similar to the Reduce process in MapReduce
Each output record is assigned a bucket ID
Records with the same bucket ID will be sent to the same bucket file
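The idea can be illustrated without any Sphere code. In the sketch below the bucket ID is a hash of the record key modulo the number of buckets, and in-memory vectors stand in for bucket files. The choice of hash (djb2), the use of the first word as the key, and the names bucketOf and groupByBucket are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Map a key to a bucket ID in [0, numBuckets) with a simple deterministic
// string hash (djb2).
int bucketOf(const std::string& key, int numBuckets)
{
    assert(numBuckets > 0);
    unsigned long hash = 5381;
    for (std::string::size_type i = 0; i < key.size(); ++i)
        hash = hash * 33 + static_cast<unsigned char>(key[i]);
    return static_cast<int>(hash % static_cast<unsigned long>(numBuckets));
}

// Group records by bucket ID; each inner vector stands in for one bucket file.
// ASSUMPTION: the key of a record is its first blank-separated word.
std::map<int, std::vector<std::string> > groupByBucket(
    const std::vector<std::string>& records, int numBuckets)
{
    std::map<int, std::vector<std::string> > buckets;
    for (std::size_t i = 0; i < records.size(); ++i)
    {
        const std::string key = records[i].substr(0, records[i].find(' '));
        buckets[bucketOf(key, numBuckets)].push_back(records[i]);
    }
    return buckets;
}
```

Because the bucket ID depends only on the key, all records that share a key land in the same bucket, which is what makes a later per-bucket pass behave like a Reduce step.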

User Defined Function (UDF)

int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
UDF::SInput
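struct SInput
{
   char* m_pcUnit;       // input data
   int m_iRows;          // number of rows (records) in the input
   int64_t* m_pllIndex;  // record offset index of the input
   char* m_pcParam;      // user parameter
   int m_iPSize;         // size of the parameter
};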
UDF::SOutput
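struct SOutput
{
   char* m_pcResult;        // result buffer
   int m_iBufSize;          // size of the result buffer
   int m_iResSize;          // size of the result data
   int64_t* m_pllIndex;     // record offset index of the result
   int m_iIndSize;          // size of the index buffer
   int m_iRows;             // number of rows in the result
   int* m_piBucketID;       // bucket ID of each result row
   int64_t m_llOffset;      // file position to resume from, -1 when done
   std::string m_strError;  // error message
};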
If m_pcResult or m_pllIndex is not large enough, resize it.
When processing a file, if the result is too large, set m_llOffset to record the current file position. The UDF will then be called again to restart processing from m_llOffset, until m_llOffset is set to -1.
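The restart protocol can be mimicked in a few lines. The sketch below does not use the real SOutput. MiniOutput is a simplified, hypothetical stand-in with only the fields needed, and the driver loop is an assumption about how a caller might behave: it starts at offset 0 and calls the function again until the offset becomes -1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for SOutput: only the fields this sketch needs.
struct MiniOutput
{
    std::string result;    // plays the role of m_pcResult
    std::size_t capacity;  // plays the role of m_iBufSize
    std::int64_t offset;   // plays the role of m_llOffset
};

// Copy bytes of `file`, starting at out.offset, until the buffer is full.
// Leaves out.offset at the position to resume from, or -1 when the whole
// file has been processed.
void copyUdf(const std::string& file, MiniOutput& out)
{
    assert(out.offset >= 0 && out.capacity > 0);
    const std::size_t start = static_cast<std::size_t>(out.offset);
    assert(start <= file.size());
    const std::size_t n = std::min(out.capacity, file.size() - start);
    out.result = file.substr(start, n);
    out.offset = (start + n < file.size())
                     ? static_cast<std::int64_t>(start + n)
                     : -1;
}

// Driver: keep calling the UDF until it reports -1, collecting each chunk.
std::vector<std::string> runUntilDone(const std::string& file,
                                      std::size_t capacity)
{
    std::vector<std::string> chunks;
    MiniOutput out;
    out.capacity = capacity;
    out.offset = 0;                 // ASSUMPTION: first call starts at 0
    while (out.offset != -1)
    {
        copyUdf(file, out);
        chunks.push_back(out.result);
    }
    return chunks;
}
```

A 10-byte file with a 4-byte buffer is processed in three calls: two full chunks and a final partial one that sets the offset to -1.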

UDF::SFile
struct SFile
{
   std::string m_strHomeDir;
   std::string m_strLibDir;
   std::string m_strTempDir;
   std::set<std::string> m_sstrFiles;
};
Results can be written to local files; their paths should be added to m_sstrFiles.
UDF
_FUNCTION_.cpp
#include <sphere.h>
extern "C"
{
int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
{
   // process input and fill in output here
   return 0;
}
}
Compile it into a shared library, FUNC.so

A Sphere Program
#include <dcclient.h>
Sector::init(); Sector::login(…);
SphereStream input;
SphereStream output;
SphereProcess myProc;
myProc.loadOperator("func.so");
myProc.run(input, output, "func", 0);
myProc.read(result);
myProc.close();
Sector::logout(); Sector::close();
Sphere Stream

Input
vector<string> files;
files.insert(files.end(), "/html");
SphereStream s;
s.init(files);

Output
SphereStream temp;
temp.setOutputPath("/result", "bucket");
temp.init(256);
Upload UDF and related files
SphereProcess::loadOperator(path)
Sends the UDF to all selected slaves for the current process
Can also send any other files (applications, parameter data, etc.)
The path will be stored in SFile::m_strLibDir

Run a Sphere Process
int run(const SphereStream& input, SphereStream& output, const string& op,
const int& rows, const char* param = NULL, const int& size = 0);
rows: number of rows to pass to the UDF each time
N > 0: N rows
0: the whole segment
-1: the whole file
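The effect of the rows argument on one segment can be sketched as below. This is a simplified model, not Sphere's scheduler. It covers only rows >= 0 (with rows = -1 the file is handed over whole rather than row by row), and it assumes the last call simply receives whatever rows remain. rowsPerCall is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Row counts passed to successive UDF calls for one segment holding
// `segmentRows` records.
//   rows > 0 : at most `rows` records per call
//   rows == 0: the whole segment in a single call
std::vector<int> rowsPerCall(int segmentRows, int rows)
{
    assert(segmentRows >= 0 && rows >= 0);
    std::vector<int> calls;
    if (rows == 0)
    {
        calls.push_back(segmentRows);
        return calls;
    }
    for (int done = 0; done < segmentRows; done += rows)
        calls.push_back(std::min(rows, segmentRows - done));  // last call may be short
    return calls;
}
```

Under these assumptions, a segment of 10 rows with rows = 4 leads to calls with 4, 4 and 2 rows, while rows = 0 leads to one call with all 10.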
Read Result and Check Progress
SphereProcess::read(SphereResult*& res, const bool& inorder = false, const bool& wait = true);
If the output stream is initialized with output.init(0), results will be sent back to the client
int checkProgress();

Demo
Download