Sector & Sphere Tutorial
Yunhong Gu
Univ. of Illinois at Chicago
@Booz Allen Hamilton, Aug 6, 2009

Outline
  Installation
  Sector File System
  Sphere Programming

Installation: System Requirements
  Linux (Debian recommended, XFS recommended)
  gcc 3.4 or above
  openssl development library
  FUSE development library (optional)

System Architecture
  [Figure: components and the files each one holds]
  Security Server: security_node.key, ./users, slave_acl.conf, master_acl.conf
  Masters: security_node.cert, master_node.key, master.conf, topology.conf, slaves.list
  Clients: master_node.cert, client.conf
  Slaves: master_node.cert, slave.conf
  Connections labeled in the figure: SSL (security server to masters, masters to clients) and Data (clients to slaves)

  $ ls ./codeblue2
  Makefile  client  conf  gmp  master  slave  common  doc  lib  security  udt

Configure Security Server
  For a testing system, you can use the default configuration
  Otherwise, update the slave ACL, the master ACL, and the user accounts

Access Control List (ACL)
  Format:
    IP1
    IP2
    IP3/Mask
  Example:
    10.0.0.1
    192.168.0.0/24

User Account
  All accounts are in ./conf/users
  One account per file
  Example: ./conf/users/test is the account configuration for account "test"

User Account
  PASSWORD
    xxx
  READ_PERMISSION
    /
  WRITE_PERMISSION
    /test
    /angle
  EXEC_PERMISSION
    TRUE
  ACL
    0.0.0.0/0
  QUOTA
    1000000

Start the Security Server
  ./sserver <port>
  Default port is 5000

Configure the Master Server
  ./conf/master.conf
    SECTOR_PORT
      6000
    SECURITY_SERVER
      ncdm161.lac.uic.edu:5000
    REPLICA_NUM
      2
    DATA_DIRECTORY
      /home/u2/yunhong/work/data/

Configure the Slaves
  ./conf/slave.conf
    MASTER_ADDRESS
      ncdm161.lac.uic.edu:6000
    DATA_DIRECTORY
      /raid/sector/data/

Start masters and slaves
  ./start_master
  ./start_slave
  ./start_all
  ./stop_all
    Password-free SSH
    ./conf/slaves.list

./conf/slaves.list
    gu@192.168.136.1 /home/gu/codeblue2/slave/
    gu@192.168.136.2 /home/gu/codeblue2/slave/
    gu@192.168.136.3 /home/gu/codeblue2/slave/
  Format: username@slave_ip BLANK/TAB slave_path
  slave_path is NOT the slave data directory path!
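The slaves.list entry format above (username@slave_ip, a blank or tab, then slave_path) can be checked with a few lines of C++. This is only a sketch for validating a list by hand: SlaveEntry and parseSlaveLine are hypothetical names introduced here, not part of Sector, and the way Sector itself reads the file may differ.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper type, not part of Sector: one entry of ./conf/slaves.list.
struct SlaveEntry {
    std::string user;  // SSH user name, e.g. "gu"
    std::string host;  // slave IP address, e.g. "192.168.136.1"
    std::string path;  // slave install directory (NOT the data directory)
};

// Parse "username@slave_ip<blank or tab>slave_path" into `out`.
// Returns false if the line does not have that shape.
bool parseSlaveLine(const std::string& line, SlaveEntry& out) {
    // The user name is everything before the first '@' and must not be empty.
    const std::size_t at = line.find('@');
    if (at == std::string::npos || at == 0) return false;

    // The host runs up to the first blank or tab and must not be empty.
    const std::size_t sep = line.find_first_of(" \t", at + 1);
    if (sep == std::string::npos || sep == at + 1) return false;

    // The path starts at the first non-blank character after the separator.
    const std::size_t pathStart = line.find_first_not_of(" \t\r", sep);
    if (pathStart == std::string::npos) return false;

    // Drop trailing blanks, tabs, and a possible carriage return.
    const std::size_t pathEnd = line.find_last_not_of(" \t\r");

    out.user = line.substr(0, at);
    out.host = line.substr(at + 1, sep - at - 1);
    out.path = line.substr(pathStart, pathEnd - pathStart + 1);
    return true;
}
```

Calling parseSlaveLine on each line of the file and reporting the lines that return false is one way to catch a missing path or a missing user name before running ./start_all.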
  Sector automatically restarts an offline slave if its address is in ./conf/slaves.list

Configure the Client
  ./conf/client.conf
  Optional, but useful for client tools and examples
    MASTER_ADDRESS
      ncdm161.lac.uic.edu:6000
    USERNAME
      test
    PASSWORD
      xxx
    CERTIFICATE
      /home/gu/codeblue2/conf/master_node.cert

Check System Status
  $ cd client
  $ cd tools
  $ ./sysinfo
  Displays system information: list of masters, slaves, available disk space, etc.
  ./master/sector.log

Accessing Sector FS
  Tools: ./client/tools
    ls, mkdir, stat, rm, download, upload, cp, mv
  FUSE: ./client/fuse
    make
    mount: ./sector-fuse <local dir>
    unmount: fusermount -u <local dir>

Programming with Sector
  #include <fsclient.h>
  Sector::init(master_ip, master_port);
  Sector::login(username, password, cert);
  Sector::logout();
  Sector::close();

Programming with Sector
  Sector::list(path, vector<SNode>& attr)
  Sector::stat(path, SNode& attr)
  Sector::mkdir(path)
  Sector::move(src, dst)
  Sector::remove(path)
  Sector::copy(src, dst)
  Sector::utime(path, ts)

SNode
  std::string m_strName;
  bool m_bIsDir;
  std::set<Address, AddrComp> m_sLocation;
  int64_t m_llTimeStamp;
  int64_t m_llSize;

Sector Files
  SectorFile handle;
  handle.open(path, mode);
  handle.read(buf, size);
  handle.write(buf, size);
  handle.close();
  Also: seekp, seekg, tellp, tellg, upload, download

Sphere Programming
  for each file F in (SDSS datasets)
    for each image I in F
      findBrownDwarf(I, …);

  SphereStream sdss;
  sdss.init("sdss files");
  SphereProcess myproc;
  myproc.run(sdss, "findBrownDwarf", …);
  myproc.read(result);

  [Figure: the Application calls the Sphere Client, which splits the data of the Input Stream into segments (n, n+1, n+2, n+3, ..., n+m), locates and schedules SPEs, and collects the results.]
  [Figure, continued: Output Stream produced by the SPEs.]
  findBrownDwarf(char* image, int isize, char* result, int rsize);

Record Offset Index
  Data:
    Text1 text1 text1 text1
    Text2 text2
    Text3 text3 text3
  Index:
    0  23  44  61
  The index is a binary file of 64-bit integers, with the suffix "idx"
  Example: user.dat / user.dat.idx

Hashing and Bucket Files
  Similar to the Reduce process in MapReduce
  Each output record is assigned a bucket ID
  Records with the same bucket ID are sent to the same bucket file

User Defined Function (UDF)
  int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)

UDF::SInput
  struct SInput
  {
     char* m_pcUnit;
     int m_iRows;
     int64_t* m_pllIndex;
     char* m_pcParam;
     int m_iPSize;
  };

UDF::SOutput
  struct SOutput
  {
     char* m_pcResult;
     int m_iBufSize;
     int m_iResSize;
     int64_t* m_pllIndex;
     int m_iIndSize;
     int m_iRows;
     int* m_piBucketID;
     int64_t m_llOffset;
     string m_strError;
  };

UDF::SOutput
  If m_pcResult or m_pllIndex is not large enough, resize it
  When processing a file, if the result is too large to return at once, set m_llOffset to the current file position. The UDF is then called again to resume processing from m_llOffset, and this repeats until m_llOffset is set to -1.
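The SInput and SOutput fields above can be tied together with a small UDF sketch that copies each input record to the output and assigns it a bucket ID. Several things here are assumptions rather than facts from the slides: the structs are local copies of the slide definitions (a real UDF would include <sphere.h>), m_pllIndex is assumed to hold m_iRows + 1 record offsets in the style of the .idx file, m_iIndSize is assumed to be the capacity of the output index, the bucket count is assumed to arrive as a raw int in m_pcParam, and a nonzero return value is assumed to signal failure. The name bucketByFirstByte is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Local stand-ins copying the layouts shown in the slides (assumption: a real
// UDF gets these from <sphere.h> instead of defining them).
struct SInput {
    char* m_pcUnit; int m_iRows; int64_t* m_pllIndex; char* m_pcParam; int m_iPSize;
};
struct SOutput {
    char* m_pcResult; int m_iBufSize; int m_iResSize;
    int64_t* m_pllIndex; int m_iIndSize; int m_iRows;
    int* m_piBucketID; int64_t m_llOffset; std::string m_strError;
};
struct SFile;  // not used by this sketch

// Hypothetical UDF: echo every record and bucket it by its first byte.
extern "C" int bucketByFirstByte(const SInput* input, SOutput* output, SFile* /*file*/)
{
    // Assumption: the client passes the number of buckets as one raw int.
    int buckets = 0;
    if (input->m_iPSize != static_cast<int>(sizeof(int))) {
        output->m_strError = "expected an int bucket count as parameter";
        return -1;
    }
    std::memcpy(&buckets, input->m_pcParam, sizeof(int));
    if (buckets <= 0) {
        output->m_strError = "bucket count must be positive";
        return -1;
    }

    // Rebase offsets on the first entry so that both absolute and
    // zero-based index conventions give the same result.
    const int rows = input->m_iRows;
    const int64_t base = input->m_pllIndex[0];
    const int64_t total = input->m_pllIndex[rows] - base;

    // The slides say to resize the buffers when they are too small; this
    // sketch reports an error instead because the allocator is unknown here.
    if (total > output->m_iBufSize || rows + 1 > output->m_iIndSize) {
        output->m_strError = "output buffers too small";
        return -1;
    }

    for (int i = 0; i < rows; ++i) {
        const int64_t begin = input->m_pllIndex[i] - base;
        const int64_t end = input->m_pllIndex[i + 1] - base;
        if (end < begin) {
            output->m_strError = "index is not monotonic";
            return -1;
        }
        std::memcpy(output->m_pcResult + begin, input->m_pcUnit + begin,
                    static_cast<std::size_t>(end - begin));
        output->m_pllIndex[i] = begin;
        // An empty record falls into bucket 0.
        const unsigned char first =
            (end > begin) ? static_cast<unsigned char>(input->m_pcUnit[begin]) : 0;
        output->m_piBucketID[i] = first % buckets;
    }
    output->m_pllIndex[rows] = total;
    output->m_iResSize = static_cast<int>(total);
    output->m_iRows = rows;
    output->m_llOffset = -1;  // nothing left to resume
    return 0;
}
```

Hashing on the first byte is only a placeholder. Any function that maps a record to an integer in [0, buckets) plays the same role, and records that share the resulting ID end up in the same bucket file.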
UDF::SFile
  struct SFile
  {
     std::string m_strHomeDir;
     std::string m_strLibDir;
     std::string m_strTempDir;
     std::set<std::string> m_sstrFiles;
  };
  Results can be written to local files; put the paths of those files into m_sstrFiles

UDF
  __FUNCTION__.cpp
    #include <sphere.h>
    extern "C"
    {
    int _FUNCTION_(const SInput* input, SOutput* output, SFile* file)
    {
       // UDF body goes here
       return 0;
    }
    }
  Compile it to generate the FUNC.so file

A Sphere Program
  #include <dcclient.h>

  Sector::init();
  Sector::login(…);
  SphereStream input;
  SphereStream output;
  SphereProcess myProc;
  myProc.loadOperator("func.so");
  myProc.run(input, output, "func", 0);
  myProc.read(result);
  myProc.close();
  Sector::logout();
  Sector::close();

Sphere Stream
  Input
    vector<string> files;
    files.insert(files.end(), "/html");
    SphereStream s;
    s.init(files);
  Output
    SphereStream temp;
    temp.setOutputPath("/result", "bucket");
    temp.init(256);

Upload UDF and related files
  SphereProcess::loadOperator(path)
  Sends the UDF to all slaves selected for the current process
  Can also send any other files (applications, parameter data, etc.)
  The path is stored in SFile::m_strLibDir

Run a Sphere Process
  int run(const SphereStream& input, SphereStream& output, const string& op, const int& rows, const char* param = NULL, const int& size = 0);
  rows: number of rows to pass to the UDF each time
    N > 0: N rows
    0: the whole segment
    -1: the whole file

Read Result and Check Progress
  SphereProcess::read(SphereResult*& res, const bool& inorder = false, const bool& wait = true);
  If the output stream is initialized with output.init(0), results are sent back to the client
  int checkProgress();

Demo
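The rows argument of run() described above can be modeled locally to see how one data segment would be divided into UDF calls. This is a hypothetical model, not Sector code: planUdfCalls is a name introduced here, the assumption that the last call receives the leftover rows is not confirmed by the slides, and rows = -1 (whole file) is deliberately left out because the slides do not say what the UDF receives in that case.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical local model, not part of Sector: for a segment holding
// `segmentRows` records, list the (first row, row count) pair of every UDF
// call implied by the `rows` argument of run().
std::vector<std::pair<int, int> > planUdfCalls(int segmentRows, int rows) {
    std::vector<std::pair<int, int> > calls;

    // Nothing to schedule for an empty segment; rows < 0 is not modeled.
    if (segmentRows <= 0 || rows < 0) return calls;

    if (rows == 0) {
        // rows == 0: the whole segment goes to the UDF in a single call.
        calls.push_back(std::make_pair(0, segmentRows));
        return calls;
    }

    // rows == N > 0: N rows per call. Assumption: the final call gets
    // whatever is left when segmentRows is not a multiple of N.
    for (int first = 0; first < segmentRows; first += rows) {
        calls.push_back(std::make_pair(first, std::min(rows, segmentRows - first)));
    }
    return calls;
}
```

For a segment of 10 records, rows = 4 gives three calls in this model (4, 4, and 2 records), while rows = 0 gives a single call covering all 10.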