
An Introduction to Sector/Sphere

Yunhong Gu

Univ. of Illinois at Chicago and VeryCloud LLC

@CHUG, June 22, 2010

What is Sector/Sphere?

 Sector: Distributed File System

 Sphere: Simplified Parallel Data Processing Framework

 Goal: handling big data on commodity clusters

 Open source software, BSD license, written in C++.

 Started in 2006; current version is 2.3

 http://sector.sf.net

Motivation: Data Locality

Super-computer model: expensive, data IO bottleneck.

Sector/Sphere model: inexpensive, parallel data IO, data locality.

Motivation: Simplified Programming

Parallel/distributed programming with MPI, etc.: flexible and powerful, but application development is very complicated.

Sector/Sphere model (cloud model): the cluster appears to the developer as a single entity, with a simplified programming interface; limited to certain data-parallel applications.

Motivation: Global-scale System

Systems designed for a single data center: require additional effort to locate and move data across sites.

Sector/Sphere model: supports wide-area data collection and distribution.

[Diagram: data providers in the US and Europe upload to geographically distributed data centers, while data readers and users in the US and Asia download and process the data through Sector/Sphere.]

Sector Distributed File System

 DFS designed to work on commodity hardware

 racks of computers with internal hard disks and high speed network connections

 File system level fault tolerance via replication

 Supports wide-area networks

 Can be used for data collection and distribution

 Not POSIX-compatible yet

Sector Distributed File System

[Architecture diagram: a single Security Server (user accounts, data protection, system security) talks to the Masters (metadata, scheduling, service providers) and to the Clients over SSL; Clients (system access tools, application programming interfaces) exchange data directly with the slaves (storage and processing) over UDT, with optional encryption.]

Security Server

 User accounts, permissions, IP access control lists

 Uses independent accounts, but can connect to an existing account database (e.g., Linux accounts, LDAP) via a simple “driver”

 Single security server; the system continues to run when the security server is down, but new users cannot log in

Master Servers

 Maintain file system metadata

 Metadata is a customizable module; currently there are two implementations, one in-memory and one on disk

 Authenticate users, slaves, and other masters (via the security server)

 Maintain and manage file replication, data IO and data processing requests

 Topology aware

 Multiple active masters can dynamically join and leave; load balancing between masters

Slave Nodes

 Store Sector files

 Sector file is not split into blocks

 One Sector file is stored on the “native” file system (e.g., EXT, XFS, etc.) of one or more slave nodes

 Process Sector data

 Data is processed on the node where it is stored, or on the nearest possible node

 Input and output are Sector files

Clients

 Sector file system client API

 Access Sector files in applications using the C++ API (see the sketch at the end of this slide)

 Sector system tools

 File system access tools

 FUSE

 Mount Sector file system as a local directory

 Sphere programming API

 Develop parallel data processing applications that process Sector data with a set of simple APIs

 The client communicates with slaves directly for data IO, via UDT
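As an illustration, here is a minimal sketch of reading a Sector file through the C++ client API. The class and method names (Sector, SectorFile, createSectorFile, SF_MODE::READ) follow typical Sector 2.x example code but should be checked against the release headers; the master address, port, and credentials are placeholders.

    // Minimal sketch: read a file from Sector via the C++ client API.
    // Names follow typical Sector 2.x usage; verify against the release headers.
    #include <sector.h>
    #include <iostream>

    int main()
    {
       Sector client;
       client.init("master.example.com", 6000);   // placeholder master address and port
       client.login("user", "password");          // authenticated via the security server

       SectorFile* f = client.createSectorFile();
       if (f->open("/data/input.dat", SF_MODE::READ) >= 0)
       {
          char buf[4096];
          int n = f->read(buf, sizeof(buf));      // data comes directly from a slave over UDT
          std::cout << "read " << n << " bytes" << std::endl;
          f->close();
       }
       client.releaseSectorFile(f);

       client.logout();
       client.close();
       return 0;
    }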

UDT: UDP-based Data Transfer

 http://udt.sf.net

 Open source UDP-based data transfer protocol

 With reliability control and congestion control

 Fast, firewall friendly, easy to use

 Already used in many commercial and research systems for large data transfer
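Because UDT's C++ API deliberately mirrors BSD sockets, a minimal client is almost ordinary socket code; the host and port below are placeholders:

    // Minimal UDT client sketch (API mirrors BSD sockets; see udt.sf.net).
    #include <arpa/inet.h>
    #include <cstring>
    #include <iostream>
    #include <udt.h>

    int main()
    {
       UDT::startup();                              // initialize the UDT library

       UDTSOCKET sock = UDT::socket(AF_INET, SOCK_STREAM, 0);

       sockaddr_in serv;
       std::memset(&serv, 0, sizeof(serv));
       serv.sin_family = AF_INET;
       serv.sin_port = htons(9000);                 // placeholder port
       inet_pton(AF_INET, "192.0.2.1", &serv.sin_addr);   // placeholder host

       if (UDT::ERROR == UDT::connect(sock, (sockaddr*)&serv, sizeof(serv)))
       {
          std::cout << "connect: " << UDT::getlasterror().getErrorMessage() << std::endl;
          return 1;
       }

       const char* msg = "hello";
       UDT::send(sock, msg, std::strlen(msg), 0);   // reliable delivery over UDP

       UDT::close(sock);
       UDT::cleanup();
       return 0;
    }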

Application-aware File System

 Files are not split into blocks

 Users are responsible for using properly sized files

 Directory and File Family

 Sector will keep related files together during upload and replication

 In-memory object

Sphere: Simplified Data Processing

 Data parallel applications

 Data is processed where it resides, or on the nearest possible node (locality)

 The same user-defined function (UDF) is applied to all elements (records, blocks, files, or directories)

 Processing output can be written to Sector files or sent back to the client

 Transparent load balancing and fault tolerance

Sphere: Simplified Data Processing

The conceptual loop

    for each file F in (SDSS datasets)
       for each image I in F
          findBrownDwarf(I, …);

becomes, in Sphere, a client program:

    SphereStream sdss;
    sdss.init("sdss files");
    SphereProcess myproc;
    myproc.run(sdss, "findBrownDwarf", …);

where the UDF has the signature

    findBrownDwarf(char* image, int isize, char* result, int rsize);

[Diagram: the Sphere client splits the input stream into segments n, n+1, ..., n+m, locates and schedules SPEs on the slaves, each SPE applies the UDF to its segment, and the results are collected into the output stream or sent back to the client.]

Sphere: Data Movement

 Slave -> Slave (local)

 Slave -> Slaves (hash/buckets)

 Each output record is assigned an ID; all records with the same ID are sent to the same “bucket” file

 Slave -> Client

[Diagram: SPEs read segments n, n+1, ... of the input stream, hash each output record into bucket files b, ... of an intermediate stream, and a second set of SPEs processes the buckets into the output stream.]
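As a sketch of the hash/bucket idea (the helper below is hypothetical, not part of the Sphere API): the UDF derives a bucket ID from each output record's key, and Sphere routes all records with the same ID to the same bucket file.

    // Hypothetical helper, not part of the Sphere API: derive a bucket ID
    // from a record key so equal keys always land in the same bucket file.
    #include <functional>
    #include <string>

    int bucketId(const char* key, int klen, int numBuckets)
    {
       // any deterministic hash works; std::hash is used here for brevity
       std::size_t h = std::hash<std::string>()(std::string(key, klen));
       return static_cast<int>(h % numBuckets);
    }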

What does a Sphere program look like?

 A client application

 Specify input, output, and name of UDF

 Inputs and outputs are usually Sector directories or collections of files

 May have multiple rounds of computation if necessary (iterative/combinative processing)

 A UDF

 A C++ function following the Sphere specification (parameters and return value)

 Compiled into a dynamic library
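For illustration, a minimal UDF sketch using the simplified findBrownDwarf signature shown earlier (the full parameter list in the Sphere headers may differ); extern "C" keeps the symbol unmangled so the runtime can load it from the dynamic library. The threshold and output format are placeholders:

    // Sketch of a Sphere UDF using the simplified signature from the
    // earlier slide; the brightness threshold and output text are placeholders.
    #include <cstdio>

    extern "C" int findBrownDwarf(char* image, int isize, char* result, int rsize)
    {
       int candidates = 0;
       for (int i = 0; i < isize; ++i)
          if (static_cast<unsigned char>(image[i]) > 200)   // hypothetical threshold
             ++candidates;

       // write the output into the result buffer provided by Sphere
       std::snprintf(result, rsize, "%d candidates", candidates);
       return 0;
    }

    // Compiled into a dynamic library, e.g.:
    //   g++ -shared -fPIC -o findBrownDwarf.so findBrownDwarf.cpp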

Sphere/UDF vs. MapReduce

 Map = UDF

 MapReduce = 2x UDF

 The first UDF generates bucket files and the second processes them.
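In client code, a MapReduce-style job therefore becomes two chained run() calls; the stream and UDF names below are hypothetical:

    SphereStream input, buckets;
    // pass 1: the "map" UDF hashes each output record into bucket files
    myproc.run(input, "mapUDF", …);
    // pass 2: the "reduce" UDF processes each bucket file
    myproc.run(buckets, "reduceUDF", …);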

Sphere/UDF vs. MapReduce

 Sphere is more flexible and efficient

 UDF can be applied directly on records, blocks, files, and even directories

 Support multiple inputs/outputs with better data locality, including certain legacy applications that process files and directories

 Native binary data support w/ permanent index files

 Sorting is required by Reduce, but it is optional in Sphere

 Output locality allows Sphere to combine multiple operations more efficiently

Sphere Benchmarks

Terasort: sort 1TB of data across distributed servers

MalStone: detect malware websites from billions of transactions

Graph processing: analyze very large social networks with billions of vertices (BFS and enumerating cliques)

Genome pipeline: analyze genome sequences

Satellite image processing: compare satellite images taken at different times, for disaster relief

 Sphere is about 2 – 4 times faster than Hadoop

Open Cloud Testbed

 15 racks in Baltimore (JHU), Chicago (StarLight and UIC), and San Diego (Calit2)

 10Gb/s inter-site connection on CiscoWave

 1 - 2Gb/s inter-rack connection

 Nodes: two dual-core AMD CPUs, 8 - 16GB RAM, 1-4TB RAID-0 disk

Open Cloud Testbed

[Network diagram: four 32-node racks, Calit2 (San Diego, 67.58.56.66-97/26), StarLight (Chicago, 206.220.241.90-121/24), UIC (Chicago, 192.168.136.5-36/26), and JHU (Baltimore, 192.168.136.70-101/26), interconnected over CiscoWave through NLR points of presence in San Diego, Los Angeles, Chicago, and Washington.]

Development Status

 Current version 2.3; all core functions are ready, and work continues on code quality and details of certain modules

 Partly funded by the NSF through NCDM/UIC

 Commercial support via VeryCloud LLC

 Next step: support column-based data tables (similar to BigTable)

 Open source contributors are welcome

More Information

 Sector Website: http://sector.sourceforge.net

 Email: gu@lac.uic.edu
