DSS
Data & Storage Services
New tape server software
Status and plans
CASTOR face-to-face workshop
22-23 September 2014
Eric Cano
on behalf of CERN IT-DSS group
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
Overview
• Features for first release
• New tape server architecture
  – Control and reporting flows
  – Memory management and data flow
  – Error handling
  – Main process and sessions
  – Stuck sessions and recovery
• Development methodologies and QA
• What changes in practice?
• What is still missing?
• Logical Block Protection investigation
• Release plans and potential new features
Features for first release
• Continuation of the push to replace legacy tape software
  – Started with the creation of the tape gateway and bridge
  – VMGR+VDQM will be next
• Drop-in replacement
  – Tapeserverd consolidated in a single daemon
  – Replaces the previous stack:
    • taped & satellites + rtcpd + tapebridged
• Identical outside protocols (almost)
  – Stager / CLI client (readtp is unchanged)
  – VMGR/VDQM
  – tpstat/tpconfig
  – New labelling command (castor-tape-label)
• Keep what works:
  – One process per session (pid listed in tpstat, as before)
• Better logs
• Latency shadowing (no impact of a slow DB)
• Empty mount protection
• Result of big teamwork since the last meeting:
  – E. Cano, S. Murray, V. Kotlyar, D. Kruse, D. Come
New tape server architecture
• Pipelined: based on FIFOs and threads/thread pools
  – Posting to a FIFO is always fast
    • Push data blocks, reports, requests for more work
  – Each FIFO output is served by one thread (or thread pool)
    • Simple loop: pop, use/serve the data/request, repeat
  – All latencies are shadowed in the various threads
  – Keep the pipeline non-empty with task prefetch
  – N-way parallel disk access (as before)
  – All reporting is asynchronous
• The tape thread is the central element that we want to keep busy at full speed
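The "pop, execute, delete, repeat" loop served by each thread or thread pool can be sketched in a few lines. This is a minimal Python illustration of the pattern, not tapeserverd's C++ code; the `Task` class, the `END` marker and the function names are hypothetical.

```python
import queue
import threading

END = None  # hypothetical end-of-work marker, pushed once per consumer thread


class Task:
    """Hypothetical unit of work, e.g. one file to read from disk."""

    def __init__(self, name, done):
        self.name = name
        self.done = done

    def execute(self):
        # Stand-in for the real work (disk read, tape write, ...).
        self.done.append(self.name)


def serve_fifo(fifo):
    """One consumer: pop, execute, drop the task, repeat until END."""
    while True:
        task = fifo.get()
        if task is END:
            return
        task.execute()
        # The task object is released here (the C++ code deletes it).


def run_thread_pool(names, n_threads):
    """Feed tasks into one FIFO served by n_threads consumers."""
    fifo = queue.Queue()  # unbounded: put() never waits for the consumers
    done = []
    pool = [threading.Thread(target=serve_fifo, args=(fifo,))
            for _ in range(n_threads)]
    for thread in pool:
        thread.start()
    for name in names:
        fifo.put(Task(name, done))
    for _ in pool:
        fifo.put(END)  # all tasks are queued before any END marker
    for thread in pool:
        thread.join()
    return done
```

A single consumer preserves FIFO order; a pool only guarantees that every queued task is executed once, in no particular order.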
Migration session overview
• Migration Mount Manager (main thread): instantiates the memory manager, task injector, report packer, disk and tape threads; gives the initial kick to the task injector; waits for completion
• Task Injector (1 thread): on a request for more work, gets more work from the tape gateway, creates the tasks and pushes them to the task queues
• Disk Read Thread Pool (n threads): pops a Disk Read Task from its task queue, executes it, deletes it
  – Gets free blocks from the memory manager, reads data from disk, pushes each full data block to the data FIFO
  – Requests more work on threshold
• Tape Write Single Thread (1 thread): pops a Tape Write Task from its task queue, executes it, deletes it
  – Pops a block, writes it to tape, (flushes,) reports the result
  – Returns the free block to the memory manager
• Memory manager (1 thread): holds the free blocks
• Report Packer (1 thread): takes reports from the client queue, packs the information and sends a bulk report on flush/end of session
• Global Status Reporter (1 thread): packs information for tapeserverd
Recall session overview
• Recall Mount Manager (main thread): instantiates the memory manager, task injector, report packer, disk and tape threads; gives the initial kick to the task injector; waits for completion
• Task Injector (1 thread): on a request for more work, gets more work from the tape gateway, creates the tasks and pushes them to the task queues
• Tape Read Single Thread (1 thread): pops a Tape Read Task from its task queue, executes it, deletes it
  – Pulls free blocks from the memory manager, reads data from tape, pushes each full data block to the data FIFO
  – Requests more work on threshold
• Disk Write Thread Pool (n threads): pops a Disk Write Task from its task queue, executes it, deletes it
  – Pops a block, writes it to disk, reports the result
  – Returns the free block to the memory manager
• Memory manager (no thread): passive container of free blocks
• Report Packer (1 thread): receives individual file reports, flush reports and the end of session report; packs the information and sends a bulk report on threshold/end of session
• Global Status Reporter (1 thread): packs information for tapeserverd
Control flow
• Task injector
  – Initially called synchronously (empty mount detection)
  – Triggered by requests for more work (stored in a FIFO)
  – Gets more work from the client
  – Creates and injects tasks
• Tasks are created, linked to each other (reader/writer couple) and injected into the tape and disk thread FIFOs
• Disk thread pool
  – Pops disk tasks, executes them, deletes them and moves to the next
• Tape thread
  – Same as disk, after initializing the session:
    • Mounting
    • Tape identification
    • Positioning for writing
    • … and unmounting in the end
• The reader thread (pool) requests more work
  – Based on task FIFO content thresholds
  – Always asks for n files or m bytes (whichever comes first, configurable)
  – Asks again when half of that is still available in the task FIFO
  – Asks again one last time when the task FIFO becomes empty (last call)
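The request-for-more thresholds can be modelled as a small state machine. This is a hypothetical Python sketch: the class and field names are invented, and reading "half of that" as "at most half of *both* the file and byte limits left" is an assumption, chosen so that a batch cut short by the byte limit (a few big files) does not trigger an immediate re-request.

```python
class RequestMoreTrigger:
    """Hypothetical reader-side trigger for 'request more work'.

    Assumption: a request is posted once at most half of a full batch
    (max_files AND max_bytes) is left in the task FIFO, and a final
    'last call' is posted when the task FIFO becomes empty.
    """

    def __init__(self, max_files, max_bytes):
        self.max_files = max_files    # "n files"
        self.max_bytes = max_bytes    # "m bytes"
        self.files_left = 0
        self.bytes_left = 0
        self.half_sent = False
        self.last_call_sent = False
        self.posted = []              # stands in for the request FIFO

    def on_injected(self, file_sizes):
        """The task injector pushed a new batch: re-arm both triggers."""
        self.files_left += len(file_sizes)
        self.bytes_left += sum(file_sizes)
        self.half_sent = False
        self.last_call_sent = False

    def on_popped(self, file_size):
        """The reader took one task out of the task FIFO."""
        self.files_left -= 1
        self.bytes_left -= file_size
        if self.files_left == 0:
            if not self.last_call_sent:
                self.posted.append("last call")
                self.last_call_sent = True
        elif (not self.half_sent
              and self.files_left <= self.max_files / 2
              and self.bytes_left <= self.max_bytes / 2):
            self.posted.append("more")
            self.half_sent = True
```

If the last call does return work, `on_injected` re-arms both triggers and the cycle repeats.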
Reporting flow
• Reports to client (file related)
  – Posted to a FIFO
  – Packed and transmitted in a separate thread
    • Sent on flush in migrations
    • Sent on thresholds in recalls
  – End of session also follows this path
• Reports to parent process (tape/drive related)
  – Posted to a FIFO
  – Transmitted asynchronously by a separate thread
  – Parent process keeps track of the session's status and informs the VDQM and VMGR
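The client-side report packing can be sketched as a thread draining a FIFO. In this hypothetical Python model, each report is a `(kind, payload)` tuple and `send` stands in for the bulk report to the client. Both are assumptions, not the tapeserverd interface.

```python
import queue
import threading


def report_packer(fifo, mode, send, threshold=None):
    """Pop reports from the FIFO and send them to the client in bulk.

    mode == "migration": pending file reports are sent on each flush report.
    mode == "recall": they are sent as soon as `threshold` are pending.
    The end-of-session report follows the same path and stops the thread.
    """
    pending = []
    while True:
        kind, payload = fifo.get()
        if kind == "file":
            pending.append(payload)
            if mode == "recall" and len(pending) >= threshold:
                send(pending)
                pending = []
        elif kind == "flush" and mode == "migration":
            send(pending)
            pending = []
        elif kind == "end":
            send(pending + ["end of session"])
            return


def run_packer(mode, reports, threshold=None):
    """Post reports to the FIFO while the packer runs in its own thread."""
    fifo = queue.Queue()
    sent = []
    packer = threading.Thread(target=report_packer,
                              args=(fifo, mode, sent.append, threshold))
    packer.start()
    for report in reports:
        fifo.put(report)  # posting never blocks the reporting thread
    packer.join()
    return sent
```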
Memory management and data flow
• Same as before: circulate a fixed number of memory blocks (size and count configurable)
• Errors can be piggy-backed on data blocks
  – The writer side always does the reporting, even for read errors
• Central memory manager
  – Migrations: actively pushes blocks for each tape write task
    • Disk read tasks pull blocks from there
    • They return the blocks with data in a second FIFO
    • The data gets written to tape by the tape write task
  – Recalls: passive container
    • The tape read task pulls memory blocks as needed
    • Pushes them to the disk write tasks (in FIFOs)
    • Disk write tasks push the data to the disk server
  – Memory blocks get recycled to the memory manager after writing to disk or tape
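A fixed pool of recycled blocks, with read errors travelling on the blocks themselves, can be sketched as follows. This hypothetical Python model follows the recall-style passive container; `MemBlock`, `MemoryManager`, `read_file` and `write_file` are illustrative names, not tapeserverd classes.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemBlock:
    capacity: int                 # block size (configurable)
    payload: bytes = b""
    last: bool = False            # last block of the file
    error: Optional[str] = None   # read error piggy-backed to the writer


class MemoryManager:
    """Passive pool with a fixed number of reusable blocks."""

    def __init__(self, count, size):
        self._free = queue.Queue()
        for _ in range(count):
            self._free.put(MemBlock(size))

    def get_free(self):
        return self._free.get()  # waits while every block is in flight

    def release(self, block):
        block.payload, block.last, block.error = b"", False, None
        self._free.put(block)

    def free_count(self):
        return self._free.qsize()


def read_file(mm, data_fifo, chunks, fail_at=None):
    """Reader side: fill free blocks; on failure forward the error, do not report it."""
    for i, chunk in enumerate(chunks):
        block = mm.get_free()
        if i == fail_at:
            block.error = "read error"
            data_fifo.put(block)
            return
        block.payload = chunk  # assumes len(chunk) <= block.capacity
        block.last = i == len(chunks) - 1
        data_fifo.put(block)


def write_file(mm, data_fifo, written, reports):
    """Writer side: consume blocks, always do the reporting, recycle blocks."""
    while True:
        block = data_fifo.get()
        if block.error is not None:
            reports.append("failed: " + block.error)
            mm.release(block)
            return
        written.append(block.payload)
        last = block.last
        mm.release(block)  # recycled to the memory manager after writing
        if last:
            reports.append("ok")
            return
```

When a file needs more blocks than the pool holds, the reader and writer must run in separate threads; `get_free()` then throttles the reader, which is how a fixed block count bounds memory use.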
Error handling
• Reporting
  – Errors get logged when they happen
  – If an error happens in the reader, it gets propagated to the writer through the data path
  – The writer propagates the error to the client
• Session behaviour on error
  – Recalls: carry on for the stager, halt on error for readtp
    • absolute positioning by blockId (stager)
    • relative positioning by fSeq (readtp)
  – Migrations: any error ends the session
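The session behaviour on error reduces to a small decision table. The sketch below is hypothetical Python: the enum and function names are not from tapeserverd, and the comments only repeat the positioning mode listed for each client.

```python
from enum import Enum


class SessionType(Enum):
    MIGRATION = "migration"
    RECALL = "recall"


class Client(Enum):
    STAGER = "stager"   # absolute positioning by blockId
    READTP = "readtp"   # relative positioning by fSeq


def carry_on_after_error(session, client):
    """Decide whether the session continues after a failed file."""
    if session is SessionType.MIGRATION:
        return False                      # any error ends a migration
    return client is Client.STAGER        # recall: stager carries on, readtp halts


def recall(files, client, failing):
    """Return per-file (fSeq, status) reports, stopping according to the policy."""
    reports = []
    for fseq in files:
        if fseq in failing:
            reports.append((fseq, "error"))
            if not carry_on_after_error(SessionType.RECALL, client):
                break
        else:
            reports.append((fseq, "ok"))
    return reports
```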
Main process and sessions
• The session is forked by the parent process
  – Parent process keeps track of sessions and drive statuses in a drive catalogue
  – Answers VDQM requests
  – Filters input requests based on drive state
  – Manages the configuration files
• The child session reports tape related status to the parent process
  – mounts, unmounts
  – amount of data transferred, for the watchdog
• The parent process informs the VMGR and VDQM on behalf of the child session
  – Client library completely rewritten
• Forking is actually done by a utility sub-process (forker)
  – No actual forking from the multithreaded parent process
• Process inventory:
  – 1 parent process + 1 fork helper process
  – N session processes (at most 1 per drive)
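The parent's drive catalogue and its request filtering can be illustrated with a toy model. Everything here is hypothetical Python (the state names, the methods, and putting the drive down when a session fails to clean up, which mirrors the stuck-sessions rule). It is not the tapeserverd catalogue.

```python
from enum import Enum


class DriveState(Enum):
    DOWN = "down"
    FREE = "free"
    RUNNING = "running"


class DriveCatalogue:
    """Hypothetical parent-side record of drive states and session pids."""

    def __init__(self, drive_names):
        self._state = {name: DriveState.DOWN for name in drive_names}
        self._pid = {}

    def state(self, drive):
        return self._state[drive]

    def session_pid(self, drive):
        return self._pid.get(drive)  # what a tpstat-like query would show

    def set_up(self, drive):
        if self._state[drive] is DriveState.DOWN:
            self._state[drive] = DriveState.FREE

    def start_session(self, drive, pid):
        """Filter an incoming request: only a free, known drive gets a session."""
        if self._state.get(drive) is not DriveState.FREE:
            return False
        self._state[drive] = DriveState.RUNNING
        self._pid[drive] = pid   # at most one session process per drive
        return True

    def session_ended(self, drive, cleaned_up):
        """Child exited: free the drive, or put it down if clean-up failed."""
        self._pid.pop(drive, None)
        self._state[drive] = DriveState.FREE if cleaned_up else DriveState.DOWN
```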
ZeroMQ + Protocol Buffers
• The communication between the parent and session processes is a no-risk protocol
  – Both ends get released/deployed together
  – Can be changed at any time
• Opportunity to experiment with new serialization methodologies
  – Need to replace umbrello
• This gave good results
  – Protocol buffers provide robust serialization with little development effort
  – ZMQ handles many communication scenarios
• Still being finalized (issues in the watchdog communication)
Stuck sessions and recovery
• Stuck sessions do happen
  – RFIO problems suspected
• Currently handled by a script
  – Log file based: no movement for a set time => kill
  – Problematic with unusually big files
• The watchdog will get more internal data
  – Too much to be logged
  – If data stops flowing for a given time => kill
• Clean-up process launched automatically when a session is killed
• No clean-up after session failure
  – A non-stuck session that failed to do its own clean-up => drive down
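A watchdog fed with the session's transferred-bytes counter can be sketched as below. In this hypothetical Python model, time is passed in explicitly for testability; in practice it would come from `time.monotonic()`. Because progress is measured in bytes rather than log activity, a long transfer of a very big file keeps resetting the timer.

```python
class TransferWatchdog:
    """Hypothetical stall detector fed by the session's byte counter."""

    def __init__(self, stall_timeout_s, start_time):
        self.timeout = stall_timeout_s
        self.last_bytes = 0
        self.last_progress = start_time

    def update(self, total_bytes, now):
        """Record the transferred-bytes counter reported by the session."""
        if total_bytes > self.last_bytes:
            self.last_bytes = total_bytes
            self.last_progress = now

    def stuck(self, now):
        return now - self.last_progress > self.timeout

    def check(self, now, kill_session):
        """Kill the session if no data has moved for longer than the timeout."""
        if self.stuck(now):
            kill_session()   # the clean-up process is then launched
            return True
        return False
```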
Development methodologies and QA
• Full C++, maintainable software
  – Object encapsulation for separately manageable units
    • Easy unit testing
  – Exception handling simplifies error reporting a lot
  – RAII (destructors) simplifies resource management
  – Cleaner drive specifics implementation through inheritance
    • Easy to add new models
• Hardcoding-free SCSI and tape format layers
  – Naming conventions matching the SCSI documentation
  – String error reporting for all SCSI errors
  – Very similar approach with the AUL tape format
• Unit testing
  – Allows running various scenarios systematically
    • On RPM build
    • Migrations, recalls, good day, bad day, full tape
    • Using fake objects for the drive and client interface
    • Easier debugging when problems can be reproduced in unit test context
  – Run tests standalone + through valgrind and helgrind
    • Automatic detection of memory leaks and race conditions
    • Fully integrated into the CASTOR tree
• Automated system testing would be a nice addition to this setup
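The fake-object approach can be shown with a toy migration loop tested against an in-memory drive. This is a hypothetical Python analogue of the C++ unit tests: `FakeDrive`, `EndOfTape` and `migrate` are invented for illustration, and the two test cases mirror the "good day" and "full tape" scenarios.

```python
import unittest


class EndOfTape(Exception):
    """Raised by the (fake) drive when the tape is full."""


class FakeDrive:
    """Hypothetical in-memory stand-in for the SCSI tape drive."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = []

    def write_block(self, data):
        if len(self.blocks) >= self.capacity:
            raise EndOfTape()
        self.blocks.append(data)


def migrate(drive, files):
    """Write (name, blocks) files in order; return the names fully written."""
    written = []
    for name, blocks in files:
        try:
            for block in blocks:
                drive.write_block(block)
        except EndOfTape:
            break            # full tape: remaining files stay unmigrated
        written.append(name)
    return written


class MigrationScenarios(unittest.TestCase):
    def test_good_day(self):
        drive = FakeDrive(capacity_blocks=10)
        files = [("a", [b"1", b"2"]), ("b", [b"3"])]
        self.assertEqual(migrate(drive, files), ["a", "b"])
        self.assertEqual(drive.blocks, [b"1", b"2", b"3"])

    def test_full_tape(self):
        drive = FakeDrive(capacity_blocks=2)
        files = [("a", [b"1"]), ("b", [b"2", b"3"]), ("c", [b"4"])]
        self.assertEqual(migrate(drive, files), ["a"])
```

Saved as a module, these cases run with `python -m unittest`. The C++ equivalents can additionally be run under `valgrind --tool=helgrind`.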
What changes in practice?
• The new logs
  – Convergence with the rest of CASTOR logs
  – Single line at completion of the tape thread
    • Summarises the session for the tape log
  – More detailed timings
    • Will make it easier to pinpoint performance bottlenecks
  – New log parsing required
    • Should be greatly simplified, as all relevant information is on a single line
• A single daemon
• Configuration not radically changed
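With all session information on one line, parsing becomes a single regex pass. The log format and field names below (`key="value"` pairs, `TPVID`, `transferTime`, `dataVolume`) are hypothetical placeholders, not the actual tapeserverd log fields.

```python
import re

# Hypothetical format: space-separated key="value" pairs on one line.
_PAIR = re.compile(r'(\w+)="([^"]*)"')


def parse_summary(line):
    """Turn one session-summary log line into a dict of strings."""
    return dict(_PAIR.findall(line))


def throughput_mb_s(fields):
    """Average throughput in MB/s from hypothetical dataVolume/transferTime fields."""
    return int(fields["dataVolume"]) / float(fields["transferTime"]) / 1e6
```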
What is still missing?
• Support for Oracle libraries
• The parent process's watchdog for transfer sessions
  – Will move stuck transfer detection from operator scripts to the daemon itself (with better precision)
• File transfer protocol switching
  – Add local file support
    • reliance on RFIO removed
  – Add Xroot support
    • switched on by configuration
    • instead of RFIO
    • Disk server >= 2.1.14-15 required (for the stat call)
  – Add Ceph support
    • Disk path based switch, automatic
• Fine tuning of logs for operations
• Document the latest developments
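The protocol switching rules can be summarised as one selection function. This is a hypothetical Python sketch: the `ceph:` prefix and the assumption that local files are plain absolute paths while remote disk files look like `host:/path` are illustrative, not the real CASTOR path conventions.

```python
# Hypothetical prefix for Ceph-backed disk paths (not given in the slides).
CEPH_PREFIX = "ceph:"


def choose_transport(disk_path, use_xroot):
    """Pick the file transfer protocol for one disk file.

    - Ceph: selected automatically from the disk path.
    - Local file: plain absolute path (assumption), no RFIO needed.
    - Remote disk server: Xroot when enabled in the configuration, else RFIO.
    """
    if disk_path.startswith(CEPH_PREFIX):
        return "ceph"
    if disk_path.startswith("/"):
        return "local"
    return "xroot" if use_xroot else "rfio"
```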
Release and deployment
• Data transfers are being validated now on IBM drives
• Oracle drives will follow with mount support
• Some previously mentioned features are still missing
• Target date for a tapeserverd-only 2.1.15 CASTOR release: end of November
• Production deployment ~January
• Compatible with current 2.1.14 stagers
• 2.1.14-15 on disk servers will be needed for using Xroot
• 2.1.14 is the end of the road for rtcpd/taped
Logical block protection
• Tests of the tape drive feature have been done by F. Nikolaidis, J. Leduc and K. Ha
• Adds a 4-byte checksum to tape blocks
• Protects the data block during the transfer from computer memory to the tape drive
• 2 checksum algorithms in use today:
  – Reed-Solomon
  – CRC32-C
• Reed-Solomon requires 2 threads to match drive throughput
• CRC32-C can fit in a single thread
  – CRC32-C is available on most recent drives
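A software CRC32-C is short. This table-driven Python version shows the checksum (Castagnoli polynomial, reflected form 0x82F63B78, initial value and final XOR 0xFFFFFFFF) and a protect/verify round trip on one block. It is only an illustration: `protect` and `verify` are hypothetical helpers, the byte order of the 4 appended bytes is an assumption (the on-wire format is defined by the SCSI SSC standard), and a pure-Python loop is far too slow for drive throughput.

```python
# CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
def _make_table():
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def protect(block: bytes) -> bytes:
    """Append a 4-byte checksum (big-endian here: an assumption)."""
    return block + crc32c(block).to_bytes(4, "big")


def verify(protected: bytes) -> bool:
    """Recompute the checksum over the data part and compare."""
    data, check = protected[:-4], protected[-4:]
    return crc32c(data).to_bytes(4, "big") == check
```

x86 CPUs with SSE4.2 provide a `crc32` instruction for this same polynomial, which is consistent with CRC32-C fitting in a single thread in C++.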
Next tape developments
• Tapeserverd
  – Logical block protection integration
  – Support for pre-emption of sessions
• VDQM/VMGR
  – Merge of the two into a single tape resource manager
    • Simplify interface
    • Asymmetric drive support
    • Improve scheduling (atomic tape-in-drive semantics for migrations)
      – Today, the chosen tape might not have compatible drives available, leading to migration delays
    • Remove need for manual synchronization
    • Consider pre-emptive scheduling
      – Max out the system with background tasks (repack, verify)
      – Interrupt them and make space for user sessions when they come
      – Allow over quota for users when free drives exist
      – Leading to 100% utilisation of the drives
      – Facilitates tape server upgrades
  – Integrate the authentication part for tape (from Cupv)
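The pre-emptive scheduling ideas can be illustrated with a toy drive-allocation model. Everything here is hypothetical, including the class and method names and the choice to pre-empt the most recently started background session. It only shows how background work can keep every drive busy while user sessions still get a drive on demand.

```python
class PreemptiveScheduler:
    """Hypothetical sketch: background work fills idle drives, users pre-empt it."""

    def __init__(self, n_drives, user_quota):
        self.free = n_drives
        self.user_quota = user_quota
        self.user_sessions = 0
        self.background = []   # running background sessions (repack, verify)
        self.preempted = []    # background sessions interrupted for users

    def start_background(self, name):
        """Max out the system: background work takes any idle drive."""
        if self.free == 0:
            return False
        self.free -= 1
        self.background.append(name)
        return True

    def start_user(self):
        """Start a user session, pre-empting background work if needed."""
        if self.user_sessions >= self.user_quota:
            if self.free == 0:
                return False       # over quota: only allowed on a free drive
        elif self.free == 0:
            if not self.background:
                return False
            self.preempted.append(self.background.pop())  # interrupt one
            self.free += 1
        self.free -= 1
        self.user_sessions += 1
        return True
```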
Conclusion
• The tape server stack has been rewritten and consolidated
  – New features already provide improvements
    • Empty mount protection for both read and write
    • Full request and report latency shadowing
    • Better timing monitoring is already in place
  – Major clean-up will allow easier development and maintenance
• More new features coming
  – Xroot/Ceph support
  – Logical block protection
  – Session pre-emption
• End of the road for rtcpd/taped
  – Will be dropped from 2.1.15 as soon as we are happy with tapeserverd in production
• More tape software consolidation around the corner
  – VDQM/VMGR