talk slides

advertisement
Stream Processing of
X-ray Microdiffraction Data
on Multicores
Yuzhen Xie, University of Western Ontario (UWO)
joint work with
Alain Biem, IBM Research
Michael A. Bauer, UWO
Stewart McIntyre, UWO
Nobumichi Tamura, Lawrence Berkeley National Lab
AMMCS, July 2011
Motivation
• Efficiently use of multi-core processors to process large
blocks of synchrotron XRD data generated at high rates
(1 to 10 images per second of each 4MB)
• Develop high-performance kernels to achieve near realtime data analysis for synchrotron experiments, the goal
of the Active Network Interchange for Scientific
Experimentation (ANISE) project
Synchrotron X-ray White-beam
Microdiffraction
Dectris Pilatus 1M CCD at ALS (2010):
sub-second readout
CCD Camera
Diffracted beams
Sample
Incident X-ray
(5 – 30 KeV)
An image showing the Laue microdiffration
pattern of a unit-cell in a crystal sample
Process of Laue
Patterns for
Micro-texture
Analysis
Background fit and
removal (optional)
Example of Crystallographic Orientation and
Strain Maps (courtesy: Jing Chao and Marina Fuller, UWO)
Orientation map
Strain map, average strain: 9.92 x 10-3
Result by XMAS (X-ray Microdiffraction Analysis Software), Advanced Light Source
Reference Software Packages
• XMAS (X-ray Microdiffraction Analysis Software),
Advanced Light Source
• 3D X-ray Microdiffraction Analysis Software Package in IDL,
Advanced Photon Source
• A prototype of C code for a selection of features in Laue
pattern analysis, Science Studio and ANISE projects, UWO
Best sequential processing time: 25 to 50 seconds per image
Stream Processing Illustration
Continuous Ingestion
7
7
Continuous Analysis
IBM Streams Programming Model
Input
Process
Streams Processing Language (SPADE)
Platform optimized compilation
8
Output
XRD Image Stream
Laue XRD Processing System on Streams
Split
operator
Preprocessing
-Formatting
-Parsing
-XRD image data
Background
Removal
Filters available
-Parabolic
-2D Bruckner
-2D Mean Filter
Blob
Searching
-Blobs search
-Scheduling
for parallel
peak fitting
Bundle
Sorting
Peak Fitting
Indexing Strain
Functions Available
- Lorentz
- Gaussian
- Pearson VII
Processing Elements (mainly User-defined Operators (UDOPs))
Key Implementation Techniques
• Efficient Source operator for parsing image files:
block reading and type casting
• Fine-grained pipelining and cache-efficient
background filters
• Memory-efficient parallel peak fitting
• Organize common parameter values as a stream for
shared-use in indexing and strain analysis
A Fine-pipelined Background Filter based on
Parabolic Method
A Pilatus TIFF Image before and after Background Removal
Memory-efficient Parallel Peak Fitting
Data Management: the Key Issue
Blob center b: data set Rb (db x db) is needed for fitting a peak with center at p.
Peak center p: data set Rp is needed for integrated intensity computation.
Assume p is not far from b. Define R to be the square region (2db x 2db) with center
at b. Attach a data set R to a blob tuple rather than passing the whole image to each
peaking fitting element. Determine Rb and Rp by coordinate mapping in R.
Small data size, good locality, no memory contention, …, and hence efficiency.
A SPADE Code Snippet for Blob Searching and
Parallel Peak Fitting
## Parse an image
stream engStream(height: Integer, width: Integer, emax: …, evalues: DoubleList)
:= Source()[“file://c4-3_001.spe”,udfbinformat=“speParser”, blocksize=65536*15]{}
## Search blobs and generate blob stream
stream blobStream( groupid: Integer, blobid: Integer, …, lroi: DoubleList)
:= Udop(engStream)[“blobSearch”]{np=“NUM_PF”}
## Split blobs to subgroups
for_begin @c 0 to NUM_PF-1
stream subBlobStream@c(groupid: Integer, blobid: Integer, …, lroi: DoubleList)
for_end
:= Split(blobStream)[groupid]{}
## Parallel peak fitting for subgroups of bobs and bundle all peaks together
bundle peakBundle := ()
for_begin @c 0 to NUM_PF-1
stream subPeakFitStream@c(numblobs: Integer, x: Integer, …, inten: Double)
:= Udop(subBlobStream@c)[“peakFitting”]{}
peakBundle += subPeakFitStream@c
for_end
Organize Common Parameter Values as one Stream for Shareduse in Indexing and Strain Refinement of all XRD Images
Known crystal structure and
energy range (5-30 keV)
q
Beam direction kin,
Detector position and dimensions
List of peak positions
on the CCD
kout
Calculated qhkl list of
reflections
2q
Experimental qi list
of reflections
kin
q1
q2
a3
a2
a1
Find triplets a1, a2, a3 (thus q1,q2,q3) matching calculated
and measured values within a given angular tolerance
q3
Choose triplets indexing the largest number of reflections within
a given angular tolerance. Look for “missing” reflections.
Strain refinement
Streams Live Graph: One Pipeline with 4 Processing
Elements for Parallel Peak Fitting
Image
Sourcing
Blob Search
&
Scheduling
Parallel
Peak Fitting
Parameter
Sourcing
Indexing
Strain
2.5 seconds per image (2084*2084) on an Intel Core2 Quad CPU Q9550
(2.83 GHz, 8 GB RAM and 6 MB L2 cache)
Streams Live Graph: 4 Pipelines to Process 4 Images Concurrently
in Streaming Mode
Super-linear speedup obtained on an Intel Core2 Quad CPU Q9550
Conclusion
• We present the first stream processing application in the
field of synchrotron XRD data analysis.
• We show that stream processing is an effective model for
efficiently using multicore processors for XRD image data
analysis.
• Our system provides a high-performance processing kernel
to achieve near real-time data analysis of image data from
synchrotron experiments.
• Our work-in-progress include: evaluation, optimization,
configuration and deployment of this kernel to large
systems with many cores to process large set of XRD
images in parallel and streaming mode.
Thank You!
Download