MuLan_NCSA_MRAC_Proposal - TWiki

Muon Lifetime Analysis
David W. Hertzog and David M. Webber
University of Illinois at Urbana-Champaign
For the MuLan Collaboration [1]
March 7, 2016
ABSTRACT
Muon decay probes the weak force and provides the most precise
determination of the Fermi Constant—the “strength” of the weak interaction.
The Fermi Constant, GF, is one of the three key inputs to the Standard Model
of particle physics. The MuLan Collaboration at the Paul Scherrer Institute is
a UIUC-led experiment that aims to improve the determination of GF to sub
part-per-million precision, a 20-fold improvement compared to all previous
world efforts. An initial commissioning run of the experiment has been
completed and published, with the analysis based on local UIUC Nuclear
Physics Laboratory resources. Since then, a high-statistics run in 2006 has
been completed, obtaining more than 67 TB of raw data, which has been
uploaded into the NCSA Mass Storage System. We have developed and
tested efficient analysis codes using, in part, a Development Allocation Award
from NCSA. In this proposal, we request the computing time necessary to
process the large data set with the goal of extracting the physics result in a
timely manner. We describe the science, the methodology and the computing
requirements to achieve this goal.
1 Overview
In this document we provide the motivation and justification for a new Medium
Resource Allocation on the Tungsten cluster at NCSA to be used to carry out a highly
sophisticated physics analysis that will lead to the most precise determination of one of the
key parameters of the Standard Model of particle physics—the Fermi coupling constant. We
are requesting 100 kSUs, 25 TB of project disk space, and 100 TB on the mass storage
system. The allocation is sufficient to complete the analysis of data obtained by us in 2006,
a dataset representing the decays of more than 10^12 muons, as explained below.
Defining the parameters of the Standard Model is a high priority of both particle and
nuclear physics. Important advances come from efforts to produce new particles directly at
colliders and, alternatively, from precision low-energy measurements, which test the
predictions of the theory or which measure the input parameters. The Muon Lifetime
Analysis (MuLan) experiment aims to measure the positive muon lifetime to 1-ppm
precision, thus determining the Fermi coupling constant, GF, to 0.5 ppm. The Fermi
constant represents the “strength” of the Weak Force, one of the four forces in nature. The
muon lifetime measurement provides, by far, the most precise method of establishing this
fundamental constant. A 1 ppm goal represents a 27-fold improvement compared to any
single previous experiment, the most recent of which had been carried out more than 20
years ago. The experiment follows logically from our highly successful muon “g–2” [i]
experience, another high-precision measurement that also relied on NCSA processing to
extract physics from raw data. The scientific case for MuLan is motivated as part of a world
effort to determine to the fullest extent possible the fundamental parameters of standard
electroweak theory.

[1] Senior personnel: R. M. Carey, P. T. Debevec, K. Giovanetti, F. Gray, T. P. Gorringe, D. W.
Hertzog, P. Kammel, B. Lauss, K. R. Lynch, J. P. Miller, Q. Peng, B. L. Roberts, V. Tishchenko,
D. M. Webber, P. Winter. Collaborating institutions: Boston, Illinois, Kentucky, Regis, Paul
Scherrer Institut.

Figure 1. Diagram of the MuLan detector with several elements removed to show the AK-3
target. Muons enter through a vacuum beampipe and stop in the AK-3. Two example decay
trajectories are shown.

Figure 2. A subset of the Fall 2004 data illustrating the rate of decays during the accumulation
and measuring periods. The 2004 data clearly shows the time structure of the experiment. To
reduce data volume in 2006, only decays during the measurement period were recorded.
Very briefly, the experiment, which takes place at the Paul Scherrer Institute in
Switzerland, uses a high-intensity artificially chopped muon beam, a nearly hermetic decay
detector, a high-precision clock system, and a fast DAQ environment. After five years of
hardware development, a “small” data set was obtained in 2004, which was locally analyzed
at Illinois and the results were published in July 2007 in Physical Review Letters [ii]. In 2006,
the hardware was completed and the first large data set was obtained (10^12 events, 67 TB
data recorded). Since then we have developed, using our local resources, the coding and
optimization for processing this data. A development allocation from NCSA has permitted
testing performance aspects of our code. It is now time to enter full production analysis to
extract the physics, thus the timing of this proposal.
MuLan is supported by our main NSF umbrella grant, NSF PHY 06-01067, and the
experiment hardware was largely funded by a special Major Research Instrumentation
grant, NSF 00-79735, “Collaborative Research: The MuLan Project—Development of
instrumentation for a new high-precision determination of the Fermi coupling constant”.
Hertzog received a Guggenheim Fellowship to help commission the experiment. At present,
we are operating analysis tests at NCSA with Development Allocation PHY060034N.
2 The MuLan Experiment

2.1 Motivation
The Fermi constant GF is best determined from the muon lifetime, τ_μ, where it is
often termed G_μ. The relationship is

$$\frac{1}{\tau_\mu} \;=\; \frac{G_\mu^2\, m_\mu^5}{192\,\pi^3}\,\left(1 + \Delta q\right);$$

the parameter Δq accounts for QED radiative corrections. Prior to our recent intermediate
result, the combined world average experimental knowledge of τ_μ from many experiments
had an uncertainty of 18 ppm. However, the uncertainty on GF was dominated by
uncalculated two-loop radiative corrections prior to publication of a series of papers by van
Ritbergen and Stuart [iii], where the authors reduced the theoretical error to less than 0.3
ppm. They further detail how electroweak corrections to muon decay, which can be
separated in a natural way from those involving QED alone, are contained in the term Δr in
the relation

$$\frac{G_F}{\sqrt{2}} \;=\; \frac{g^2}{8 M_W^2}\,\left(1 + \Delta r\right),$$
which defines the Fermi constant in terms of the Standard Model weak coupling constant g.
The work is summarized by giving an unambiguous prescription that relates α (at low q^2),
the muon lifetime, and the Z mass to renormalized Standard Model parameters. As an
example, these parameters can be used to predict [iv] the top-quark mass. Improvements in
GF and MZ (or related electroweak parameters), and in the measured m_t, will be useful in
future Standard Model tests. A high-precision measurement of τ_μ is also used in the
analysis of the physics by our other current muon experiment, MuCap. There, we measure
the negative muon lifetime in an ultra-pure, high-pressure hydrogen gas Time Projection
Chamber to determine the muon capture rate (and from that the fundamental induced
pseudoscalar coupling, gP). The MuCap experiment used NCSA resources to complete the
analysis of its first large dataset and published the results in the same July 2007 Physical
Review Letters issue [v] as MuLan's intermediate result.
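As a rough numerical check (our own illustration, not drawn from the proposal text), inserting the accepted muon lifetime and mass into the relation above, together with the leading QED correction Δq ≈ −0.42%, reproduces the accepted value of the Fermi constant:

$$\tau_\mu \approx 2.197\ \mu\mathrm{s} \approx 3.34\times10^{18}\ \mathrm{GeV}^{-1},\qquad m_\mu \approx 0.10566\ \mathrm{GeV},$$

$$G_F \approx \sqrt{\frac{192\,\pi^3}{\tau_\mu\, m_\mu^5\,(1+\Delta q)}} \approx 1.166\times10^{-5}\ \mathrm{GeV}^{-2}.$$

Because GF scales as τ_μ^(-1/2), a 1-ppm lifetime measurement translates directly into the 0.5-ppm determination of GF quoted above.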
2.2 Detector
The MuLan experiment is simple. A continuous beam of low-energy muons is
directed to stop in a thin target during a “beam-on” accumulation period lasting
approximately 5 μs (2.3 τ_μ). The muon beam is then “switched off” and decays are
recorded during a 22-μs (10 τ_μ) measuring period by a surrounding detector. This cycle is
repeated until more than 10^12 decays are recorded. As an example, a 10-MHz dc beam
delivers 50 muons to the target during the accumulation period. Of those, 20 remain at
the beginning of the measuring period. The total cycle lasts 27 μs; thus the effective beam
rate exceeds 700 kHz, nearly 40 times greater than traditional “one-at-a-time” experiments.
Figure 2 shows this pattern from a small subset of the data accumulated in Fall 2004.
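A back-of-the-envelope check of these numbers (our own, using τ_μ ≈ 2.2 μs): the accumulation period delivers

$$N_{\mathrm{arriving}} = R\,T_{\mathrm{acc}} = 10\ \mathrm{MHz}\times 5\ \mu\mathrm{s} = 50,\qquad N_{\mathrm{surviving}} \approx R\,\tau_\mu\left(1-e^{-T_{\mathrm{acc}}/\tau_\mu}\right) \approx 22\times\left(1-e^{-2.3}\right) \approx 20,$$

and roughly 20 muons per 27-μs cycle corresponds to an effective rate of about 0.74 MHz, consistent with the "exceeds 700 kHz" figure.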
During the measuring interval, the decay positrons are recorded by a multisegmented, nearly hermetic spectrometer (Figure 1). The geometry features 170
independent scintillator tile pairs, with each element read out by a photomultiplier tube
(PMT) whose signal is sampled at 450 MHz by a dedicated waveform digitizer (WFD). The
time and energy deposited in each tile are derived from the signal shape. Decay time
histograms are constructed from coincident hits and are then fit to extract the lifetime.
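To make the last step concrete, the following is a minimal ROOT sketch of fitting a decay-time histogram with an exponential plus a flat background. It is our own illustration: the function form, binning, and names (fitLifetime, hDecays) are not taken from the production code, and the real analysis treats pileup and other systematic effects far more carefully.

// Minimal illustration (not the production analysis): fit a decay-time
// histogram with N(t) = N0 * exp(-t/tau) + B over the 22-us measuring period.
#include "TH1D.h"
#include "TF1.h"
#include <cstdio>

void fitLifetime(TH1D* hDecays) {
  // hDecays holds decay times in ns, measured from the start of the measuring period.
  TF1 fdecay("fdecay", "[0]*exp(-x/[1]) + [2]", 0.0, 22000.0);
  fdecay.SetParNames("N0", "tau", "B");
  fdecay.SetParameters(hDecays->GetMaximum(), 2197.0, 0.0);  // seed tau ~ 2.197 us
  hDecays->Fit(&fdecay, "RL");   // R: restrict to range, L: log-likelihood fit for counting data
  std::printf("tau = %.3f +/- %.3f ns\n",
              fdecay.GetParameter(1), fdecay.GetParError(1));
}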
The design of the experiment is driven by systematic error considerations. Primary
concerns are related to multi-particle pileup, muon spin precession, time-dependence of
detector gains or electronic thresholds, and backgrounds. Pileup is minimized by the
segmentation of the detector, the relatively low peak rate per element (3 kHz) and by the
double-hit resolution enabled by recording the pulse waveform in both tile elements for each
event. Uncontrolled precession of the stopped, polarized muon ensemble can cause a
change in acceptance of the detectors during the measuring interval, which occurs because
the emitted positron rate is correlated to the direction of the muon spin. Residual
polarization is minimized by using a high-internal-field metal alloy, Arnokrome-3 (AK-3), that
dephases the spins during the accumulation period, which is long compared to the period of
rotation. Finally, the detector features front-back matched, symmetric segments, where the
sum of elements is nearly immune to a change in spin direction.
2.3 Data Acquisition
Each tile element of the segmented detector is read out independently by a
dedicated set of electronics, which makes the setup well suited for parallelization.
Geometrically opposite tile pairs are read out by each four-channel WFD. When the analog
signal from a PMT rises above a preconfigured threshold, the waveform digitizer for that
channel writes out 24 samples of the waveform at 2.2 ns intervals. If the waveform is still
above threshold after 24 samples, the waveform digitizer retriggers, writing out another 24
samples. The waveform digitizer also records an offset time since the beginning of the
current beam cycle, or “fill,” and a fill number. The set of hits in all detectors for a period of
5000 fills comprises a data structure called a “midas event,” and three events are present in
the readout sequence at a time. From newest to oldest, the first event consists of
waveform data being collected by the WFDs, the second is being read from the WFD FIFOs
into the frontend computer’s memory, and the third is being compressed using open-source
zlib with an MD5 checksum. The six compressed events from the frontend computers are
assembled by the backend computer and two copies are written to LTO3 tape. There are
approximately 430 midas events in each 2 GB data file. The typical overall data rate seen
by the backend was 25 MB/s with a DAQ livetime of 90%.
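As an illustration of the compress-and-checksum step, the sketch below uses the public zlib and OpenSSL APIs. It is generic example code written by us, not the actual MIDAS frontend; the buffer size and contents are placeholders.

// Generic sketch: compress an event buffer with zlib and record its MD5 checksum.
#include <vector>
#include <cstdio>
#include <zlib.h>
#include <openssl/md5.h>

int main() {
  std::vector<unsigned char> event(2 * 1024 * 1024, 0x5a);   // dummy 2 MB event buffer

  // MD5 checksum of the uncompressed event.
  unsigned char digest[MD5_DIGEST_LENGTH];
  MD5(event.data(), event.size(), digest);

  // zlib compression into a worst-case-sized output buffer.
  uLongf outLen = compressBound(event.size());
  std::vector<unsigned char> out(outLen);
  if (compress(out.data(), &outLen, event.data(), event.size()) != Z_OK) {
    std::fprintf(stderr, "compression failed\n");
    return 1;
  }
  std::printf("compressed %zu -> %lu bytes\n", event.size(), (unsigned long)outLen);
  return 0;
}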
2.4 Status
The MuLan experiment was proposed in 1999. The MuLan detector was completed in
2003, and a test run was carried out that same year. In 2004, a physics run was taken with
preliminary instrumentation. The 2004 dataset was analyzed locally at the Nuclear Physics
Lab and resulted in a thesis and a publication that reduced the uncertainty on the world
average of the muon lifetime by a factor of 1.9. In 2005, the full complement of electronics
was installed and another shakedown run was carried out. In the fall of 2006, the first full
physics production run was completed, during which over 10^12 muon decays were recorded.
The transfer of this data to NCSA’s mass storage system was completed in February 2007.
Since then, David Webber has been working on a complete version of the production
analysis code, and that code is ready for final testing on a significant fraction of the data.
In the summer of 2007, another dataset of equivalent size to the 2006 dataset was
obtained. The muon stopping target was changed from AK-3, with its high internal field, to
quartz plus an external magnet. This represents a significant systematic test—the two
methods should agree on the result if the experiment is properly understood. The transfer
of part of the 2007 data into NCSA is underway. The analysis effort of the 2007 dataset is
being supervised by a University of Kentucky postdoc within our collaboration. The proposal
presented here is aimed at the resources we believe will be required to analyze the 2006
dataset. During this effort, we will learn more about efficient processing and plan to submit,
separately, a request or renewal for a fine-tuned allocation sufficient to complete analysis of
the 2007 data.
3 Data Analysis and Computing Methodology
The MuLan data consists of digitized waveforms from the detector photomultiplier
tubes (PMTs). A one ppm measurement of the muon lifetime requires over 10^12 positron
“hits” in our detector. Each “hit” triggers both an inner and an outer tile, giving a total of
56 bytes per hit. When backgrounds and systematic runs are included, the size of the 2006
dataset becomes 67.44 TB in approximately 50,000 data files. Unlike most particle physics
experiments where choice events are filtered from background, the majority of our data will
be included in the final result. The analysis will reduce the size of the raw data in stages
while processing it into meaningful physics results.
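These numbers are mutually consistent (a quick check of our own):

$$10^{12}\ \mathrm{hits}\times 56\ \mathrm{bytes/hit} \approx 56\ \mathrm{TB},$$

with the background and systematic runs mentioned above accounting for the remainder of the 67.44 TB.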
The original proposal, submitted in 1999, called for the online analysis of the data—
onboard processors were to identify “hits” and histogram them in local memory. However,
developing and testing such an unforgiving algorithm would first require a data set where
offline analysis of event by event data could be carried out repeatedly. In 2004, we
recorded all the individual information about detector hits—a dataset nearly 100 times
smaller than the final proposal goal. Our experience in analyzing these data required
replaying the tapes many times as we learned more about systematic issues and tested
data self-consistency. For example, subtle data-ordering and nonlinear timing issues were
discovered in the 2004 data, originating from the preliminary electronics that were used.
These issues could not have been discovered without replaying the full statistical power of
the data and performing detailed self-consistency checks. Other systematic issues must be
tested in the 2006 and 2007 datasets, using the full dataset to look for any subtle
inconsistencies. Based on advances in storage technology and capacity, we decided to store
the 2006 data for reprocessing rather than relying solely on the online analysis—a scientific
decision that burdens us with a very challenging analysis of that large dataset. Local
laboratory resources are more than a factor of 10 too small to permit us to process these
data, let alone store it in a convenient manner. In 2001, we had a similar challenge with
the muon g-2 experimental data and, using NCSA systems, we were able to process the raw
data into manageable histogram data sets, which were then analyzed using our local
machines. The g-2 experience was, in fact, very positive. Based on these considerations,
we are requesting a medium resource allocation on Tungsten to perform the data analysis of
the 2006 MuLan dataset. The analysis is divided into the following stages, with combined
requirements for all stages given in the next section.
Stage 0: The total size of the 2006 dataset is 67.44 TB. The upload of this dataset into
NCSA’s mass storage system was completed in early February, 2007. Since then, David
Webber has written a robust and efficient code to process the data using a development
allocation that expires on January 30, 2008.
Stage 1: This stage is the most CPU intensive, and it also requires a high bandwidth
between scratch/project space and the compute nodes. The purpose of this stage is to fit
each of the waveforms in the raw data with pulse templates, producing fitted times and
amplitudes for each pulse in the detector. Our current best effort at code optimization
processes one 2 GB raw data file into a 1 GB ROOT [2] tree file in one hour. If bandwidth is
not an issue, we can process our entire dataset with 100 CPUs running continuously for two
weeks. Realistically, bandwidth into and out of NCSA’s mass storage system will constrain
the speed of the analysis. We expect the data will be processed in 1-2 months. We expect
to perform this stage 1-2 times.
Stage 2: After all the raw data files have been processed into physically meaningful tree
files, the tree files can be reprocessed many times to construct histograms from which we
can fit the muon lifetime. Each histogram file is typically a few tens of megabytes and takes
5-10 minutes to produce from a 1 GB tree file, depending on the histograms being constructed.
We expect to perform this stage several times per stage 1 analysis.
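A schematic of such a stage-2 pass is shown below. The tree and branch names ("hits", "time", "amplitude") and the threshold cut are hypothetical placeholders of our own; the real layout is defined by the stage-1 production code.

// Sketch of a stage-2 histogramming pass over a stage-1 tree file.
#include "TFile.h"
#include "TTree.h"
#include "TH1D.h"

void histogramTree(const char* treeFileName, const char* histFileName) {
  TFile in(treeFileName);
  TTree* t = (TTree*)in.Get("hits");            // hypothetical tree name
  double time = 0, amplitude = 0;
  t->SetBranchAddress("time", &time);
  t->SetBranchAddress("amplitude", &amplitude);

  TFile out(histFileName, "RECREATE");
  TH1D hDecay("hDecay", "Decay time;t (ns);counts", 22000, 0.0, 22000.0);
  for (Long64_t i = 0; i < t->GetEntries(); ++i) {
    t->GetEntry(i);
    if (amplitude > 50.0) hDecay.Fill(time);    // illustrative software threshold
  }
  out.Write();                                  // writes hDecay into the output file
}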
Stage 3: The histogrammed files can be summed together to create summaries of each
run condition. This summary of the dataset contains the most physically meaningful
information from which conclusions about the run can be drawn. These summary histogram
files will be transferred from NCSA to NPL for local analysis. We will perform this stage once
per stage 2 analysis.

[2] ROOT is an object-oriented data analysis framework for nuclear and particle physics: http://root.cern.ch
3.1 Project Requirements
Tape Data Storage (100 TB): 67.44 TB of raw data are already on NCSA’s mass
storage system. Although the current version of the ROOT tree output files is
frugal in the data recorded, we are investigating the possibility of using 16-bit floats
instead of 32-bit floats for the fitted times and amplitudes of each pulse (a sketch of
the idea follows this requirements list). We estimate that using 16-bit floats will have
a negligible impact on our precision while dramatically reducing the size of the output
tree files. If no improvements are made to the output file, we will need an additional
32 TB of space on the mass storage system to store tree files after analysis cuts. If
the tree files are regenerated due to further improvements in the production code,
the tree files on the mass storage system will be replaced. Therefore, we will need
about 100 TB in the mass storage system.
Disk Data Storage (25 TB): Our data set is too large to store entirely on disk at
once, so it will be broken into groups for processing. Raw data files will be staged to
global scratch for processing and then deleted. Tree files will be stored
intermediately in project space and long-term in the mass storage system. Having
many tree files available on a project disk at once will reduce our usage of the mass
storage system and speed up stage 2 of our analysis. We request 25 TB of project
space, but we can function with less.
CPU Hours (100 kSU): One 2 GB data file can be processed into a 1 GB tree file in
one hour on a single Tungsten CPU. After the tree file has been generated, it takes
5-10 minutes to generate a set of histograms from the tree file. What we learn from
our first production pass over the whole dataset may influence us to do another
production pass. We request enough CPU time to do two production passes and
several histogramming passes. At 34 kSUs/production pass, with the histogramming
passes considered equivalent to another production pass, this comes to 100 kSUs.
System: Tungsten is the logical choice for our analysis due to its large local disk
and fast access to the local mass storage system. Tungsten has fewer cores per
node than Abe, making it a sensible choice for high-throughput computing.
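The 16-bit storage idea mentioned under Tape Data Storage can be illustrated with the sketch below. This is our own example of a fixed-point variant of the idea (the proposal itself contemplates 16-bit floats); the assumed 0-4096 ADC range and all names are illustrative, not the real output format.

// Our own illustration of the 16-bit storage idea: quantize a fitted amplitude
// into a 16-bit word and check that the rounding error is negligible.
#include <cstdint>
#include <cstdio>
#include <cmath>

const float kMaxAmp = 4096.0f;   // assumed full-scale amplitude in ADC counts

uint16_t pack(float a)      { return (uint16_t)std::lround(a / kMaxAmp * 65535.0f); }
float    unpack(uint16_t w) { return w * kMaxAmp / 65535.0f; }

int main() {
  float fitted = 1234.567f;      // hypothetical fitted pulse amplitude
  uint16_t w = pack(fitted);
  std::printf("stored 0x%04x, recovered %.3f (error %.3f ADC counts)\n",
              (unsigned)w, unpack(w), std::fabs(unpack(w) - fitted));
  return 0;                      // quantization error ~0.03 counts: negligible
}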
3.2 Production Code Overview

Figure 3: A double pulse (red) is fit with two template pulses (blue).
The production code processes raw data
files containing digitized pulse
waveforms into an output file in ROOT
tree format containing fitted times and
amplitudes for individual pulses. After
the raw data passes checksum and
basic data-quality checks, individual
blocks of 24 ADC samples are combined
into “islands.” Each island typically
contains one pulse, but approximately
one in one thousand contains two or
more pulses. A pulse template is fit to
the island using a simple parabolic minimization routine to find the pulse time, and the
residuals are inspected for additional pulses. If additional pulses are found, a more
sophisticated routine fits multiple pulses to the island using a multidimensional minimization
over all pulse times. In both these cases, pulse amplitude, pedestal, and goodness of fit are
calculated exactly for given pulse times, reducing the space wherein the minimizer optimizes
the fit from three parameters per pulse to one. If the minimization should fail, the
amplitude-weighted time is recorded for the island. The production code scales as O(N), where N
is the number of data blocks in the input data file.
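The following is a minimal sketch of the single-pulse part of this procedure, written by us for illustration only: for a fixed trial time the amplitude and pedestal are obtained from an analytic least-squares solve, and the trial time is then refined with one parabolic-minimization step in chi-squared. The Gaussian stand-in for the pulse-shape template, the 24-sample island, and all names are assumptions, not the production code, which iterates and handles multi-pulse islands as described above.

// Illustrative sketch of the single-pulse template fit (not the production code).
// Model: s_i = A * T(i - t) + P.  For fixed t, A and P enter linearly and are
// found exactly; t is then refined with a parabolic step in chi^2.
#include <cmath>
#include <cstdio>
#include <vector>

// Stand-in pulse shape; the real template is measured from the detector.
double template_value(double x) { return std::exp(-0.5 * (x - 8.0) * (x - 8.0) / 4.0); }

struct FitResult { double time, amp, ped, chi2; };

FitResult evalAt(const std::vector<double>& s, double t) {
  double Stt = 0, St = 0, Sty = 0, Sy = 0;
  const double n = s.size();
  for (std::size_t i = 0; i < s.size(); ++i) {
    double T = template_value(i - t);
    Stt += T * T;  St += T;  Sty += T * s[i];  Sy += s[i];
  }
  double det = Stt * n - St * St;            // 2x2 normal-equation determinant
  double A = (Sty * n - Sy * St) / det;      // amplitude
  double P = (Stt * Sy - St * Sty) / det;    // pedestal
  double chi2 = 0;
  for (std::size_t i = 0; i < s.size(); ++i) {
    double r = s[i] - A * template_value(i - t) - P;
    chi2 += r * r;
  }
  return {t, A, P, chi2};
}

// One parabolic-minimization step over the pulse time: evaluate chi^2 at three
// trial times and move to the vertex of the interpolating parabola.
FitResult fitSinglePulse(const std::vector<double>& samples, double t0, double step) {
  FitResult lo = evalAt(samples, t0 - step);
  FitResult mid = evalAt(samples, t0);
  FitResult hi = evalAt(samples, t0 + step);
  double denom = lo.chi2 - 2.0 * mid.chi2 + hi.chi2;
  double tBest = (std::fabs(denom) > 1e-12)
                   ? t0 + 0.5 * step * (lo.chi2 - hi.chi2) / denom
                   : t0;                     // flat chi^2: keep the seed time
  return evalAt(samples, tBest);
}

int main() {
  std::vector<double> island(24, 100.0);     // flat pedestal of 100 ADC counts
  for (int i = 0; i < 24; ++i) island[i] += 350.0 * template_value(i - 9.3);
  FitResult r = fitSinglePulse(island, 9.0, 1.0);
  std::printf("t = %.2f  A = %.1f  P = %.1f  chi2 = %.3g\n", r.time, r.amp, r.ped, r.chi2);
  return 0;
}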
3.3 Production Code Optimization
As shown by the output of psrun below, our code’s performance is unchanged by
running one or two processes per node. Running twenty processes simultaneously on 10
nodes does not increase the processing time. Overall, our analysis code is trivially
parallelizable, scaling as 1/P where P is the number of CPUs. Our code will be bandwidth-limited, depending on the speed of the connection to the mass storage system and to local
network storage. Considerable effort has been spent finding an efficient and correct pulse
fitting algorithm. Optimizations include a fast parabolic minimization routine to fit one
pulse, the analytic computation of pulse amplitudes and pedestals to reduce the search
space of the fitter, and well-tuned thresholds for the intelligent pulse finding routines to
control the addition and removal of fitted pulse templates from a digitized island. The
production code has been designed to pass once through the data with minimal computation
and interruption in the data stream.
One of the most dangerous systematic errors in our experiment is the possibility of
mistaking two pulses for one pulse, artificially decreasing our counting rate more at early
times than late times and thereby systematically increasing the apparent lifetime. Our
correction for this effect is more effective for smaller deadtimes, motivating us to perform a
detailed fit on each of the pulse waveforms. However, there is a tradeoff in terms of
processing time. The smaller our imposed deadtime window, the more CPU hours it takes
to process a run (Figures 4,5).
3.4 Batch Jobs

Figure 4: An example of the resolving power of the double-pulse finder. A waveform is
simulated with two pulses. One pulse is always at time 12 and amplitude 130, and a second
pulse is placed on the island with varying time and amplitude. This plot shows the number of
pulses found on an island vs. the time and height of the second pulse.
Since individual runs take a short time
to process, we found that we can most
efficiently process runs by leasing a
node from the batch queuing system for
H hours and processing 2*H runs on the
node during that time. Multiple nodes
are requested individually from the
batch queue. We are willing to work
with NCSA staff on an alternate job
management system if recommended,
but this method has worked well for our
collaborators in the MuCap experiment,
who have similar data processing needs.
3.5 Local Computing
The Nuclear Physics Lab (NPL) maintains a small
cluster consisting of several desktop machines of
various speeds and ten analysis nodes with two
Xeon processors. The MuLan collaboration also
has two data hosts in the NPL cluster with 4.0 TB
available for MuLan analysis. The NPL cluster is
well equipped to analyze the histograms produced
at NCSA in stage 2 of our analysis, but is a factor
of 10 smaller than necessary to do a production
pass through the data.
Figure 5: Seconds required to process a midas event versus the imposed deadtime of the fit
(in clock ticks). The knee beginning at an imposed deadtime of 3 clock ticks motivates us to
set the deadtime of the code at 3 for the most efficient processing.
4 Project Team Qualifications
David Hertzog has supervised many
successful high-precision measurements. Among them, the g-2 experiment performed its
production data analysis at NCSA, making the UIUC group’s analysis very competitive
compared to a parallel analysis effort based at Brookhaven National Lab. The publications
from the high-profile g-2 experiment have received over 1300 citations and continue to
attract attention worldwide. More recently, the NPL-based analysis of the 2004 MuLan
dataset centered at Illinois has generated a publication and a thesis which reduced the
uncertainty on the muon lifetime by a factor of 1.9. Our MuCap experiment has also used
NCSA to process a large, high-precision dataset and the results from this experiment have
been recently published and led to an Illinois Ph.D. thesis.
Graduate student David Webber has been a part of the MuLan experiment since
2003, and a precision measurement of the muon lifetime based on the analysis of the 2006
dataset will be his doctoral thesis. He has a strong background in computational physics
and has developed the majority of the code that will be used in the present effort.
5 Appendix: Abridged Psrun Counting Output

5.1 One analysis process per node
PerfSuite Hardware Performance Summary Report

Index  Description                                                         Counter Value
============================================================================================
    1  Branch instructions..............................................     975055082186
    2  Conditional branch instructions mispredicted.....................      23218259029
    3  Conditional branch instructions not taken........................     410537368347
    4  Conditional branch instructions correctly predicted..............     952274759172
    5  Conditional branch instructions taken............................     565197215206
    6  Floating point operations........................................     987503920266
    7  Data translation lookaside buffer misses.........................       2595201050
    8  Instruction translation lookaside buffer misses..................       1137907984
    9  Total translation lookaside buffer misses........................       3728105262
   10  Total cycles.....................................................   11272605686664
   11  Instructions completed...........................................    7340759462154
   12  Vector/SIMD instructions.........................................     373680866067
   13  Level 2 total cache accesses.....................................     173697376475
   14  Level 2 total cache hits.........................................     171028884023
   15  Level 1 instruction cache accesses...............................   11703550995602
   16  Level 1 instruction cache misses.................................      11672030885
   17  Level 2 cache misses.............................................       2873671351
   18  Cycles stalled on any resource...................................     607573779562
   19  Load instructions................................................    2215355870555
   20  Load/store instructions completed................................    3263299881190
   21  Store instructions...............................................    1042734369922
   22  Level 3 total cache accesses.....................................       3453700980
   23  Level 3 total cache hits.........................................        638303324
   24  Level 3 cache misses.............................................       2826755800

Statistics
============================================================================================
Counting domain........................................................  user
Multiplexed............................................................  yes
Floating point operations per cycle....................................  0.088
Vector instructions per cycle..........................................  0.033
Floating point operations per graduated instruction....................  0.135
Vector instructions per graduated instruction..........................  0.051
Graduated instructions per cycle.......................................  0.651
Graduated instructions per level 1 instruction cache miss..............  628.919
Loads completed per translation lookaside buffer miss..................  594.231
Stores completed per translation lookaside buffer miss.................  279.696
Loads/stores completed per translation lookaside buffer miss...........  875.324
Cycles per translation lookaside buffer miss...........................  3023.682
% cycles stalled on any resource.......................................  5.390
Graduated loads and stores per cycle...................................  0.289
Graduated loads and stores per floating point operation................  3.305
Graduated loads and stores per floating point operation................  3.299
Mispredicted branches per correctly predicted branch...................  0.024
Level 1 cache miss ratio (instruction).................................  0.001
Bandwidth used to level 2 cache (MB/s).................................  52.037
Bandwidth used to level 3 cache (MB/s).................................  51.188
MFLOPS (cycles)........................................................  279.407
MFLOPS (wall clock)....................................................  270.834
MVOPS (cycles).........................................................  105.730
MVOPS (wall clock).....................................................  102.486
MIPS (cycles)..........................................................  2077.016
MIPS (wall clock)......................................................  2013.285
CPU time (seconds).....................................................  3534.282
Wall clock time (seconds)..............................................  3646.160
% CPU utilization......................................................  96.932
5.2 Two analysis processes per node
Only one analysis process is shown.
PerfSuite Hardware Performance Summary Report

Index  Description                                                         Counter Value
============================================================================================
    1  Branch instructions..............................................     976687416998
    2  Conditional branch instructions mispredicted.....................      23153183125
    3  Conditional branch instructions not taken........................     410515726951
    4  Conditional branch instructions correctly predicted..............     950462977201
    5  Conditional branch instructions taken............................     562658107966
    6  Floating point operations........................................     989184519355
    7  Data translation lookaside buffer misses.........................       2624071047
    8  Instruction translation lookaside buffer misses..................       1158605390
    9  Total translation lookaside buffer misses........................       3762313779
   10  Total cycles.....................................................   11301275033972
   11  Instructions completed...........................................    7323821542718
   12  Vector/SIMD instructions.........................................     371901173205
   13  Level 2 total cache accesses.....................................     174291089478
   14  Level 2 total cache hits.........................................     172160013158
   15  Level 1 instruction cache accesses...............................   11712388095912
   16  Level 1 instruction cache misses.................................      11727391534
   17  Level 2 cache misses.............................................       2884305828
   18  Cycles stalled on any resource...................................     610794679537
   19  Load instructions................................................    2207330448798
   20  Load/store instructions completed................................    3245516499360
   21  Store instructions...............................................    1038037912583
   22  Level 3 total cache accesses.....................................       3418236659
   23  Level 3 total cache hits.........................................        670506547
   24  Level 3 cache misses.............................................       2785143316

Statistics
============================================================================================
Counting domain........................................................  user
Multiplexed............................................................  yes
Floating point operations per cycle....................................  0.088
Vector instructions per cycle..........................................  0.033
Floating point operations per graduated instruction....................  0.135
Vector instructions per graduated instruction..........................  0.051
Graduated instructions per cycle.......................................  0.648
Graduated instructions per level 1 instruction cache miss..............  624.506
Loads completed per translation lookaside buffer miss..................  586.695
Stores completed per translation lookaside buffer miss.................  275.904
Loads/stores completed per translation lookaside buffer miss...........  862.638
Cycles per translation lookaside buffer miss...........................  3003.810
% cycles stalled on any resource.......................................  5.405
Graduated loads and stores per cycle...................................  0.287
Graduated loads and stores per floating point operation................  3.281
Graduated loads and stores per floating point operation................  3.281
Mispredicted branches per correctly predicted branch...................  0.024
Level 1 cache miss ratio (instruction).................................  0.001
Bandwidth used to level 2 cache (MB/s).................................  52.097
Bandwidth used to level 3 cache (MB/s).................................  50.306
MFLOPS (cycles)........................................................  279.171
MFLOPS (wall clock)....................................................  269.189
MVOPS (cycles).........................................................  104.959
MVOPS (wall clock).....................................................  101.206
MIPS (cycles)..........................................................  2066.951
MIPS (wall clock)......................................................  1993.048
CPU time (seconds).....................................................  3543.297
Wall clock time (seconds)..............................................  3674.685
% CPU utilization......................................................  96.425
i. G. W. Bennett et al., “Final Report of the Muon E821 Anomalous Magnetic Moment Measurement at
BNL,” Phys. Rev. D73, 072003 (2006).
ii. D. B. Chitwood et al., “Improved Measurement of the Positive Muon Lifetime and Determination of
the Fermi Constant,” Phys. Rev. Lett. 99, 032001 (2007).
iii. T. van Ritbergen and R. G. Stuart, “On the Precise Determination of the Fermi Coupling Constant
from the Muon Lifetime,” Nucl. Phys. B564, 343-390 (2000).
iv. J. Alcaraz, D. Abbaneo, P. Antilogus, T. Behnke, D. Bloch, A. Blondel, D. G. Charlton, R. Clare, P.
Clarke, S. Dutta, M. Elsing, S. Ganguli, M. W. Grunewald, A. Gurtu, K. Hamacher, C. Hawkes, M.
Hildreth, R. W. L. Jones, W. Lohmann, T. Kawamoto, Y. Khokhlov, C. M. Mariotti, M. Martinez, E.
Migliore, K. Monig, M. Morii, A. Nippe, A. Olshevsky, Ch. Paus, M. Pepe-Altarelli, B. Pietrzyk, G.
Quast, P. Renton, D. Reid, D. Schlatter, R. Sobie, R. Tenchini, F. Teubert, L. Tomalin, P. S. Wells,
G. Crawford, D. Falciai, B. Schumm, D. Su, and E. Torrence (The LEP Electroweak Working Group
and the SLD Heavy Flavor Group), “A combination of preliminary LEP and SLD electroweak
measurements and constraints on the standard model,” International Conference on High Energy
Physics, Warsaw, 1996, CERN Report No. LEPEWWG/96-02, SLD Physics Note 52, July 30, 1996.
http://citeseer.ist.psu.edu/338756.html
v. V. A. Andreev et al., “Measurement of the rate of muon capture in hydrogen gas and determination of
the proton's pseudoscalar coupling g(P),” Phys. Rev. Lett. 99, 032002 (2007).