hanoi_root

advertisement
Introduction to
the ROOT framework
http://root.cern.ch
Do-Son school on Advanced Computing
and GRID Technologies for Research
Institute of Information Technology, VAST, Hanoi
René Brun
CERN
What's ROOT?
René Brun
2
ROOT Application Domains
Data Analysis & Visualization
Data Storage: Local, Network
René Brun
3
ROOT in a Nutshell
 The ROOT system is an Object Oriented framework
for large scale data handling applications. It is
written in C++.
 Provides, among others,
 an efficient data storage and access system designed to support
structured data sets (PetaBytes)
 a query system to extract data from these data sets
 a C++ interpreter
 advanced statistical analysis algorithms (multi dimensional
histogramming, fitting, minimization and cluster finding)
 scientific visualization tools with 2D and 3D graphics
 an advanced Graphical User Interface
 The user interacts with ROOT via a graphical user interface,
the command line or scripts
 The command and scripting language is C++, thanks to the
embedded CINT C++ interpreter, and large scripts can be
compiled and dynamically loaded.
 A Python shell is also provided.
René Brun
4
ROOT: An Open Source Project
The project was started in 1995.
The project is developed as a collaboration
between:
 Full time developers:
• 11 people full time at CERN (PH/SFT)
• +4 developers at Fermilab/USA, Protvino , JINR/Dubna (Russia)
 Large number of part-time contributors (155 in CREDITS file)
 A long list of users giving feedback, comments, bug
fixes and many small contributions
• 2400 registered to RootForum
• 10,000 posts per year
An Open Source Project, source available
under the LGPL license
René Brun
5
More information...
http://root.cern.ch
Download
Documentation
Tutorials
Online Help
Mailing list
Forum
René Brun
6
Install ROOT
 Download & extract ROOT v5.16/00 from
http://root.cern.ch/root/Version516.html
 Set of precompiled versions and source code
 Set environment




export ROOTSYS=<path-to-installation>
export PATH=$ROOTSYS/bin:$PATH
export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH
export
DYLD_LIBRARY_PATH=$ROOTSYS/lib:$DYLD_LIBRARY_PATH
(MacOS X only)
 Compile source code (if chosen)




René Brun
cd $ROOTSYS
./configure
make
More information at http://root.cern.ch/root/Install.html
7
ROOT Library Structure
ROOT libraries are a layered structure
 CORE classes always required: support for RTTI, basic
I/O and interpreter
 Optional libraries loaded when needed. Separation
between data objects and the high level classes acting
on these objects. Example: a batch job uses only the
histogram library, no need to link histogram painter
library
 Shared libraries reduce the application size and link
time
 Mix and match, build your own application
 Reduce dependencies via plug-in mechanism
René Brun
8
Three User Interfaces
{
// example
// macro
...
}
René Brun
 GUI
windows, buttons,
menus
 Command line
CINT (C++ interpreter),
Python, Ruby,…
 Macros, applications,
libraries (C++ compiler
and interpreter)
9
CINT Interpreter
CINT in ROOT
CINT is used in ROOT:
 As command line interpreter
 As script interpreter
 To generate class dictionaries
 To generate function/method calling stubs
 Signals/Slots with the GUI
The command line, script and programming
language become the same
Large scripts can be compiled for optimal
performance
René Brun
11
Compiled versus Interpreted

Why compile?
Faster execution, CINT has limitations…

Why interpret?
Faster Edit → Run → Check result → Edit cycles
("rapid prototyping"). Scripting is sometimes just easier.

Are Makefiles dead?
No! if you build/compile a very large application
Yes! ACLiC is even platform independent!
René Brun
12
Running Code
To run function mycode() in file mycode.C:
root [0] .x mycode.C
Equivalent: load file and run function:
root [1] .L mycode.C
root [2] mycode()
All of CINT's commands (help):
root [3] .h
René Brun
13
Running Code
Macro: file that is interpreted by CINT (.x)
int mymacro(int value)
{
int ret = 42;
ret += value;
return ret;
}
Execute with .x mymacro.C(42)
René Brun
14
Unnamed Macros
No functions, just statements
{
float ret = 0.42;
return sin(ret);
}
Execute with .x mymacro.C
No functions thus no arguments
Recommend named macros!
Compiler prefers it, too…
René Brun
15
Named Vs. Unnamed
Named: function scope
mymacro.C:
void mymacro()
{ int val = 42; }
unnamed: global
unnamed.C:
{ int val = 42; }
Back at the prompt:
root [] val
Error: Symbol val
is not defined
René Brun
root [] val
(int)42
16
Survivors: Heap Objects
obj local to function, inaccessible from outside:
void mymacro()
{ MyClass obj; }
Instead: create on heap, pass to outside:
MyClass* mymacro()
{ MyClass* pObj = new MyClass();
return pObj; }
pObj gone – but MyClass still where pObj pointed
to! Returned by mymacro()
René Brun
17
Running Code – Libraries
"Library": compiled code, shared library
CINT can call its functions!
Build a library: ACLiC! "+" instead of Makefile
(Automatic Compiler of Libraries for CINT)
CINT knows all its functions / types:
.x something.C(42)+
something(42)
René Brun
18
My first session
root
root [0] 344+76.8
(const double)4.20800000000000010e+002
root [1] float x=89.7;
root [2] float y=567.8;
root [3] x+sqrt(y)
(double)1.13528550991510710e+002
root [4] float z = x+2*sqrt(y/6);
root [5] z
(float)1.09155929565429690e+002
root [6] .q
root
See file $HOME/.root_hist
root [0] try up and down arrows
René Brun
19
My second session
root
root [0] .x session2.C
for N=100000, sum= 45908.6
root [1] sum
(double)4.59085828512453370e+004
Root [2] r.Rndm()
(Double_t)8.29029321670533560e-001
root [3] .q
session2.C
{
int N = 100000;
TRandom r;
double sum = 0;
for (int i=0;i<N;i++) {
sum += sin(r.Rndm());
}
printf("for N=%d, sum= %g\n",N,sum);
unnamed macro
executes in global scope
}
René Brun
20
My third session
root
root [0] .x session3.C
for N=100000, sum= 45908.6
root [1] sum
Error: Symbol sum is not defined in current scope
*** Interpreter error recovered ***
Root [2] .x session3.C(1000)
for N=1000, sum= 460.311
root [3] .q
session3.C
Named macro
Normal C++ scope rules
René Brun
void session3 (int N=100000) {
TRandom r;
double sum = 0;
for (int i=0;i<N;i++) {
sum += sin(r.Rndm());
}
printf("for N=%d, sum= %g\n",N,sum);
}
21
My third session with ACLIC
root [0] gROOT->Time();
root [1] .x session4.C(10000000)
for N=10000000, sum= 4.59765e+006
Real time 0:00:06, CP time 6.890
root [2] .x session4.C+(10000000)
for N=10000000, sum= 4.59765e+006
Real time 0:00:09, CP time 1.062
root [3] session4(10000000)
for N=10000000, sum= 4.59765e+006
Real time 0:00:01, CP time 1.052
root [4] .q
File session4.C
Automatically compiled
and linked by the
native compiler.
Must be C++ compliant
René Brun
session4.C
#include “TRandom.h”
void session4 (int N) {
TRandom r;
double sum = 0;
for (int i=0;i<N;i++) {
sum += sin(r.Rndm());
}
printf("for N=%d, sum= %g\n",N,sum);
}
22
Macros with more than one function
root [0] .x session5.C >session5.log
root [1] .q
root [0] .L session5.C
root [1] session5(100); >session5.log
root [2] session5b(3)
sum(0) = 0
sum(1) = 1
sum(2) = 3
root [3] .q
.x session5.C
executes the function
session5 in session5.C
use gROOT->ProcessLine
to execute a macro from a
macro or from compiled
code
René Brun
session5.C
void session5(int N=100) {
session5a(N);
session5b(N);
gROOT->ProcessLine(“.x session4.C+(1000)”);
}
void session5a(int N) {
for (int i=0;i<N;i++) {
printf("sqrt(%d) = %g\n",i,sqrt(i));
}
}
void session5b(int N) {
double sum = 0;
for (int i=0;i<N;i++) {
sum += i;
printf("sum(%d) = %g\n",i,sum);
}
}
23
Running a script in batch mode
root –b –q myscript.C
 myscript.C is executed by the interpreter
root –b –q myscript.C(42)
 same as above, but giving an argument
root –b –q myscript.C+
 myscript.C is compiled via ACLIC with the native
compiler and linked with the executable.
root –b –q myscript.C++g(42)
 same as above. Script compiled in debug mode and
an argument given.
René Brun
24
Calling interpreter in a macro
All CINT commands can be excuted from
an interpreted macro or from compiled
code.
gROOT->ProcessLine(“.x myscript.C”);
René Brun
25
ROOT Tutorials & Doc
See Users Guide (pdf) at
See online Reference Manual at
Execute tutorials at $ROOTSYS/tutorials
293 tutorials in the following directories
foam
gui
matrix
net
sql
geom
fft
graphics io
quadp ruby
unuran
René Brun
spectrum xml
physics
gl
pythia
tree
hist
mlp
splot
image pyroot thread
fit
graphs math
26
Generating a Dictionary
MyClass.h
MyClass.cxx
rootcint –f MyDict.cxx –c MyClass.h
MyDict.h
MyDict.cxx
compile and link
libMyClass.so
René Brun
27
HTML
René Brun
28
28
Math
TMVA
SPlot
René Brun
29
Graphics & GUI
TPad: main graphics container
Root > TLine line(.1,.9,.6,.6)
Root > line.Draw()
Root > TText text(.5,.2,”Hello”)
Root > text.Draw()
Hello
The Draw function adds the object to the
list of primitives of the current pad.
If no pad exists, a pad is automatically
created with a default range [0,1].
When the pad needs to be drawn or
redrawn, the object Paint function is called.
René Brun
Only objects deriving
from TObject may be drawn
in a pad
Root Objects or User objects
31
Basic Primitives
TLine
TArrow
TEllipse
TButton
TBox
TDiamond
TCurvyLine
TText
TPave
TMarker
TPavesText
TCrown
TPaveLabel
TCurlyArc
TPolyLine
TLatex
René Brun
32
Full LateX
support
on screen
and
postscript
latex3.C
Formula or
diagrams
can be
edited with
the mouse
Feynman.C
TCurlyArc
TCurlyLine
TWavyLine
and other building
blocks for
Feynmann diagrams
René Brun
33
Graphs
TGraphErrors(n,x,y,ex,ey)
TGraph(n,x,y)
gerrors2.C
TCutG(n,x,y)
TMultiGraph
TGraphAsymmErrors(n,x,y,exl,exh,eyl,eyh)
René Brun
34
Graphics examples
René Brun
35
René Brun
36
René Brun
37
René Brun
38
Graphics (2D-3D)
TH3
TGLParametric
“LEGO”
René Brun
“SURF”
TF3
39
ASImage: Image processor
René Brun
40
GUI (Graphical User Interface)
René Brun
41
Canvas tool bar/menus/help
René Brun
42
Object editor
René Brun
Click on any
object to show
its editor
43
TBrowser: a dynamic browser
René Brun
see $ROOTSYS/test/RootIDE
44
TBrowser Editor
René Brun
45
GUI C++ code generator
 When pressing ctrl+S on any
widget it is saved as a C++
macro file thanks to the
SavePrimitive methods
implemented in all GUI
classes. The generated
macro can be edited and then
executed via CINT
 Executing the macro restores
the complete original GUI as
well as all created signal/slot
connections in a global way
root [0] .x example.C
// transient frame
TGTransientFrame *frame2 = new TGTransientFrame(gClient>GetRoot(),760,590);
// group frame
TGGroupFrame *frame3 = new TGGroupFrame(frame2,"curve");
TGRadioButton *frame4 = new TGRadioButton(frame3,"gaus",10);
frame3->AddFrame(frame4);
René Brun
46
The GUI Builder
 The GUI builder
provides GUI
tools for
developing user
interfaces
based on the
ROOT GUI
classes. It
includes over
30 advanced
widgets and an
automatic C++
code generator.
René Brun
47
More GUI Examples
$ROOTSYS/tutorials/gui
$ROOTSYS/test/RootShower
$ROOTSYS/test/RootIDE
René Brun
48
Geometry
The GEOMetry package is used
to model very complex
detectors (LHC). It includes
-a visualization system
-a navigator (where am I,
distances, overlaps, etc)
René Brun
49
OpenGL
see $ROOTSYS/tutorials/geom
René Brun
50
Peak Finder + Deconvolutions
TSpectrum
René Brun
51
Fitters
Minuit
Fumili
LinearFitter
RobustFitter
Roofit
René Brun
52
Histograms
Analysis result: often a histogram
Value (x)
vs. count (y)
Menu:
View / Editor
René Brun
53
Fitting
Analysis result:
often a fit
based on a
histogram
René Brun
54
Fit
Fit = optimization in parameters,
e.g. Gaussian
For Gaussian:
f ( x )  [ 0]  e
( x [1]) 2

2[ 2 ]2
[0] = "Constant"
[1] = "Mean"
[2] = "Sigma" / "Width"
Objective: choose parameters [0], [1], [2] to get function
as close as possible to histogram
René Brun
55
Fit Panel
To fit a histogram:
right click histogram,
"Fit Panel"
Straightforward interface
for fitting!
René Brun
56
Roofit: a powerful fitting
framework
René Brun
see $ROOTSYS/tutorials/fit/RoofitDemo.C
57
Input/Output
Saving Data
Streaming, Reflection, TFile,
Schema Evolution
René Brun
59
Saving Objects – the Issues
long aLong = 42;
WARNING:
platform
dependent!
Storing aLong:
 How many bytes? Valid:
0x0000002a
0x000000000000002a
0x0000000000000000000000000000002a
 which byte contains 42, which 0? Valid:
char[] {42, 0, 0, 0}: little endian, e.g. Intel
char[] {0, 0, 0, 42}: big endian, e.g. PowerPC
René Brun
60
Platform Data Types
Fundamental data types:
size is platform dependent
Store "int" on platform A
0x000000000000002a
Read back on platform B – incompatible!
Data loss, size differences, etc:
0x000000000000002a
René Brun
61
ROOT Basic Data Types
Solution: ROOT typedefs
Signed
Char_t
Unsigned
UChar_t
Short_t
UShort_t
2
Int_t
UInt_t
4
Long64_t
ULong64_t
8
Double32_t
René Brun
sizeof [bytes]
1
float on disk, double in RAM
62
Object Oriented Concepts
 Class: the description of a “thing” in the system
 Object: instance of a class
TObject
 Methods: functions for a class
IsA
Members: a “has a”
relationship to the
class.
Inheritance: an “is
a” relationship to
the class.
René Brun
Event
HasA
Jets
HasA
HasA
Tracks
HasA
Momentum
HasA
EvNum
HasA
Charge
Segments
63
63
Saving Objects – the Issues
class TMyClass {
float fFloat;
Long64_t fLong;
};
TMyClass anObject;
WARNING:
platform
dependent!
Storing anObject:
need to know its members + base classes
need to know "where they are"
René Brun
64
Reflection
Need type description (aka reflection)
1. types, sizes, members
TMyClass is a class.
class TMyClass {
float fFloat;
Long64_t fLong;
};
Members:
 "fFloat", type float, size 4 bytes
 "fLong", type Long64_t, size 8 bytes
René Brun
65
Reflection
René Brun
– 10
– 8
– 6
– 4
– 2
– 0
TMyClass
Memory Address
Need type description
1. types, sizes, members
2. offsets in memory
class TMyClass {
float fFloat;
– 16
Long64_t fLong;
– 14
};
– 12
fLong
PADDING
fFloat
"fFloat" is at offset 0
"fLong" is at offset 8
66
I/O Using Reflection
– 16
– 14
– 12
– 10
– 8
– 6
– 4
– 2
– 0
René Brun
2a 00 00 00...
TMyClass
Memory Address
members  memory  re-order 
fLong
...00 00 00 2a
PADDING
fFloat
67
C++ Is Not Java
Lesson: need reflection!
Where from?
Java: get data members with
Class.forName("MyClass").getFields()
C++: get data members with
– oops. Not part of C++.
-- discussions for next C++
René Brun
68
Reflection For C++
Program parses header files, e.g. Funky.h:
rootcint –f Dict.cxx Funky.h LinkDef.h
Collects reflection data for types requested in
Linkdef.h
Stores it in Dict.cxx (dictionary file)
Compile Dict.cxx, link, load:
C++ with reflection!
René Brun
69
Reflection By Selection
LinkDef.h syntax:
#pragma
#pragma
#pragma
#pragma
René Brun
link
link
link
link
C++
C++
C++
C++
class MyClass+;
typedef MyType_t;
function MyFunc(int);
enum MyEnum;
70
Why So Complicated?
Simply use ACLiC:
.L MyCode.cxx+
Will create library with dictionary of all types in
MyCode.cxx automatically!
René Brun
71
Saving Objects: TFile
ROOT stores objects in TFiles:
TFile* f = new TFile("afile.root", "NEW");
TFile behaves like file system:
f->mkdir("dir");
TFile has a current directory:
f->cd("dir");
René Brun
72
Saving Objects, Really
Given a TFile:
TFile* f = new TFile("afile.root", "RECREATE");
Write an object deriving from TObject:
object->Write("optionalName");
"optionalName" or TObject::GetName()
Write any object (with dictionary):
f->WriteObject(object, "name");
René Brun
73
"Where Is My Histogram?"
TFile owns histograms, graphs, trees
(due to historical reasons):
TFile* f = new TFile("myfile.root");
TH1F* h = new TH1F("h","h",10,0.,1.);
h->Write();
Canvas* c = new TCanvas();
c->Write();
delete f;
h automatically deleted: owned by file.
c still there.
René Brun
74
TFile as Directory: TKeys
One TKey per object:
named directory entry
knows what it points to
like inodes in file system
Each TFile directory is collection of TKeys
TFile "myfile.root"
René Brun
TKey: "hist1", TH1F
TKey: "list", TList
TKey: "hist2", TH2F
TDirectory: "subdir"
TKey: "hist1", TH2D
75
TKeys Usage
Efficient for hundreds of objects
Inefficient for millions:
lookup ≥ log(n)
sizeof(TKey) on 64 bit machine: 160 bytes
Better storage / access methods available,
René Brun
76
Risks With I/O
Physicists can loop a lot:
For each particle collision
For each particle created
For each detector module
Do something.
Physicists can loose a lot:
Run for hours…
Crash.
Everything lost.
René Brun
77
Name Cycles
Create snapshots regularly:
MyObject;1
MyObject;2
MyObject;3
…
MyObject
Write() does not replace but append!
but see documentation TObject::Write()
René Brun
78
The "I" Of I/O
Reading is simple:
TFile* f = new TFile("myfile.root");
TH1F* h = 0;
f->GetObject("h", h);
h->Draw();
delete f;
Remember:
TFile owns histograms!
file gone, histogram gone!
René Brun
79
Ownership And TFiles
Separate TFile and histograms:
TFile* f = new TFile("myfile.root");
TH1F* h = 0;
TH1::AddDirectory(kFALSE);
f->GetObject("h", h);
h->Draw();
delete f;
… and h will stay around.
René Brun
80
Changing Class:a Problem
Things change:
class TMyClass {
float fFloat;
Long64_t fLong;
};
René Brun
81
Changing Class: a Problem
Things change:
class TMyClass {
double fFloat;
Long64_t fLong;
};
Inconsistent reflection data!
Objects written with old version cannot be read!
René Brun
82
Solution: Schema Evolution
Support changes:
1. removed members: just skip data
file.root
Long64_t fLong;
float fFloat;
RAM
ignore
float fFloat;
2. added members: just default-initialize them
(whatever the default constructor does)
TMyClass(): fLong(0)
file.root
float fFloat;
René Brun
RAM
Long64_t fLong;
float fFloat;
83
Solution: Schema Evolution
Support changes:
3. members change types
file.root
Long64_t fLong;
float fFloat;
RAM
Long64_t fLong;
double fFloat;
RAM (temporary)
float DUMMY;
René Brun
84
Supported Type Changes
Schema Evolution supports:
 float ↔ double, int ↔ long, etc (and pointers)
 TYPE* ↔ std::vector<TYPE>
 CLASS* ↔ TClonesArray
Adding / removing base class equivalent to
adding / removing data member
René Brun
85
Storing Reflection
Big Question: file == memory class layout?
Need to store reflection to file,
see TFile::ShowStreamerInfo():
StreamerInfo for class: TH1F, version=1
TH1
BASE
offset= 0 type= 0
TArrayF BASE
offset= 0 type= 0
StreamerInfo for class: TH1,
TNamed BASE
offset= 0
...
Int_t
fNcells offset= 0
TAxis
fXaxis offset= 0
René Brun
version=5
type=67
type= 3
type=61
86
Class Version
One TStreamerInfo for all objects of the same
class layout in a file
Can have multiple versions in same file
Use version number to identify layout:
class TMyClass: public TObject {
public:
TMyClass(): fLong(0), fFloat(0.) {}
virtual ~TMyClass() {}
...
ClassDef(TMyClass,1); // example class
};
René Brun
87
Random Facts On ROOT I/O
We know exactly what's in a TFile – need no
library to look at data!
ROOT files are zipped
Combine contents of TFiles with
$ROOTSYS/bin/hadd
Can even open
TFile("http://myserver.com/afile.root")
including read-what-you-need!
Nice viewer for TFile: new TBrowser
René Brun
88
I/O
Buffer
Object in
Memory
Streamer:
No need for
transient / persistent
classes
sockets
Net File
http
Web File
XML
XML File
SQL
DataBase
Local
File on
disk
René Brun
89
TFile / TDirectory
A TFile object may be divided in a hierarchy of
directories, like a Unix file system.
Two I/O modes are supported
 Key-mode (TKey). An object is identified by a name
(key), like files in a Unix directory. OK to support up
to a few thousand objects, like histograms,
geometries, mag fields, etc.
 TTree-mode to store event data, when the number
of events may be millions, billions.
René Brun
90
Self-describing files
Dictionary for persistent classes written to the
file.
ROOT files can be read by foreign readers
Support for Backward and Forward
compatibility
Files created in 2001 must be readable in 2015
Classes (data objects) for all objects in a file
can be regenerated via TFile::MakeProject
Root >TFile f(“demo.root”);
Root > f.MakeProject(“dir”,”*”,”new++”);
René Brun
91
Example of key mode
void keywrite() {
TFile f(“keymode.root”,”new”);
TH1F h(“hist”,”test”,100,-3,3);
h.FillRandom(“gaus”,1000);
h.Write()
}
void keyRead() {
TFile f(“keymode.root”);
TH1F *h = (TH1F*)f.Get(“hist”);
h.Draw();
}
René Brun
92
A Root file pippa.root
with two levels of
directories
René Brun
Objects in directory
/pippa/DM/CJ
eg:
/pippa/DM/CJ/h15
93
LHC: How Much Data?
Level 1 Rate
(Hz)
106
High Level-1 Trigger
(1 MHz)
LHCB
105
1 billion people
surfing the Web
104
HERA-B
ATLAS
CMS
KLOE
CDF II
CDF
103
102
104
René Brun
High No. Channels
High Bandwidth
(500 Gbit/s)
High Data Archive
(5 PetaBytes/year)
10 Gbits/s in Data base
H1
ZEUS
NA49
UA1
LEP
105
106
ALICE
STAR
107
Event Size (bytes)
94
ROOT Collection Classes
Or when one million TKeys are
not enough…
Collection Classes
The ROOT collections are polymorphic
containers that hold pointers to TObjects, so:
They can only hold objects that inherit from
TObject
They return pointers to TObjects, that have to
be cast back to the correct subclass
René Brun
96
Types Of Collections
TCollection
TSeqCollection
TList
TSortedList
THashTable
TOrdCollection
THashList
TMap
TObjArray
TBtree
TClonesArray
The inheritance hierarchy of the primary collection classes
René Brun
97
Collections (cont)
Here is a subset of collections supported by ROOT:





TObjArray
TClonesArray
THashTable
THashList
TMap
 Templated containers, e.g. STL (std::list etc)
René Brun
98
TObjArray
A TObjArray is a collection which supports
traditional array semantics via the overloading
of operator[].
Objects can be directly accessed via an index.
The array expands automatically when objects
are added.
René Brun
99
TClonesArray
Array of identical classes
(“clones”).
Designed for repetitive data
analysis tasks: same
type of objects created
and deleted many times.
No comparable class in
STL!
The internal data structure of a
TClonesArray
René Brun
100
Traditional Arrays
Very large number of new and delete calls in
large loops like this (N(100000) x N(10000)
times new/delete):
N(100000)
TObjArray a(10000);
while (TEvent *ev = (TEvent *)next()) {
for (int i = 0; i < ev->Ntracks; ++i) {
a[i] = new TTrack(x,y,z,...);
...
}
...
a.Delete();
N(10000)
}
René Brun
101
You better use a TClonesArray which reduces the number of
new/delete calls to only N(10000):
N(100000)
TClonesArray a("TTrack", 10000);
while (TEvent *ev = (TEvent *)next()) {
for (int i = 0; i < ev->Ntracks; ++i) {
new(a[i]) TTrack(x,y,z,...);
N(10000)
...
}
...
a.Delete();
}
Pair of new/delete calls cost about 4 μs
NN(109) new/deletes will save about 1 hour.
René Brun
102
THashTable - THashList
THashTable puts objects in one of several of its
short lists. Which list to pick is defined by
TObject::Hash(). Access much faster than map
or list, but no order defined. Nothing similar in
STL.
THashList consists of a THashTable for fast
access plus a TList to define an order of the
objects.
René Brun
103
TMap
TMap implements an associative array of
(key,value) pairs using a THashTable for
efficient retrieval (therefore TMap does
not conserve the order of the entries).
René Brun
104
ROOT Trees
Why Trees ?
 As seen previously, any object deriving from TObject
can be written to a file with an associated key with
object.Write()
 However each key has an overhead in the directory
structure in memory (up to 160 bytes). Thus,
Object.Write() is very convenient for simple objects
like histograms, but not for saving one object per
event.
 Using TTrees, the overhead in memory is in general
less than 4 bytes per entry!
René Brun
106
Why Trees ?
Extremely efficient write once,
read many ("WORM")
Designed to store >109 (HEP
events) with same data structure
Load just a subset of the objects
to optimize memory usage
Trees allow fast direct and
random access to any entry
(sequential access is the best)
René Brun
107
Building ROOT Trees
Overview of
Trees
Branches
5 steps to build a TTree
René Brun
108
Memory <--> Tree
Each Node is a branch in the Tree
Memory
0
T.GetEntry(6)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
T.Fill()
T
René Brun
109
ROOT I/O -- Split/Cluster
Tree version
Tree entries
Streamer
Branches
Tree in memory
File
René Brun
110
Writing/Reading a Tree
class Event : public Something
Header
{
fHeader;
Event.h
std::list<Vertex*> fVertices;
std::vector<Track> fTracks;
TOF
Write.C
Calor
fTOF;
*fCalor;
Read.C
}
main() {
main() {
Event *event = 0;
Event *event = 0;
TFile f(“demo.root”,”recreate”);
TFile f(“demo.root”);
int split = 99;
TTree *T = (TTree*)f.Get”T”);
//maximum split
TTree *T = new TTree(“T”,”demo Tree”);
T->SetBranchAddress(“event”,&event);
T->Branch(“event”,&event,split);
Long64_t N = T->GetEntries();
for (int ev=0;ev<1000;ev++) {
for (Long64_t ev=0;ev<N;ev++) {
event = new Event(…);
T->GetEntry(ev);
T->Fill();
// do something with event
delete event;
}
}
}
t->AutoSave();
}
René Brun
111
Tree structure
René Brun
112
Browsing a TTree with TBrowser
8 leaves of branch
Electrons
A double-click
to histogram
the leaf
8 Branches of T
René Brun
113
The TTreeViewer
René Brun
114
Tree structure
Branches: directories
Leaves: data containers
Can read a subset of all branches – speeds up
considerably the data analysis processes
Tree layout can be optimized for data analysis
The class for a branch is called TBranch
Variables on TBranch are called leaf (yes TLeaf)
Branches of the same TTree can be written to
separate files
René Brun
115
Five Steps to Build a Tree
Steps:
1. Create a TFile
2. Create a TTree
3. Add TBranch to the TTree
4. Fill the tree
5. Write the file
René Brun
116
Step 1: Create a TFile Object
Trees can be huge  need file for
swapping filled entries
TFile *hfile = new TFile("AFile.root");
René Brun
117
Step 2: Create a TTree Object
The TTree constructor:
Tree name (e.g. "myTree")
Tree title
TTree *tree = new TTree("myTree","A Tree");
René Brun
118
Step 3: Adding a Branch
Branch name
Address of the pointer
to the object
Event *event = new Event();
myTree->Branch("EventBranch", &event);
René Brun
119
Splitting a Branch
Setting the split level (default = 99)
Split level = 0
Split level = 99
tree->Branch("EvBr", &event, 64000, 0 );
René Brun
120
Splitting (real example)
Split level = 0
René Brun
Split level = 99
121
Splitting
 Creates one branch per member – recursively
 Allows to browse objects that are stored in trees,
even without their library
 Makes same members consecutive, e.g. for object
with position in X, Y, Z, and energy E, all X are
consecutive, then come Y, then Z, then E. A lot
higher zip efficiency!
 Fine grained branches allow fain-grained I/O - read
only members that are needed, instead of full object
 Supports STL containers, too!
René Brun
122
Performance Considerations
A split branch is:
Faster to read - the type does not have to be
read each time
Slower to write due to the large number of
buffers
René Brun
123
Step 4: Fill the Tree
Assign values to the object
contained in each branch
TTree::Fill() creates a new
entry in the tree: snapshot of
values of branches’ objects
Create a for loop
for (int e=0;e<100000;++e) {
event->Generate(e); // fill event
myTree->Fill();
// fill the tree
}
René Brun
124
Step 5: Write Tree To File
myTree->Write();
René Brun
125
Example macro
{
Event *myEvent = new Event();
TFile f("mytree.root");
TTree *t = new TTree("myTree","A Tree");
t->Branch("SplitEvent", &myEvent);
for (int e=0;e<100000;++e) {
myEvent->Generate();
t->Fill();
}
t->Write();
}
René Brun
126
Reading a TTree
Looking at a tree
How to read a tree
Trees, friends and chains
René Brun
127
Looking at the Tree
TTree::Print() shows the data layout
root [] TFile f("AFile.root")
root [] myTree->Print();
******************************************************************************
*Tree
:myTree
: A ROOT tree
*
*Entries :
10 : Total =
867935 bytes File Size =
390138 *
*
:
: Tree compression factor =
2.72
*
******************************************************************************
*Branch :EventBranch
*
*Entries :
10 : BranchElement (see below)
*
*............................................................................*
*Br
0 :fUniqueID :
*
*Entries :
10 : Total Size=
698 bytes One basket in memory
*
*Baskets :
0 : Basket Size=
64000 bytes Compression=
1.00
*
*............................................................................*
…
…
René Brun
128
Looking at the Tree
TTree::Scan("leaf:leaf:….") shows the values
root [] myTree->Scan("fNseg:fNtrack"); > scan.txt
root [] myTree->Scan("fEvtHdr.fDate:fNtrack:fPx:fPy","",
"colsize=13 precision=3 col=13:7::15.10");
******************************************************************************
* Row * Instance * fEvtHdr.fDate * fNtrack *
fPx *
fPy *
******************************************************************************
*
0 *
0 *
960312 *
594 *
2.07 *
1.459911346 *
*
0 *
1 *
960312 *
594 *
0.903 *
-0.4093382061 *
*
0 *
2 *
960312 *
594 *
0.696 *
0.3913401663 *
*
0 *
3 *
960312 *
594 *
-0.638 *
1.244356871 *
*
0 *
4 *
960312 *
594 *
-0.556 *
-0.7361358404 *
*
0 *
5 *
960312 *
594 *
-1.57 *
-0.3049036264 *
*
0 *
6 *
960312 *
594 *
0.0425 *
-1.006743073 *
*
0 *
7 *
960312 *
594 *
-0.6 *
-1.895804524 *
René Brun
129
TTree Selection Syntax
MyTree->Scan();
Prints the first 8 variables of the tree.
MyTree->Scan("*");
Prints all the variables of the tree.
Select specific variables:
MyTree->Scan("var1:var2:var3");
Prints the values of var1, var2 and var3.
A selection can be applied in the second argument:
MyTree->Scan("var1:var2:var3", "var1==0");
Prints the values of var1, var2 and var3 for the entries where
var1 is exactly 0.
René Brun
130
Looking at the Tree
TTree::Show(entry_number) shows the values for
one entry
root [] myTree->Show(0);
======> EVENT:0
EventBranch
= NULL
fUniqueID
= 0
fBits
= 50331648
[...]
fNtrack
= 594
fNseg
= 5964
[...]
fEvtHdr.fRun
= 200
[...]
fTracks.fPx
= 2.066806, 0.903484, 0.695610, 0.637773,...
fTracks.fPy
= 1.459911, -0.409338, 0.391340,
1.244357,...
René Brun
131
How To Read a TTree
Example:
1. Open the Tfile
TFile f("tree4.root")
OR
2. Get the TTree
TTree *t4=0;
f.GetObject("t4",t4)
René Brun
132
How to Read a TTree
3. Create a variable pointing to the data
root [] Event *event = 0;
4. Associate a branch with the variable:
root [] t4->SetBranchAddress("event_split", &event);
5. Read one entry in the TTree
root [] t4->GetEntry(0)
root [] event->GetTracks()->First()->Dump()
==> Dumping object at: 0x0763aad0, name=Track, class=Track
fPx
0.651241
X component of the momentum
fPy
1.02466
Y component of the momentum
fPz
1.2141
Z component of the momentum
[...]
René Brun
133
Example macro
{
Event *ev = 0;
TFile f("mytree.root");
TTree *myTree = (TTree*)f->Get("myTree");
myTree->SetBranchAddress("SplitEvent",&ev);
for (int e=0;e<100000;++e) {
myTree->GetEntry(e);
ev->Analyse();
}
}
René Brun
134
Chains of Trees
A TChain is a collection of Trees.
Same semantics for TChains and TTrees
 root > .x h1chain.C
 root > chain.Process(“h1analysis.C”)
{
//creates a TChain to be used by the h1analysis.C class
//the symbol H1 must point to a directory where the H1 data sets
//have been installed
TChain chain("h42");
chain.Add("$H1/dstarmb.root");
chain.Add("$H1/dstarp1a.root");
chain.Add("$H1/dstarp1b.root");
chain.Add("$H1/dstarp2.root");
}
René Brun
135
Data Volume & Organisation
100MB
1
TTree
1GB
1
10GB
100GB
5
50
1TB
500
10TB
100TB
5000
50000
1PB
TChain
A TFile typically contains 1 TTree
A TChain is a collection of TTrees or/and
TChains
A TChain is typically the result of a query to a
file catalog
René Brun
136
Friends of Trees
- Adding new branches to existing tree
without touching it, i.e.:
myTree->AddFriend("ft1", "friend.root")
- Unrestricted access to the friend's data
via virtual branch of original tree
René Brun
137
Tree Friends
01
23
45
67
89
10
11
12
13
14
15
16
17
18
René Brun
0
1
2
3
4
5
6
7
8
Entry # 8
01
23
45
67
89
10
11
12
13
14
15
16
17
18
9
10
11
12
13
14
15
16
17
18
Public
Public
User
read
read
Write
138
Tree Friends
Processing time
independent of the
number of friends
unlike table joins
in RDBMS
Collaboration-wide
public read
Analysis group
protected
user
private
x
TFile f1("tree1.root");
tree.AddFriend("tree2", "tree2.root")
tree.AddFriend("tree3", "tree3.root");
tree.Draw("x:a", "k<c");
tree.Draw("x:tree2.x", "sqrt(p)<b");
René Brun
139
Summary: Reading Trees
TTree is one of the most powerful collections
available for HEP
Extremely efficient for huge number of data sets
with identical layout
Very easy to look at TTree - use TBrowser!
Write once, read many (WORM) ideal for
experiments' data
Still: extensible, users can add their own tree as
friend
René Brun
140
Analyzing Trees
Selectors, Proxies, PROOF
René Brun
141
Recap
TTree efficient storage and access for huge
amounts of structured data
Allows selective access of data
TTree knows its layout
Almost all HEP analyses based on TTree
René Brun
142
TTree Data Access
Data access via TTree / TBranch is complex
Lots of code identical for all TTrees:
getting tree from file, setting up branches, entry
loop
Lots of code completely defined by TTree:
branch names, variable types, branch ↔
variable mapping
Need to enable / disable branches "by hand"
Want common interface to allow generic "TTree
based analysis infrastructure"
René Brun
143
TTree Proxy
The solution:
write analysis code using branch names
as variables
auto-declare variables from branch
structure
read branches on-demand
René Brun
144
Branch vs. Proxy
Take a simple branch:
class ABranch {
int a;
};
ABranch* b=0;
tree->Branch("abranch", &b);
Access via proxy in pia.C:
double pia() {
return sqrt(abranch.a * TMath::Pi());
}
René Brun
145
Proxy Details
double pia() {
return sqrt(abranch.a * TMath::Pi());
}
Analysis script somename.C must contain
somename()
Put #includes into somename.h
Return type must convert to double
René Brun
146
Proxy Advantages
double pia() {
return sqrt(abranch.a * TMath::Pi());
}
Very efficient: only reads leaf "a"
Can use arbitrary C++
Leaf behaves like a variable
Uses meta-information stored with TTree:
branch names
types contained in branches / leaves
and their members (unrolling)
René Brun
147
Simple Proxy Analysis
TTree::Draw() runs simple analysis:
TFile* f = new TFile("tree.root");
TTree* tree = 0;
f->GetObject("MyTree", tree);
tree->Draw("pia.C+");
Compiles pia.C
Calls it for each event
Fills histogram named "htemp" with return value
René Brun
148
Behind The Proxy Scene
TTree::Draw("pia.C+") creates helper class
generatedSel: public TSelector {
#include "pia.C"
...
};
pia.C gets #included inside generatedSel!
Its data members are named like leaves,
wrap access to Tree data
Can also generate Proxy via
tree->MakeProxy("MyProxy", "pia.C")
René Brun
149
TSelector
generatedSel: public TSelector
TSelector is base for event-based analysis:
1. setup
TSelector::Begin()
2. analyze an entry
TSelector::Process(Long64_t entry)
3. draw / save histograms
TSelector::Terminate()
René Brun
150
Extending The Proxy
Given Proxy generated for script.C (like pia.C):
looks for and calls
void
Bool_t
void
script_Begin(TTree* tree)
script_Process(Long64_t entry)
script_Terminate()
Correspond to TSelector's functions
Can be used to set up complete analysis:
fill several histograms,
control event loop
René Brun
151
Proxy And TClonesArray
TClonesArray as branch:
class TGap { float ore; };
TClonesArray* b = new TClonesArray("TGap");
tree->Branch("gap", &b);
Cannot loop over entries and return each:
double singapore() {
return sin(gap.ore[???]);
}
René Brun
152
Proxy And TClonesArray
TClonesArray as branch:
class TGap { float ore; };
TClonesArray* b = new TClonesArray("TGap");
tree->Branch("gap", &b);
Implement it as part of pia_Process() instead:
Bool_t pia_Process(Long64_t entry) {
for (int i=0; i<gap.GetEntries(); ++i)
hSingapore->Fill(sin(gap.ore[i]));
return kTRUE;
}
René Brun
153
Extending The Proxy, Example
Need to declare hSingapore somewhere!
But pia.C is #included inside generatedSel, so:
TH1* hSingapore;
void pia_Begin() {
hSingapore = new TH1F(...);
}
Bool_t pia_Process(Long64_t entry) {
hSingapore->Fill(...);
}
void pia_Terminate() {
hSingapore->Draw();
}
René Brun
154
ROOT on the GRID(s)
Many ways to use ROOT on a GRID
Single laptop
Multi-Core desktop
Local area cluster
Cluster on a GRID center
Clusters on GRID(s)
GRID: the idea
Exploit resources available on the Net: You
take and you give.
Not a new idea: Seti, Condor, etc
BUT:
 Can we make it transparent?
 What type of work?
Problem is more complex than a simple CPU
distribution task
 WAN data access
 Data management
 Authentication, time & space sharing
 Disk and CPU quotas
René Brun
156
The HEP GRID
100,000 computers
5,000 physicists
in 1000 locations
in 1000 locations
René Brun
157
GRID: Users profile
Few big users submitting
many long jobs (Monte Carlo,
reconstruction)
They want to run many jobs in
one month
Many users submitting
many short jobs (physics
analysis)
They want to run many
jobs in one hour or less
René Brun
158
Big Users
Monte Carlo jobs (one hour  one day)
 Each job generates one file (1 GigaByte)
Reconstruction job (10 minutes -> one hour)
 Input from the MC job or copied from a storage
center
 Output (< input) is staged back to a storage center
Success rate (90%). If the job fails you
resubmit it.
For several years, GRID projects focused effort
on big users.
René Brun
159
Small Users
 Scenario 1: submit one batch job to the GRID. It runs
somewhere with varying response times.
 Scenario 2: Use a splitter to submit many batch jobs to
process many data sets (eg CRAB, Ganga, Alien).
Output data sets are merged automatically. Success
rate < 90%. You see the final results only when the last
job has been received and all results merged.
 Scenario 3: Use PROOF (automatic splitter and
merger). Success rate close to 100%. You can see
intermediate feedback objects like histograms. You run
from an interactive ROOT session.
René Brun
160
Scenario 1 & 2: PROS
Job level parallelism. Conventional
model. Nothing to change in user
program.
Initialisation phase
Loop on events
Termination
Same job can run on laptop or on a
GRID job.
René Brun
161
Scenario 1 & 2: CONS(1)
Long tail in the jobs wall clock time
distribution.
René Brun
162
Scenario 1 & 2: CONS(2)
Can only merge output after a timecut.
More data movement (input & output)
Cannot have interactive feedback
Two consecutive queries will produce
different results (problem with rare
events)
Will use only one core on a multicore
laptop or GRID node.
Hard to control priorities and quotas.
René Brun
163
Scenario 3: PROS
Predictive response. Event level
parallelism. Workers terminate at the
same time
Process moved to data as much as
possible, otherwise use network.
Interactive feedback
Can view running queries in different
ROOT sessions.
Can take advantage of multicore cpus
René Brun
164
Scenario 3: CONS
Event level parallelism. User code must
follow a template: the TSelector API.
Good for a local area cluster. More
difficult to put in place in a GRID
collection of local area clusters.
Interactive schedulers, priority managers
must be put in place.
Debugging a problem slightly more
difficult.
René Brun
165
GRID & Parallelism: 1
The user application splits the problem in
N subtasks. Each task is submitted to a
GRID node (minimal input, minimal
output).
The GRID task can run synchronously or
asynchronously. If the task fails or timeout, it can be resubmitted to another
node.
One of the first and simplest use of the
GRID, but not many applications in HEP.
René Brun
166
GRID & Parallelism: 2
The typical case of Monte Carlo or
reconstruction in HEP.
It requires massive data transfers between the
main computing centers.
This activity has concentrated so far a very
large fraction of the GRID projects and
budgets.
It has been an essential step to foster
coordination between hundreds of sites,
improve international network bandwidths and
robustness.
René Brun
167
GRID & Parallelism: 3
Distributed data analysis will be a major
challenge for the coming experiments.
This is the area with thousands of people
running many different styles of queries, most
of the time in a chaotic way.
The main challenges:
 Access to millions of data sets (eg 500 TeraBytes)
 Best match between execution and data location
 Distributing/compiling/linking users code(a few
thousand lines) with experiment large libraries (a
few million lines of code).
 Simplicity of use
 Real time response
 Robustness.
René Brun
168
GRID & Parallelism: 3a
Currently 2 different & competing
directions for distributed data analysis.
Batch solution using the existing GRID
infrastructure for Monte Carlo and
reconstruction programs. A front-end
program partitions the problem to analyze
ND data sets on NP processors.
Interactive solution PROOF. Each query is
parallelized with an optimum match of
execution and data location.
René Brun
169
Parallelism: Where ?
Multi-Core CPU laptop/desktop
2(2007) 32(2012?)
Network of desktops
Local Cluster
with multi-core CPUs
GRID(s)
René Brun
170
Download