What is PAT and how to use it
Sudhir Malik
Fermilab/University of Nebraska-Lincoln
PAT Tutorial – CERN – 4-8 April 2011
[email protected]
CMS Event Data Model (EDM)
Before turning to PAT, a brief review of CMSSW
• The CMS software CMSSW is based on ROOT
• Pros:
• Single File Structure
• Reconstruction of detector objects and physics objects (POGs)
• Physics analysis (PAGs)
• Track event from initial recording to “final” paper - Provenance
• Challenge:
• strong software requirements
• flexible, easy to extend
• Must not be space expensive
• More info can be found at
EDM Overview
The Event
• Built up of several independent ROOT trees
• one for each object class
• Connected via 'SmartPointer' relations
• edm::Ref, edm::Ptr, …
• composed object with minimal space overhead
The Event
• Built up of several independent ROOT trees (one for each
object class).
• Interconnected via 'SmartPointer' relations (edm::Ref, edm::Ptr, …)
• Strong hierarchy: There is no direct connection between basic
objects and super structures.
The Event
• Central structure:
• No connection between basic objects themselves
Communication via the Event Content
• Configurable Modules communicate via the Event
• Extend/modify the event by alternative modules (e.g. fit
• Different modules (e.g. different methods, models)
• Different configurations of the same model
• Event history tracked by provenance
Available EDModules
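• EDProducer: reads from the Event and writes a new object to it
• EDFilter: reads from the Event and discards an event based on the retrieved info
• EDAnalyzer: reads from the Event and does everything else based on the retrieved info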
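The division of labour between modules that write to, filter on, and only read from the Event can be sketched in plain Python. This is a toy model under stated assumptions: the class names, the `muonPts` and `ptSum` labels, and the `run_path` helper are hypothetical and are not the CMSSW API.

```python
# Toy model only: these classes imitate the roles of CMSSW modules,
# they do not use or reproduce the CMSSW API.

class Event(dict):
    """An event is modelled as a dict: label -> product."""


class PtSumProducer:
    """Producer role: reads a product and writes a new one to the event."""

    def produce(self, event):
        event["ptSum"] = sum(event["muonPts"])


class PtSumFilter:
    """Filter role: reads a product and decides whether to keep the event."""

    def __init__(self, min_pt_sum):
        self.min_pt_sum = min_pt_sum

    def filter(self, event):
        return event["ptSum"] >= self.min_pt_sum


class PtSumAnalyzer:
    """Analyzer role: reads products only, never writes to the event."""

    def __init__(self):
        self.seen = []

    def analyze(self, event):
        self.seen.append(event["ptSum"])


def run_path(events, producer, event_filter, analyzer):
    """Run the modules in order; they communicate only via the event."""
    for event in events:
        producer.produce(event)
        if not event_filter.filter(event):
            continue  # event discarded, later modules never see it
        analyzer.analyze(event)


events = [Event(muonPts=[10.0, 5.0]), Event(muonPts=[40.0, 35.0])]
analyzer = PtSumAnalyzer()
run_path(events, PtSumProducer(), PtSumFilter(50.0), analyzer)
print(analyzer.seen)  # [75.0]
```

None of the modules holds a reference to another module: swapping the producer for an alternative one leaves the filter and analyzer untouched.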
The Event Content
• Tiered data structure: RECO / AOD
• View the event content as a complicated (grown) organic
AOD tier: the data tier guarantees what information will be available
• Different levels of granularity (resemble the order of
The Event Content
• Add/keep/drop trees to adapt content and size
• some decisions have already been taken for you
• via the data tier definitions
The minimum I want to know
AOD tier: the data tier guarantees what information will be available
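Adding, keeping and dropping trees can be pictured as a list of keep/drop commands applied in order, where a later command overrides an earlier one. The sketch below is a toy model in plain Python, not the framework's implementation: `select_branches` is a hypothetical helper and the branch names are illustrative examples of the `type_label_instance_process` naming pattern.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Apply 'keep <pattern>' / 'drop <pattern>' commands in order.

    Later commands take precedence over earlier ones.
    """
    kept = set()
    for command in commands:
        action, pattern = command.split()
        matching = {b for b in branches if fnmatchcase(b, pattern)}
        if action == "keep":
            kept |= matching
        elif action == "drop":
            kept -= matching
        else:
            raise ValueError(f"unknown action: {action}")
    # Preserve the original branch order in the result.
    return [b for b in branches if b in kept]


# Illustrative branch names (assumed for this example).
branches = [
    "recoTracks_generalTracks__RECO",
    "recoMuons_muons__RECO",
    "recoCaloJets_ak5CaloJets__RECO",
]

print(select_branches(branches, ["drop *", "keep recoMuons_*_*_*"]))
# ['recoMuons_muons__RECO']
```

Starting from `drop *` and adding explicit `keep` lines makes the event content an opt-in list, which keeps the size under control.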
FWLite: A light Version of EDM
Sometimes people think the full EDM framework (FWFull) is a bit heavy. Even for those people CMS has something to offer.
This is bare ROOT with known data formats (with the same performance!)
• Python configuration, edm::Handle, TFileService, data access equivalent to
• Also PAT is fully compliant with (and even especially supports) FWLite.
• NO writing to the event content!
• Full framework ↔ FWLite: This is NOT an exclusive OR!
PAT Recommendation
• Use cmsRun with crab or batch systems for large scale analyses
• Use cmsRun to write persistent datasets to your disk space and
for more complex analysis tasks
• Use FWLite for testing/interactive analysis on a complexity level
comparable with complex analysis tasks.
A typical FWLite python configuration file:
Or Even Better: One to Rule 'em All...
• Write a bare C++ class that derives from the BasicAnalyzer class.
• PAT provides wrapper classes to transform this class into an EDAnalyzer or a
FWLiteAnalyzer executable (see TWiki WorkBookFWLiteExamples):
• EDAnalyzerWrapper.h
• FWLiteAnalyzerWrapper.h
• The same exists for EDFilters:
• EDFilterWrapper.h
• FWLiteFilterWrapper.h
• Check Exercise 4 for more details
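The idea of writing the analysis once and wrapping it for two execution environments can be sketched in plain Python. This is only an illustration of the pattern: the real classes are C++, and every name below (`MuonCounter`, `FrameworkStyleWrapper`, `StandaloneStyleWrapper`, the method names) is hypothetical.

```python
class BasicAnalyzer:
    """Analysis logic with no knowledge of who drives the event loop."""

    def begin_job(self):
        pass

    def analyze(self, event):
        raise NotImplementedError

    def end_job(self):
        pass


class MuonCounter(BasicAnalyzer):
    """Example analysis: count muons over all events."""

    def __init__(self):
        self.n_muons = 0

    def analyze(self, event):
        self.n_muons += len(event["muons"])


class FrameworkStyleWrapper:
    """Stands in for a full-framework module: events are pushed in one by one."""

    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.analyzer.begin_job()

    def on_event(self, event):
        self.analyzer.analyze(event)

    def finish(self):
        self.analyzer.end_job()


class StandaloneStyleWrapper:
    """Stands in for a standalone executable: the wrapper owns the event loop."""

    def __init__(self, analyzer):
        self.analyzer = analyzer

    def run(self, events):
        self.analyzer.begin_job()
        for event in events:
            self.analyzer.analyze(event)
        self.analyzer.end_job()


events = [{"muons": [1, 2]}, {"muons": []}, {"muons": [3]}]

# Same analysis class, driven in two different ways.
pushed = FrameworkStyleWrapper(MuonCounter())
for event in events:
    pushed.on_event(event)
pushed.finish()

looped = StandaloneStyleWrapper(MuonCounter())
looped.run(events)

print(pushed.analyzer.n_muons, looped.analyzer.n_muons)  # 3 3
```

Because the analysis class never touches the event loop, moving between the two environments requires no change to the analysis code itself.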
Getting Started With an Analysis at CMS
What do you want to do? What do you really(!) have to
do to achieve this?
Analysis is:
• Selecting events.
• Understanding data.
• Understanding corrections.
• Convincing others of the results
• Writing notes and papers!
Analysis is not:
• Writing your own histogram plotting tool
• Writing/maintaining your own ntuplizer
• Convincing others that your variable definitions are
Guideline for s/w code used in CMS
We even have an official document containing some guidelines for
• Stick to event provenance!
• Leave the EDM as late as possible (if at all)
• Use official tools/code where possible.
• Use PAT (as this already includes the first three points)
Typical CMS Analysis Workflow
• Prompt reconstruction at Tier-0.
• Central skims at Tier-1's.
• Users run cmsRun at Tier-2's:
• Perform high level analysis steps.
• Preselect events.
• Write their own user-defined EventContent to private T2/T3 space
• The latter step might be iterated.
• Copy reduced datasets to your favorite
• Run your final analysis/produce plots
PAT Recommendation
• Use cmsRun with crab
• Use cmsRun on batch systems or FWLite.
• Write PAT Tuples
• Write flat n-tuples if you really think that you
need them, BUT don't write your own n-tuplizer
• Rather have a look at:
PAT helps you create a user-defined
Difficulties with the EDM
The EDM poses two main problems from the
analyst's point of view:
• Calculation/retrieval of high level analysis information
(complicated pointer arithmetic! what do I need? Where
do I find it?)
• Reduction only to the relevant high level analysis
information (where is the dropped data used throughout
the event?)
Top 5 Analyst's Problems
• I want to use muons only. I dropped everything else from the event. Now I cannot
access the χ²/ndof?
• I found out that the GsfElectron collection consists of many objects but only 10%
can be real. There must be some electronID, how can I find and apply it?
• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration
for the absolute JES to them? Do I need DB access for that?
• I want to match reco objects to generated particles. I want to write my own
matching. Is there an official recommendation for algorithms and parameters?
• I want to match trigger muons to offline reconstructed muons. How do I do this?
• Troubleshooting for this kind of problem can be found at WorkBookTroubleShooting
Framework Subtleties: Part 1
• I want to use muons only. I dropped everything else from the event.
Now I cannot access the χ²/ndof?
• You dropped the generalTracks collection from the event content
without considering that reco::Muon internally points to the
inner track in the generalTracks collection it was created from
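Why the muon loses its track information can be shown with a toy model in plain Python. The `Ref` class below is a simplified stand-in, assumed for illustration, that stores only a collection label and an index; it is not the real edm::Ref, and the numbers are made up.

```python
class Ref:
    """Toy reference: stores where the object lives, not the object itself."""

    def __init__(self, collection_label, index):
        self.collection_label = collection_label
        self.index = index

    def get(self, event):
        if self.collection_label not in event:
            raise LookupError(
                f"collection '{self.collection_label}' is not in the event"
            )
        return event[self.collection_label][self.index]


def normalized_chi2(muon, event):
    """Follow the muon's reference to its inner track and compute chi2/ndof."""
    track = muon["innerTrack"].get(event)
    return track["chi2"] / track["ndof"]


event = {
    "generalTracks": [
        {"chi2": 12.0, "ndof": 10},
        {"chi2": 30.0, "ndof": 20},
    ],
    # The muon does not own a track; it points into generalTracks.
    "muons": [{"pt": 25.0, "innerTrack": Ref("generalTracks", 1)}],
}

print(normalized_chi2(event["muons"][0], event))  # 1.5

# Keep the muons only: the reference survives, its target does not.
slimmed = {"muons": event["muons"]}
try:
    normalized_chi2(slimmed["muons"][0], slimmed)
except LookupError as err:
    print("cannot resolve reference:", err)
```

The muon collection itself is intact after slimming; only the information reached through the reference is gone.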
Framework Subtleties: Part 2
• You found out how to use electronID via edm::Associations:
• Any additional selection on the electrons will break the
synchronization (without any error!)
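The silent loss of synchronization can be illustrated with a toy model in plain Python, assuming the ID values are stored in a list parallel to the original electron collection. The real edm::Association mechanics are more involved; the variable names and numbers below are made up.

```python
# ID values are stored per position in the ORIGINAL electron collection.
electrons = [{"pt": 8.0}, {"pt": 35.0}, {"pt": 12.0}, {"pt": 50.0}]
electron_id = [0.1, 0.9, 0.2, 0.8]

# An additional selection produces a new, shorter collection.
selected = [e for e in electrons if e["pt"] > 20.0]

# Wrong: positions in the selected collection are used to look up the IDs.
# Nothing fails, but the values belong to different electrons.
wrong_ids = [electron_id[i] for i, _ in enumerate(selected)]
print(wrong_ids)  # [0.1, 0.9]

# Right: remember the position each selected electron had originally.
selected_indices = [i for i, e in enumerate(electrons) if e["pt"] > 20.0]
right_ids = [electron_id[i] for i in selected_indices]
print(right_ids)  # [0.9, 0.8]
```

The wrong lookup returns plausible numbers, which is what makes this kind of mismatch hard to notice.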
What is the Physics Analysis Toolkit
PAT is a toolkit that is an integral part of the CMSSW framework
• It is an interface between the sometimes complicated EDM and the simple
mind of the common user.
• It serves as a well-tested and supported common ground for group and user
• It facilitates reproducibility and comprehensibility of analyses
• You can view it as a common language between CMS analysts
• If another CMS analyst describes a PAT analysis to you, you can easily understand
what they are talking about
Three aspects of PAT
Common Tool
• b/w RECO expertise & Analysis Level
• simplifies access via DataFormats
• channels expertise (via POG &
• crossing point between POGs & PAGs
('vertical integration')
• approved algorithms & sensible defaults
• synergy (everybody can profit from
recent developments)
• quick start into analysis for the beginners
Common Format
• facilitates transfer & comparisons
• PAG common configurations
• sustained provenance
Facilitated Access to Event Information
• Do you know how to access this event information within the EDM?
• With PAT Candidates you get this just by calling member functions!
• Note:
Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)
The PAT Data Formats
• All pat::Objects inherit from their corresponding
• A PAT Candidate is a reco::RecoCandidate + more
PAT Candidate Member Functions
Check the Documentation: WorkBookPATDataFormats
Combine Flexibility and User Friendliness
• You can choose for yourself whether you really need all the
extra information that the PAT Candidates provide
• Still, you don't need to know how EDM/PAT manages this
access for you under the hood
User Friendliness
• The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets)
Configuration of PAT DataFormats
You can configure the content of the DataFormats yourself (example: pat::Jet)!
Size: 14 kB/event (for ttbar)
• This can be slimmed even further
(exercise 6)
The PAT Workflow
Have a look at:
Pre-Production steps before PAT
Candidate Creation
PAT Candidate creation
Main collection (w/o cleaning)
Main collection (with cleaning)
Resembled by the structure of the python directory in
the PatAlgos package (check it out!)
EventContent of the default PAT Tuple
• Have a look at
Size: 14 kB/event (for ttbar)
• Have a look at
• But decide for yourself what your PAT Tuple should look like (add reco::Tracks or
reco::GenParticles to the Event Content or BTag information to the jets, etc.)
The concept of Maximal Configuration
• Object embedding: configure your own DataFormats via embedding (see Lecture 2.2, Exercise 6)
• PAT event content: add any extra info you need to the
• Workflow tools: configure your workflow via tools that PAT provides (see Lecture 2.1/Exercise 5)
• PAT selectors: apply selections via the
The Code Location
• DataFormats/PatCandidates
• Definition of all PAT Candidates.
• pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
• Implementation and filling of all data formats.
• Definition of common workflow and PAT tools
• PhysicsTools/PatUtils
• Definition of common tools and helper functions used in
• PhysicsTools/PatExamples
• Location of many examples e.g. all non-trivial examples used during this
PAT is part of every CMSSW release. We recommend using it from the release!
Have a look at: SWGuidePATRecipes
Check the main entry page of PAT in the software guide: SWGuidePAT
A short extract of possible Support:
• Lecturers & Tutors
• Hypernews
• Community
• POG/PAG contacts
• Developers
• The quite developed PAT documentation!
• SWGuidePAT/WorkBookPAT Main documentation pages
• WorkBookPATDataFormats Description of all PAT Candidate
• WorkBookPATWorkflow Description of the PAT workflow
• WorkBookPATConfiguration Description of the configuration of PAT
• SWGuidePATTools Description of all PAT tools
• WorkBookPATTutorial Tutorials and examples to get started
• SWGuidePATRecipes Installation recipes
• SWGuidePATEventSize Tools for event size estimate
And last but not least: This Tutorial and/or former Tutorials...
By now you should be prepared to do the following Exercises on
WorkBookPATTutorial: Have Fun!
Exercise 1 (WorkBookPATDocNavigationExercise)
The PAT Documentation is one of the most looked after parts of the WorkBook.
Knowing your documentation and how to use it can speed up your learning
enormously. Learn more about the PAT Documentation and how to make effective
use of it.
Exercise 2 (WorkBookTupleCreationExercise)
Learn how the default PAT tuple is produced
Exercise 3 (WorkBookTupleCrabExercise)
This is part of the crab tutorial. Once you are doing large scale analyses you might
need crab. For this tutorial you may skip this exercise.