Malik_Lecture_1.1_PAT_Intro_PATtutorial_April2011

What is PAT and how to use it
Sudhir Malik
Fermilab/University of Nebraska-Lincoln
PAT Tutorial – CERN – 4-8 April 2011
malik@fnal.gov
PAT Tutorial @CERN – 4-8 April 2011
CMS Event Data Model (EDM)
But before PAT, a brief review of CMSSW
• The CMS software CMSSW is based on ROOT
• Pros:
• Single File Structure
• Reconstruction of detector objects, physics objects (POGs)
• Physics Analysis (PAGs)
• Track event from initial recording to “final” paper - Provenance
• Challenge:
• strong software requirements
• flexible, easy to extend
• not space-expensive
• More info can be found at
WorkBookCMSSWFramework
WorkBookAnalysisOverviewIntroduction
EDM Overview
The Event
• Built up of several independent ROOT trees
• one for each object class
• Connected via 'SmartPointer' relations
• edm::Ref, edm::Ptr, …
• composed object with minimal space overhead
The Event
• Built up of several independent ROOT trees (one for each
object class).
• Interconnected via 'SmartPointer' relations (edm::Ref, edm::Ptr, …)
• Strong hierarchy: There is no direct connection between basic
objects and super structures.
The Event
• Central structure:
• No connection between basic objects themselves
Communication via the Event Content
• Configurable Modules communicate via the Event
• Extend/modify the event by alternative modules (e.g. fit algorithms):
• Different modules (e.g. different methods, models)
• Different configurations of the same model
• Event history tracked by provenance
Available EDModules
• EDProducer: writes a new object to the event
• EDFilter: reads from the event; discards an event based on the retrieved info
• EDAnalyzer: reads from the event; does everything else based on the retrieved info
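The division of labour between modules that write to the event, modules that discard events, and modules that only read can be sketched in plain Python. This is a toy model, not CMSSW code: the names `ToyProducer`, `ToyFilter`, `ToyAnalyzer`, `run_path` and the event keys are hypothetical, and the real modules are C++ classes steered by cmsRun.

```python
class ToyProducer:
    """Reads from the event and writes one new product to it."""
    def __init__(self, label):
        self.label = label

    def produce(self, event):
        # New product: muon pt values above 20 (GeV) from the input list.
        event[self.label] = [pt for pt in event["muonPts"] if pt > 20.0]


class ToyFilter:
    """Reads from the event and decides whether the event is kept."""
    def __init__(self, source, min_count):
        self.source = source
        self.min_count = min_count

    def filter(self, event):
        return len(event[self.source]) >= self.min_count


class ToyAnalyzer:
    """Reads from the event and never modifies it."""
    def __init__(self, source):
        self.source = source
        self.counts = []

    def analyze(self, event):
        self.counts.append(len(event[self.source]))


def run_path(events, producer, event_filter, analyzer):
    """Run producer, filter, analyzer on each event; return how many passed."""
    passed = 0
    for event in events:
        producer.produce(event)
        if not event_filter.filter(event):
            continue  # event discarded: the analyzer never sees it
        analyzer.analyze(event)
        passed += 1
    return passed


events = [{"muonPts": [35.0, 22.0, 5.0]}, {"muonPts": [8.0]}, {"muonPts": [50.0]}]
analyzer = ToyAnalyzer("goodMuons")
n_passed = run_path(events, ToyProducer("goodMuons"), ToyFilter("goodMuons", 1), analyzer)
```

The modules never call each other directly; they only agree on a label in the event, which is the sense in which modules communicate via the event.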
The Event Content
• Tiered data structure: RECO / AOD
• View the event content as a complicated (grown) organic structure
• Different levels of granularity (resemble the order of creation)
• The data tier (AOD Tier, RECO Tier) guarantees what information will be available
The Event Content
• Add/keep/drop trees to adapt content and size
• some decisions have already been taken for you
• via the data tier definitions
[Figure labels: The minimum I want to know; AOD Tier; RECO Tier. The data tier guarantees what information will be available.]
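How an ordered list of keep/drop statements shapes the event content can be sketched in plain Python. This is a toy model under stated assumptions: `select_branches` is a hypothetical helper, the branch names are illustrative strings in a type_label_instance_process style, and shell-style matching from `fnmatch` stands in for the framework's own pattern matching.

```python
from fnmatch import fnmatchcase


def select_branches(branches, commands):
    """Apply ordered 'keep'/'drop' commands; the last matching command wins."""
    kept = []
    for branch in branches:
        keep = False  # nothing is kept unless a command says so
        for command in commands:
            action, pattern = command.split(None, 1)
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(branch)
    return kept


branches = [
    "recoTracks_generalTracks__RECO",
    "recoMuons_muons__RECO",
    "recoCaloJets_ak5CaloJets__RECO",
]

# Start from nothing and add back only what the analysis needs.
kept = select_branches(branches, ["drop *", "keep recoMuons_*_*_*"])
```

Because later statements override earlier ones, the order of the list matters: swapping the two commands above would drop everything.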
FWLite: A light Version of EDM
Sometimes people think the full EDM framework (FWFull) is a bit heavy. Even for those people CMS has something to offer:
FWLite:
This is bare ROOT with known data formats (with the same performance!)
• Python configuration, edm::Handle, TFileService, data access equivalent to
EDM
• Also PAT is fully compliant with (and even especially supports) FWLite.
• NO writing to the event content!
• Full framework ↔ FWLite: This is NOT an exclusive OR!
AT Recommendation
• Use cmsRun with crab or batch systems for large scale analyses
• Use cmsRun to write persistent datasets to your disc space and
for more complex analysis tasks
• Use FWLite for testing/interactive analysis on a complexity level
comparable with complex analysis tasks.
A typical FWLite python configuration file:
Or Even Better: One to Rule 'em All...
• Write a bare C++ class, which derives from the BasicAnalyzer class:
PhysicsTools/UtilAlgos/interface/BasicAnalyzer.h
• AT provides wrapper classes to transform this class into an EDAnalyzer or an FWLiteAnalyzer executable (see twiki WorkBookFWLiteExamples). They are located in PhysicsTools/UtilAlgos/interface/:
• EDAnalyzerWrapper.h
• FWLiteAnalyzerWrapper.h
• EDFilterWrapper.h
• FWLiteFilterWrapper.h
• The same exists for EDFilters.
• Check Exercise 4 for more details
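The idea of writing the analysis logic once and wrapping it for either environment can be sketched in plain Python. This is only an illustration of the pattern: the real BasicAnalyzer and its wrappers are C++ classes, and every name below (`ToyBasicAnalyzer`, `ToyFullFrameworkWrapper`, `ToyFWLiteWrapper`, the `muonPts` key) is hypothetical.

```python
class ToyBasicAnalyzer:
    """Framework-independent analysis logic, called once per event."""
    def __init__(self, min_pt):
        self.min_pt = min_pt
        self.n_selected = 0

    def analyze(self, muon_pts):
        self.n_selected += sum(1 for pt in muon_pts if pt > self.min_pt)


class ToyFullFrameworkWrapper:
    """Plays the role of an EDAnalyzer: the framework calls it per event."""
    def __init__(self, analyzer):
        self.analyzer = analyzer

    def analyze(self, event):
        self.analyzer.analyze(event["muonPts"])


class ToyFWLiteWrapper:
    """Plays the role of an FWLite executable: it owns the event loop."""
    def __init__(self, analyzer):
        self.analyzer = analyzer

    def run(self, events):
        for event in events:
            self.analyzer.analyze(event["muonPts"])


events = [{"muonPts": [30.0, 10.0]}, {"muonPts": [25.0]}]

# Full-framework style: an external loop hands over one event at a time.
full = ToyBasicAnalyzer(min_pt=20.0)
wrapper = ToyFullFrameworkWrapper(full)
for event in events:
    wrapper.analyze(event)

# FWLite style: the wrapper loops over the events itself.
lite = ToyBasicAnalyzer(min_pt=20.0)
ToyFWLiteWrapper(lite).run(events)
```

Both routes run the same `analyze` code, so results obtained interactively and in large-scale jobs stay consistent.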
Getting Started With an Analysis at CMS
What do you want to do? What do you really(!) have to
do to achieve this?
Physics is:
• Selecting events.
• Understanding data.
• Understanding corrections.
• Convincing others of the results
• Writing notes and papers!
Physics is not:
• Writing your own histogram plotting tool
• Writing/maintaining your own ntuplizer
• Convincing others that your variable definitions are ...
Guidelines for software code used in CMS
We even have an official document containing some guidelines for
you:
https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdf
https://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications
• Stick to event provenance!
• Leave the EDM as late as possible (if at all)
• Use official tools/code where possible.
• Use PAT (as this already includes the first three points)
Typical CMS Analysis Workflow
• Prompt reconstruction at Tier-0.
• Central skims at Tier-1's.
• Users run cmsRun at Tier-2's:
• Perform high level analysis steps.
• Preselect events.
• Write their own user-defined EventContent to private T2/T3 space
• The latter step might be iterated.
• Copy reduced datasets to your favorite
machine.
• Run your final analysis/produce plots
AT Recommendation
• Use cmsRun with crab
• Use cmsRun on batch systems or FWLite.
• Write PAT Tuples
• Write flat n-tuples if you really think that you need them, BUT don't write your own n-tuplizer
• Rather have a look at:
SWGuideEDMNtuples
PAT helps you create a user-defined
EventContent
Difficulties with the EDM
The EDM poses two main problems from the analyst's point of view:
• Calculation/retrieval of high level analysis information
(complicated pointer arithmetic! what do I need? Where
do I find it?)
• Reduction only to the relevant high level analysis
information (where is the dropped data used throughout
the event?)
Top 5 Analyst's Problems
• I want to use muons only. I dropped everything else from the event. Now I cannot access the χ²/ndof?
• I found out that the GsfElectron collection consists of many objects but only 10%
can be real. There must be some electronID, how can I find and apply it?
• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration
for the absolute JES to them? Do I need DB access for that?
• I want to match reco objects to generated particles. I want to write my own
matching. Is there an official recommendation for algorithms and parameters?
• I want to match trigger muons to offline reconstructed muons. How do I do this?
• Troubleshooting for this kind of problem can be found at WorkBookTroubleShooting
Framework Subtleties: Part 1
• I want to use muons only. I dropped everything else from the event. Now I cannot access the χ²/ndof?
• You dropped the generalTracks collection from the event content and did not consider that reco::Muon internally points to its inner track in the generalTracks collection it was created from
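Why the muon loses its track information can be sketched in plain Python. This is a toy model, not the EDM implementation: `ToyRef` is a hypothetical stand-in for a smart pointer such as edm::Ref, modelled here simply as a collection name plus an index, and the numbers are made up.

```python
class ToyRef:
    """Stand-in for a smart pointer: a collection name plus an index."""
    def __init__(self, collection, index):
        self.collection = collection
        self.index = index

    def get(self, event):
        # Resolution goes through the event; it fails if the collection is gone.
        if self.collection not in event:
            raise LookupError("collection '%s' is not in the event" % self.collection)
        return event[self.collection][self.index]


event = {
    "generalTracks": [{"chi2": 12.0, "ndof": 10}, {"chi2": 30.0, "ndof": 20}],
    # The muon stores only a reference to its inner track, not a copy.
    "muons": [{"pt": 35.0, "innerTrack": ToyRef("generalTracks", 1)}],
}

muon = event["muons"][0]
track = muon["innerTrack"].get(event)
chi2_per_ndof = track["chi2"] / track["ndof"]

# Keep "muons only": the muon survives, but its track reference now dangles.
slimmed = {"muons": event["muons"]}
try:
    muon["innerTrack"].get(slimmed)
    reference_ok = True
except LookupError:
    reference_ok = False
```

Storing a reference instead of a copy keeps the muon small, at the price that the referenced collection must stay in the event.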
Framework Subtleties: Part 2
• You found out how to use electronID via edm::Associations:
• Any additional selection on the electrons will break the synchronization (without any error!)
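The silent loss of synchronization can be sketched in plain Python. This is a toy model with made-up numbers: the ID decisions are kept in a separate list parallel to the original electron collection, which only loosely mimics how an association is stored, and all names are hypothetical.

```python
electrons = [{"pt": 45.0}, {"pt": 8.0}, {"pt": 30.0}]
# ID decisions stored separately, parallel to the ORIGINAL collection.
electron_id = [True, True, False]

# An additional selection creates a new, shorter collection.
selected = [e for e in electrons if e["pt"] > 20.0]

# Looking up by position in the new collection raises no error,
# but it returns the decision of a different electron (the pt = 8 one).
wrong_id = electron_id[1]

# The correct decision needs the position in the original collection.
right_id = electron_id[electrons.index(selected[1])]

# Alternative: attach the decision to each object before selecting,
# so it travels with the electron through any later selection.
embedded = [dict(e, passes_id=flag) for e, flag in zip(electrons, electron_id)]
selected_embedded = [e for e in embedded if e["pt"] > 20.0]
```

The last two lines illustrate the design choice of carrying the information inside the object rather than next to it.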
What is the Physics Analysis Toolkit
PAT is a toolkit that is an integral part of the CMSSW framework
• It is an interface between the sometimes complicated EDM and the simple
mind of the common user.
• It serves as a well-tested and supported common ground for group and user analyses
• It facilitates reproducibility and comprehensibility of analyses
• You can view it as a common language between CMS analysts
• If another CMS analyst describes a PAT analysis to you, you can easily understand what he/she is talking about
Three aspects of PAT
Interface
• b/w RECO expertise & Analysis Level
• simplifies access via DataFormats
• canalizes expertise (via POG & PAG contacts)
• crossing point between POGs & PAGs ('vertical integration')
Common Tool
• approved algorithms & sensible defaults
• synergy (everybody can profit from recent developments)
• quick start into analysis for the beginners
Common Format
• facilitates transfer & comparisons
• PAG common configurations
• sustained provenance
Facilitated Access to Event Information
• Do you know how to access this event information within the EDM?
• With PAT Candidates you get this just by calling member functions!
• Note:
Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)
The PAT Data Formats
• All pat::Objects inherit from their corresponding
reco::RecoCandidates
• A PAT Candidate is a reco::RecoCandidate + more
PAT Candidate Member Functions
Check the Documentation: WorkBookPATDataFormats
Combine Flexibility and User Friendliness
• You can choose yourself whether you really need all the extra information that the PAT Candidates provide
• Still, you don't need to know how EDM/PAT manages this access for you under the hood
[Figure labels: Flexibility; User Friendliness; Maximal Configurability]
• The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets)
Configuration of PAT DataFormats
You can configure the content of the DataFormats yourself (example: pat::Jet)!
Size: 14 kB/event (for ttbar)
• This can be slimmed even further
(exercise 6)
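How configuration switches decide what ends up inside an object, and therefore how large it is, can be sketched in plain Python. This is a toy model: the function `make_toy_pat_jet`, the flags `embed_btag` and `embed_corrected_pt`, and the event keys are all hypothetical and are not the parameters of the actual pat::Jet configuration.

```python
def make_toy_pat_jet(reco_jet, event, config):
    """Copy the reco jet and embed only the extras the configuration asks for."""
    pat_jet = dict(reco_jet)  # the PAT object is the reco object + more
    if config.get("embed_btag", False):
        pat_jet["bDiscriminator"] = event["bTags"][reco_jet["index"]]
    if config.get("embed_corrected_pt", False):
        factor = event["jetCorrFactors"][reco_jet["index"]]
        pat_jet["correctedPt"] = reco_jet["pt"] * factor
    return pat_jet


event = {"bTags": [0.9, 0.1], "jetCorrFactors": [1.5, 1.25]}
reco_jet = {"index": 0, "pt": 50.0}

# Everything switched on: convenient, but the object is larger.
full = make_toy_pat_jet(reco_jet, event, {"embed_btag": True, "embed_corrected_pt": True})

# Everything switched off: the object carries only the reco information.
slim = make_toy_pat_jet(reco_jet, event, {})
```

The user code that reads `full` or `slim` does not change; only the configuration does.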
The PAT Workflow
Have a look at:
WorkBookPATWorkflow
• Pre-production steps before PAT Candidate creation
• PAT Candidate creation
• Main collection (w/o cleaning)
• Main collection (with cleaning)
This workflow is reflected in the structure of the python directory in the PatAlgos package (check it out!)
EventContent of the default PAT Tuple
• Have a look at patEventContent_cff.py:
Size: 14 kB/event (for ttbar)
• Have a look at patTemplate_cfg.py:
• But decide for yourself what your PAT Tuple should look like (add reco::Tracks or reco::GenParticles to the Event Content or BTag information to the jets, etc.)
The concept of Maximal Configuration
DataFormats (object embedding)
• Configure your own DataFormats via embedding (see Lecture 2.2, exercise 6)
EventContent (pat event content)
• Add any extra info you need to the EventContent
WorkFlows (workflow tools)
• Configure your workflow via tools that PAT provides (see Lecture 2.1/Exercise 05).
Selections (pat selectors)
• Apply selections via the StringCutParser
[Figure label: Maximal Configurability]
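The idea of steering a selection with a string from the configuration, rather than with compiled code, can be sketched in plain Python. This is a toy model: `make_cut` is a hypothetical helper with a deliberately tiny grammar (comparisons, `&&`, `||`, `abs()`), and it does not reproduce the grammar of the actual StringCutParser.

```python
def make_cut(expression):
    """Turn a cut string such as 'pt > 20 && abs(eta) < 2.4' into a predicate."""
    python_expr = expression.replace("&&", " and ").replace("||", " or ")
    code = compile(python_expr, "<cut>", "eval")

    def passes(candidate):
        # Only the candidate's fields and abs() are visible to the expression.
        # eval() is acceptable for a sketch; do not use it on untrusted input.
        return bool(eval(code, {"__builtins__": {}, "abs": abs}, dict(candidate)))

    return passes


muons = [
    {"pt": 35.0, "eta": 1.0},
    {"pt": 35.0, "eta": -2.5},
    {"pt": 10.0, "eta": 0.2},
]

passes = make_cut("pt > 20 && abs(eta) < 2.4")
selected = [m for m in muons if passes(m)]
```

Changing the selection then means editing one string, with no recompilation of the analysis code.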
The Code Location
• DataFormats/PatCandidates
• Definition of all PAT Candidates.
• pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
• Implementation and filling of all data formats.
• Definition of common workflow and PAT tools
• PhysicsTools/PatUtils
• Definition of common tools and helper functions used in
PatAlgos
• PhysicsTools/PatExamples
• Location of many examples e.g. all non-trivial examples used during this
Tutorial
Development
PAT is part of any CMSSW release. We recommend using it from the release!
Have a look at: SWGuidePATRecipes
Support
Check the main entry page of PAT in the software guide: SWGuidePAT
A short extract of possible Support:
• Lecturers & Tutors
• Hypernews
• Community
• POG/PAG contacts
• Developers
• The quite developed PAT Documentation!
Documentation
• SWGuidePAT/WorkBookPAT: Main documentation pages
• WorkBookPATDataFormats: Description of all PAT Candidates
• WorkBookPATWorkflow: Description of the PAT workflow
• WorkBookPATConfiguration: Description of the configuration of PAT
• SWGuidePATTools: Description of all PAT tools
• WorkBookPATTutorial: Tutorials and examples to get started
• SWGuidePATRecipes: Installation recipes
• SWGuidePATEventSize: Tools for event size estimate
And last but not least: This Tutorial and/or former Tutorials...
Exercises
By now you should be prepared to do the following Exercises on
WorkBookPATTutorial: Have Fun!
Exercise 1 (WorkBookPATDocNavigationExercise)
The PAT Documentation is one of the most looked-after parts of the WorkBook. Knowing your documentation and how to use it can speed up your learning enormously. Learn more about the PAT Documentation and how to make effective use of it.
Exercise 2 (WorkBookTupleCreationExercise)
Learn how the default PAT tuple is produced
Exercise 3 (WorkBookTupleCrabExercise)
This is part of the crab tutorial. Once you are doing large scale analyses you might
need crab. For this tutorial you may skip this exercise.