What is PAT and how to use it
Sudhir Malik, Fermilab/University of Nebraska-Lincoln
PAT Tutorial – CERN – 4-8 April 2011
malik@fnal.gov

CMS Event Data Model (EDM)
Before turning to PAT, a brief review of CMSSW:
• The CMS software CMSSW is based on ROOT
• Pros:
    • Single file structure
    • Reconstruction of detector objects and physics objects (POGs)
    • Physics analysis (PAGs)
    • Tracks the event from initial recording to the "final" paper: provenance
• Challenge:
    • Strong software requirements
    • Flexible, easy to extend
    • Not space expensive
• More info can be found at WorkBookCMSSWFramework and WorkBookAnalysisOverviewIntroduction

EDM Overview
(figure not reproduced)

The Event
• Built up of several independent ROOT trees (one for each object class)
• Interconnected via 'SmartPointer' relations (edm::Ref, edm::Ptr, …)
    • Composed objects with minimal space overhead
• Strong hierarchy: there is no direct connection between basic objects and super structures
• Central structure: no connection between basic objects themselves

Communication via the Event Content
• Configurable modules communicate via the Event
• Extend/modify the event by alternative modules (e.g. fit algorithms):
    • Different modules (e.g. different methods, models)
    • Different configurations of the same model
• Event history is tracked by provenance

Available EDModules
• EDProducer (writes to the Event): writes a new object to the event
• EDFilter (reads from the Event): discards an event based on retrieved info
• EDAnalyzer (reads from the Event): everything else, based on retrieved info

The Event Content
• Tiered data structure: RECO / AOD
• View the event content as a complicated (grown) organic structure
• Different levels of granularity (resembling the order of creation)
• The data tier guarantees what information will be available
• Add/keep/drop trees to adapt content and size
    • Some decisions have already been taken for you via the data tier definitions
• Figure labels: "The minimum I want to know", AOD Tier, RECO Tier

FWLite: A Light Version of EDM
Sometimes people think the full EDM framework (FWFull) is a bit heavy… Even for those people CMS has something to offer: FWLite. This is bare ROOT with known data formats (with the same performance!)
• Python configuration, edm::Handle, TFileService, data access equivalent to EDM
• PAT is also fully compliant with (and even especially supports) FWLite
• NO writing to the event content!
• Full framework ↔ FWLite: this is NOT an exclusive OR!

PAT Recommendation
• Use cmsRun with crab or batch systems for large scale analyses
• Use cmsRun to write persistent datasets to your disc space and for more complex analysis tasks
• Use FWLite for testing/interactive analysis on a complexity level comparable with complex analysis tasks
A typical FWLite python configuration file: (listing not reproduced)

Or Even Better: One to Rule 'em All...
• Write a bare C++ class which derives from the BasicAnalyzer class: PhysicsTools/UtilAlgos/interface/BasicAnalyzer.h
• PAT provides wrapper classes to transform this class into an EDAnalyzer or an FWLiteAnalyzer executable (see twiki WorkBookFWLiteExamples). In PhysicsTools/UtilAlgos/interface/:
    • EDAnalyzerWrapper.h
    • FWLiteAnalyzerWrapper.h
    • EDFilterWrapper.h
    • FWLiteFilterWrapper.h
• The same exists for EDFilters
• Check Exercise 4 for more details

Getting Started With an Analysis at CMS
What do you want to do? What do you really(!) have to do to achieve this?
Physics is:
• Selecting events
• Understanding data
• Understanding corrections
• Convincing others of the results
• Writing notes and papers!
Physics is not:
• Writing your own histogram plotting tool
• Writing/maintaining your own n-tuplizer
• Convincing others that your variable definitions are …

Guideline for s/w code used in CMS
We even have an official document containing some guidelines for you:
https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdf
https://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications
• Stick to event provenance!
• Leave the EDM as late as possible (if at all)
• Use official tools/code where possible
• Use PAT (as this already includes the first three points)

Typical CMS Analysis Workflow
• Prompt reconstruction at Tier-0
• Central skims at Tier-1s
• Users run cmsRun at Tier-2s:
    • Perform high level analysis steps
    • Preselect events
    • Write their own user-defined EventContent to private T2/T3 space
• The latter step might be iterated
• Copy reduced datasets to your favorite machine
• Run your final analysis/produce plots

PAT Recommendation
• Use cmsRun with crab
• Use cmsRun on batch systems or FWLite
• Write PAT Tuples
• Write flat n-tuples if you really think that you need them, BUT don't write your own n-tuplizer
• Rather have a look at: SWGuideEDMNtuples
PAT helps you create a user-defined EventContent

Difficulties with the EDM
The EDM poses two main problems from the analyst's point of view:
• Calculation/retrieval of high level analysis information (complicated pointer arithmetic! What do I need? Where do I find it?)
• Reduction to only the relevant high level analysis information (where is the dropped data used throughout the event?)

Top 5 Analyst's Problems
• I want to use muons only. I dropped everything else from the event. Now I cannot access the χ²/ndof?
• I found out that the GsfElectron collection consists of many objects but only 10% can be real. There must be some electronID; how can I find and apply it?
• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration for the absolute JES to them? Do I need DB access for that?
• I want to match reco objects to generated particles. I want to write my own matching. Is there an official recommendation for algorithms and parameters?
• I want to match trigger muons to offline reconstructed muons. How do I do this?
• Troubleshooting for these kinds of problems can be found at WorkBookTroubleShooting

Framework Subtleties: Part 1
• I want to use muons only. I dropped everything else from the event. Now I cannot access the χ²/ndof?
• You dropped the generalTracks collection from the event content without considering that reco::Muon internally points to its inner track in the generalTracks collection it was created from

Framework Subtleties: Part 2
• You found out how to use electronID via edm::Associations:
• Any additional selection on the electrons will break the synchronization (without any error!)

What is the Physics Analysis Toolkit
PAT is a toolkit that is an integral part of the CMSSW framework
• It is an interface between the sometimes complicated EDM and the simple mind of the common user
• It serves as a well tested and supported common ground for group and user analyses
• It facilitates reproducibility and comprehensibility of analyses
• You can view it as a common language between CMS analysts
• If another CMS analyst describes a PAT analysis to you, you can easily understand what he/she is talking about

Three aspects of PAT
Interface
• Between RECO expertise & analysis level
• Simplifies access via DataFormats
Common Tool
• Canalizes expertise (via POG & PAG contacts)
• Crossing point between POGs & PAGs ('vertical integration')
• Approved algorithms & sensible defaults
• Synergy (everybody can profit from recent developments)
• Quick start into analysis for beginners
Common Format
• Facilitates transfer & comparisons
• PAG common configurations
• Sustained provenance

Facilitated Access to Event Information
• Do you know how to access this event information within the EDM?
• With PAT Candidates you get this just by calling member functions!
• Note: Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)

The PAT Data Formats
• All pat::Objects inherit from their corresponding reco::RecoCandidates
• A PAT Candidate is a reco::RecoCandidate + more

PAT Candidate Member Functions
Check the documentation: WorkBookPATDataFormats

Combine Flexibility and User Friendliness
• Flexibility: you can choose yourself whether you really need all the extra information that the PAT Candidates provide
• User friendliness: still, you don't need to know how EDM/PAT manages this access for you under the hood
• Maximal configurability: the key is configuration of DataFormats by cfi file! (e.g. for pat::Jets)

Configuration of PAT DataFormats
You can configure the content of the DataFormats yourself (example: pat::Jet)!
Size: 14 kB/event (for ttbar)
• This can be slimmed even further (Exercise 6)

The PAT Workflow
Have a look at: WorkBookPATWorkflow
• Pre-production steps before PAT Candidate creation
• PAT Candidate creation
• Main collection (without cleaning)
• Main collection (with cleaning)
This is mirrored by the structure of the python directory in the PatAlgos package (check it out!)

EventContent of the default PAT Tuple
• Have a look at patEventContent_cff.py
Size: 14 kB/event (for ttbar)
• Have a look at patTemplate_cfg.py
• But decide yourself what your PAT Tuple should look like (add reco::Tracks or reco::GenParticles to the EventContent, or BTag information to the jets, etc.)

The concept of Maximal Configuration
• DataFormats (object embedding): configure your own DataFormats via embedding (see Lecture 2.2, Exercise 6)
• EventContent (PAT event content): add any extra info you need to the EventContent
• WorkFlows (workflow tools): configure your workflow via tools that PAT provides (see Lecture 2.1, Exercise 5)
• Selections (PAT selectors): apply selections via the StringCutParser

The Code Location
• DataFormats/PatCandidates
    • Definition of all PAT Candidates
    • pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
    • Implementation and filling of all data formats
    • Definition of common workflow and PAT tools
• PhysicsTools/PatUtils
    • Definition of common tools and helper functions used in PatAlgos
• PhysicsTools/PatExamples
    • Location of many examples, e.g. all non-trivial examples used during this Tutorial

Development
PAT is part of any CMSSW release. We recommend using it from the release!
Have a look at: SWGuidePATRecipes

Support
Check the main entry page of PAT in the software guide: SWGuidePAT
A short extract of possible support:
• Lecturers & Tutors
• Hypernews
• Community
• POG/PAG contacts
• Developers
• The quite developed PAT Documentation!
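
The keep/drop idea behind a user-defined EventContent can be illustrated without CMSSW. What follows is only a toy sketch in plain Python, not framework code: it mimics how an ordered list of 'keep'/'drop' statements selects branches, using shell-style wildcards. The function name apply_output_commands, the rule that a branch is kept when no statement matches, and the example branch names are assumptions made for illustration; in a real job the selection is done by the framework's output module.

```python
from fnmatch import fnmatchcase


def apply_output_commands(branches, commands):
    """Return the branches that survive an ordered list of keep/drop statements.

    Toy model: statements are applied in order and a later matching
    statement overrides an earlier one.
    """
    kept = []
    for branch in branches:
        keep = True  # assumption of this toy: keep unless a statement says otherwise
        for command in commands:
            action, pattern = command.split()
            if action not in ("keep", "drop"):
                raise ValueError(f"unknown action in {command!r}")
            if fnmatchcase(branch, pattern):
                keep = (action == "keep")
        if keep:
            kept.append(branch)
    return kept


# Illustrative branch names in a type_label_instance_process layout.
branches = [
    "recoTracks_generalTracks__RECO",
    "recoMuons_muons__RECO",
    "patMuons_selectedPatMuons__PAT",
    "patJets_selectedPatJets__PAT",
]

# Drop everything, then keep only the selected PAT muons.
commands = ["drop *", "keep *_selectedPatMuons_*_*"]
slimmed = apply_output_commands(branches, commands)
print(slimmed)
```

Reversing the logic (start from "keep *" and drop single collections) works the same way in this toy, which is why the order of the statements matters when adapting content and size.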

Documentation
• SWGuidePAT/WorkBookPAT: main documentation pages
• WorkBookPATDataFormats: description of all PAT Candidates
• WorkBookPATWorkflow: description of the PAT workflow
• WorkBookPATConfiguration: description of the configuration of PAT
• SWGuidePATTools: description of all PAT tools
• WorkBookPATTutorial: tutorials and examples to get started
• SWGuidePATRecipes: installation recipes
• SWGuidePATEventSize: tools for event size estimates
And last but not least: this Tutorial and/or former Tutorials...

Exercises
By now you should be prepared to do the following exercises on WorkBookPATTutorial. Have fun!
• Exercise 1 (WorkBookPATDocNavigationExercise): The PAT Documentation is one of the best looked-after parts of the WorkBook. Knowing your documentation and how to use it can speed up your learning enormously. Learn more about the PAT Documentation and how to make effective use of it.
• Exercise 2 (WorkBookTupleCreationExercise): Learn how the default PAT tuple is produced.
• Exercise 3 (WorkBookTupleCrabExercise): This is part of the crab tutorial. Once you are doing large scale analyses you might need crab. For this tutorial you may skip this exercise.
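
The two framework subtleties discussed earlier (a muon pointing into a dropped track collection, and an association that silently loses synchronization after a selection) can be reproduced in a few lines of plain Python. This is a toy model under stated assumptions, not EDM code: the classes Ref, Track and Muon, the dictionary used as an "event", and all numbers are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Toy stand-in for a smart pointer: a collection label plus an index."""
    collection: str
    index: int

    def get(self, event):
        # Fails if the referenced collection was dropped from the event.
        if self.collection not in event:
            raise LookupError(f"collection {self.collection!r} is not in the event")
        return event[self.collection][self.index]


@dataclass(frozen=True)
class Track:
    chi2: float
    ndof: int


@dataclass(frozen=True)
class Muon:
    pt: float
    inner_track: Ref  # the muon does not own its track, it only points to it


event = {
    "generalTracks": [Track(chi2=12.0, ndof=10), Track(chi2=30.0, ndof=15)],
    "muons": [Muon(pt=25.0, inner_track=Ref("generalTracks", 1))],
}

# Subtlety 1: with the full event the normalised chi2 is reachable ...
track = event["muons"][0].inner_track.get(event)
norm_chi2 = track.chi2 / track.ndof

# ... but after keeping only the muons the reference dangles.
slimmed = {"muons": event["muons"]}
try:
    slimmed["muons"][0].inner_track.get(slimmed)
    dangling = False
except LookupError:
    dangling = True

# Subtlety 2: an index-aligned association silently goes out of sync.
electrons = [{"pt": 8.0}, {"pt": 40.0}, {"pt": 22.0}]
electron_id = [False, True, False]  # aligned with the ORIGINAL collection

selected = [e for e in electrons if e["pt"] > 20.0]
# Wrong: looks up the ID by position in the filtered list; no error is raised.
wrong_ids = [electron_id[i] for i, _ in enumerate(selected)]
# Safer in this toy: keep the original index while selecting.
right_ids = [electron_id[i] for i, e in enumerate(electrons) if e["pt"] > 20.0]

print(norm_chi2, dangling, wrong_ids, right_ids)
```

In the toy, both problems disappear once the needed value travels with the object itself instead of living in a separate collection, which is the spirit of the embedding option that the slides mention for configuring PAT DataFormats.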