PowerPoint Presentation - Slide 1 - Indico

advertisement
Introduction to
CMS Software, Computing Model and Analysis Tools
Sudhir Malik
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
Outline
Three parts
• CMS Offline Software concepts and organization
• Computing Model
• Analysis Tools
• Remarks
CMSDAS pre-exercises
• Taste of a bit of each of the above
• Get you in CMSDAS-mode to understand physics objects and
perform physics analysis during the school
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
2
CMS Offline Software
concepts and organization
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
3
CMSSW
• The full suite of CMS offline software is called CMSSW
• Based on Event Data Model (EDM)
• Used for online data taking, simulation, primary reconstruction, and physics analysis
• The two thousand (or so) packages that make up the project are organized into subsystems with names that should be suggestive of their purpose
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
4
Documentation
• WorkBook - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBook
• source of one-stop shopping organization, update frequently
• initial starting point for people new to CMS software and computing
• intended for people to read through it, and work out the
examples and tutorials it contains
• CMS tutorials are found here
• SWGuide - https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuide
• “industrial” strength shopping needs, expert level information
• The software architecture, detailed descriptions of the algorithms, instructions
for analysis and validation
• Not updated as frequently
• Reference Manual - http://cmssdt.cern.ch/SDT/doxygen/
•Reference Manual - mixture of auto-generated, Doxygen and hand written
pages, always check the last edit date…
• Glossary of acronyms - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookGlossary
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
5
CMS Event Data Model (EDM)
• A CMS Event* starts as a collection of the RAW data from a detector or MC event
• As the event data is processed, products are stored in the Event as reconstructed
(RECO) data objects
• Event contains = triggered physics event + derived data + metadata(software config) +
condition and calibration data
• Products are stored as C++ containers
* single readout of the detector electronics and the signals that will (in general) have been generated by particles, tracks, energy deposits, present in a number
of bunch crossings
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
6
Modular architecture of FW
A Software Bus Model
• Start from the raw or generated data, read from an input root file
• Producers are scheduled to operate on the event data and produce their output which
is written into the event
• Because it’s modular you can inspect/debug the job at any point in the processing
path. The contents of the event can be examined outside of the context of the process
that made it.
• The FW automatically tracks the provenance of what is produced.
• Several instances of the same module can be run in the same application and you will
still be able to uniquely identify their products, eg. JetProducer with different cone
sizes. Identified by C++ type, producer label, instance name, process name
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
7
Concept of Provenance
Why should you care?
Provenance helps answer the question “how was this root file made?”, and “Why
does my plot look different then Joe’s,” or simply “How is this file made?”
All “tracked” parameters values used in the job that created the file are recorded
in the file.
• Any result should be reproducible starting from
• the input data file,
• the plugins, and
• the output provenance
Another source of provenance is the “Dataset Aggregation Service”,DAS, which
stores the top level configuration for each file.
Others are: The DB global tag combined with ORCOFF, CVS, cmsTC
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
8
Framework modules
Modules are configurable and communicate via the Event
• Source
• Reads the event in a root file
• OutputModule
• Write events to a file. Can use filter decisions
• EDAnalyzer (read)
• Reading data only
• Creating histograms
• the standard use case
• EDProducer (read/write)
• You want to create new products
• You want to share your reconstruction code with others
• EDFilter (read/write)
• you want to know if an object could be produced
• you want to control the analysis flow or make skimming
• Non-Event Data - Geometry, Calibration, MessageLogger –
• ESSource, ESProducer
• More info can be found at
• https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookCMSSWFramework
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
9
What is stored in the event files?
In CMSSW a set of standard data formats is defined, they are
collections of several products managed centrally in CMSSW
• RAW
•Data like they come from the detector
• RECO (Reconstruction):
•Output of the event reconstruction
• AOD (Analysis Object Data):
• Subset of data needed for standard analysis
• RAWSIM, RECOSIM, AODSIM:
• with additional simulation information
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
10
Input - Source, Output - OutputModule
process.source = cms.Source("PoolSource”,
fileNames = cms.untracked.vstring("file:input.root”)
)
INPUT
process.out = cms.OutputModule("PoolOutputModule",
fileName = cms.untracked.string("output.root")
)
Process.out.outputCommands = cms.untracked.vstring(
"keep *",
"drop *_*_*_HLT",
"keep FEDRawDataCollection_*_*_*”
)
malik@fnal.gov
OUTPUT
Configurable
Event Content:
What to drop?
What to keep?
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
11
Inspect configuration: the ConfigBrowser
You can inspect your config file using a graphical tool as well: the ConfigBrowser
Tree View
Graphical
representation
Property View
Box
https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideConfigBrowser
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
12
What are the stored products?
edmDumpEventContent <filename>
C++ class type
product alias
label
process name
vector<reco::MET>
"tcMet"
""
"RECO."
vector<reco::Muon>
"muons"
""
"RECO."
vector<reco::Muon>
"muonsFromCosmics" ""
"RECO."
vector<reco::Muon>
"muonsFromCosmics1Leg" ""
"RECO."
vector<reco::PFCandidate> "particleFlow"
""
"RECO."
vector<reco::PFCandidate> "particleFlow"
"electrons” "RECO."
vector<reco::PFJet>
"ak5PFJets"
""
"RECO."
Handle<reco::MuonCollection> muons;
Event.getByLabel(”muons”,muons );
Access the single
Product in the
framework module
malik@fnal.gov
reco::MuonCollection is a typedef
for vector<reco::Muon>
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
13
Accessing Event Data
We can access the products
in the module using the Handle
# by module and default product label
Handle<reco::MuonCollection> muons;
iEvent.getByLabel(”muons", muons );
# by module and product label
Handle<vector<reco::PFCandidate> > particleFlow;
iEvent.getByLabel("particleFlow", ”electrons" , particleFlow_electrons );
Framework modules are written in C++ , you can find a
basic C++ guide at:
https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookBasicCPlusPlus
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
14
CMSSW Release
CMSSW_4_2_X - Was the release used for prompt reconstruction for the majority of 2011. It struggled
to process 2011B data with pileup of 16 or more. CMSSW_4_2_8_patch7 is the currently
recommended analysis release for data and MC reconstructed with 4.2.x.
CMSSW_4_4_X - Fixed all of the technical performance problems, while keeping the same physics
performance as 4_2_X. It was used to reprocess all of the 2011 data and MC at the end of data taking
and for the prompt reconstruction of the 2011 HI run. CMSSW_4_4_4 is the currently recommended
analysis release for data and MC reconstructed with 4.4.x.
CMSSW_5_2_X - is the high pileup data taking release for 2012. CMSSW_5_2_5 is the currently
recommended Analysis Release for 2012 Data/MC.
CMSSW_6_1_X - is the integration release for the software being developed for the upgraded detector.
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWhichRelease
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
15
CMS Computing model
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
16
Computing and Analysis Facilities
Analysis in CMS is performed on a globally distributed collection of computing facilities
T3
T3
T3
Tier 0 (T0) at CERN (20% of all CMS computing resources)
T3
T3
T3
T3
T3
T3
T3
T3
T3
• Primary reconstruction
• First archival copy of the data
• Only central processing, no user access
Tier 1 (T1) - each in 7 countries
(40% of all CMS computing resources)
• Share of raw data for custodial storage
• Data reporocessing
• Skim data to reduce size make it more easily handleable
• Data selection
• Archive simulated data from Tier 2, no user access
Tier 2 (T2) ~ 50 Tier2s
• Half devoted MC production, half user analysis
• Physics Group data storage
• Primary Resource for Analysis
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
T3
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
T3
malik@fnal.gov
T3
• Entirely controlled by the providing institution
• Used for analysis
T3
Tier 3 (T3)
17
US Tier sites (example)
The US has a Tier-1 Center at FNAL
• FNAL is the largest Tier-1 center in CMS
The US has 7 Tier-2 computing facilities
• Caltech, Florida, MIT, Nebraska, Purdue, UCSD, Wisconsin
• All have at least 300 TB of disk
In addition FNAL has an analysis farm called the LPC CAF
• Like a very large Tier-3 in terms of analysis
• Not grid accessible
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
18
Data Storage
Tier-2s
• In CMS jobs go to the data.
• The challenging part is making sure the right data is distributed
broadly
•30TB is identified for use by the local group
• Local community controlled space
20TB is identified for storing user produced files and making them grid accessible
Tier-3s
• Important analysis resource, entirely controlled by the community that provides
them, minimum to be a registered Tier-3 is a PhEDEx
instance for data Transfers
• Currently over 50 Tier-3s registered, 22 in the US
• A number of Tier-3s have a full grid interface and accept analysis jobs
• Currently the Tier-3s can receive data from anywhere large variations in sizes
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
19
Data Names/Management/Movement
• Dataset convention is: /PrimaryDataset/AcquisitionEra-ProcVersion/DataTier
• PrimaryDataset: Physics channel based on triggers
• AcquisitionEra: A period of data taking (BeamCommisioning09)
• ProcVersion: Describes processing conditions, _v1, _v2
• DataTier: What objects are in the dataset? RAW, RECO, AOD
• CMS data is divided into
• Datasets - A group of events that can be accessed together
• Data Blocks - A group of files that is tracked by the data transfer system
• Logical file name (LFN) – A single file in a block, uniform name
• Physical file name (PFN) – Actual file name on a site
• Search data - Dataset Aggregation Service (DAS) page - https://cmsweb.cern.ch/das/
• Find data (talk to your physics group which data is relevant for your analysis)
• Wildcard searches, select on software releases
• Data is also divided by Event Content (Tier)
• RAW, RECO, AOD , may be skimmed and reduced
• Data Movement between Tiers in CMS is handled by a tool called PhEDEx
• http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=globa
• Any user can request data transfer, site managers approve the request
• For fetching a file or two, one can use Filemover
• https://cmsweb.cern.ch/filemover, useful to develop analysis code, debug etc.,
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
20
CMS, Grid and CRAB
• The GRID
• Interconnects world-wide computing resources (batch farms, storagesystems, ... ) in a
transparent way
•Provides every user with access to these resources
•Authorization, job submission, ...
• CMS uses the GRID to provide data access for all collaborators.
• CMS uses GRID certificates and a dedicated Virtual Organization (VO) management to have
better access control for specific tasks/groups
• You need a certificate for making Phedex requests, submitting jobs, and can use to access
Indico and the TWiki
• Your GRID certificate is important, follow all rules, don’t let it expire!
• CRAB enables the user to submit CMSSW jobs to all CMS datasets within the datalocation driven CMS computing structure
• Aims to hide as much of the complexity of the GRID as possible from the end user
• Provides a user front-end to
• Create and manage proxies from your certificate
• Split user jobs into manageable pieces
• Transport user analysis code to the data location for execution (compiled on submitting node)
• Execute user jobs, check status and retrieve output
• CRAB Support – https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback
• CRAB documentation – SWGuideCrab, SWGuideCrabFaq
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
21
Analysis Tools and Physics Analysis Toolkit
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
22
Analysis Workflow
Development
Comprehension
• Configuration tools
• Distribution kit
Seek to Minimize steps in the loop
Analysis Tools are there to help you
Execution
malik@fnal.gov
• Full Framework
• FWLite
• Grid
• PAT
• Fireworks
• FWLite
• Validated Code
 PAT
Selectors
POG/PAG Software
•Validate physics results
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
23
Getting Started With an Analysis
What do you want to do? What do you really(!) have to
do to achieve this?
Physics
is
is not
• Selecting events.
• Understanding data.
• Understanding corrections.
• Convincing others of the results
• Writing notes and papers!
malik@fnal.gov
• Writing your own histogram plotting tool
• Writing/maintaining your own ntupelizer
• Convincing others about your variable
definitions
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
24
Analysis in CMSSW
• Typical case – someone hands
you code, you select, modify,
use it
• If you are an experienced HEP
• person, you are used to TTree
analyzer
• At CMS we have the same
concept, not more difficult
• We have CMS specific way of
reading and writing Ttrees
• same semantics, different syntax
CMSSW is exactly what you are used to, only geared towards CMS
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
25
Guideline for s/w code used in CMS
We even have an official document containing some guidelines for
you:
https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdf
https://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications
• Stick to event provenance!
• Leave the EDM as late as possible (if at all)
• Use official tools/code where possible.
• Use PAT (as this includes already the first three points)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
26
The Event
• Each Event data file is made of several independent ROOT trees
• one for each object class
• Can be inspected with ROOT
root -l
[] TFile
f(“AOD.root”)
[] new TBrowser()
• Data inside the event are called “Product”
• moduleLabel : productInstanceLabel : processName
• Example: recoTracks_generalTracks_RECO
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
27
The TTree Objects
• Connected via 'SmartPointer' relations
• edm::Ref, edm::Ptr, …
• Strong hierarchy
• There is no direct connection between basic objects and super structures
• Composed object with minimal space overhead
• Central structure:
• No connection between basic objects themselves
• TTrees have provenance
• can tell what happened to the file without having access to configuration file
• edmProvDump and edmProvDiff can save you days, weeks, months!!
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
28
Events can be visualized
In CMS there are visualization tools, one of them is:
Fireworks is the light weight event display for analysis. It
can be installed on your laptop. You can find it at:
https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookFireworks
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
29
The Event Content
• Tiered data structure: RECO / AOD
• View the event content as a complicated (grown) organic structure
• Add/keep/drop trees to adapt content and size
• some decisions have already have been taken for you
• via the data tier definitions
The minimum I want to know
AOD Tier
The data tier guarantees what
information will be available
RECO Tier
• Different levels of granularity (resemble the order of creation)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
30
FWLite: A light Version of EDM
Sometimes people think the full EDM framework (FWFull) is a bit heavy…Even for
those people CMS has something to offer:
Typical FWLite config file
FWLite: This is bare Root with
known data formats
(with the same performance!)
• Python configuration, edm::Handle, TFileService, data access equivalent to
EDM
• Also PAT is fully compliant with (and even especially supports) FWLite.
• NO writing to the event content!
• Full framework ↔ FWLite: This is NOT an exclusive OR!
• Have a look at: WorkBookFWLite
• AT provides BasicAnalyzer class and a wrapper Classes to transform this class
into an EDAnalyzer or a FWLiteAnalyzer executable ( see twiki WorkBookFWLiteExamples)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
31
Typical CMS Analysis Workflow
• Prompt reconstruction at Tier-0.
• Central skims at Tier-1's.
• Users run cmsRun at Tier-2's:
• Perform high level analysis steps.
• Preselect events.
• Write their own user defined
• EventContent to private T2/T3 space
• The latter step might be iterated.
• Copy reduced datasets to your favorite
machine.
• Run your final analysis/produce plots
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
32
AT Recommendation
• Use cmsRun with crab
• Use cmsRun on batch systems or FWLite
• Use cmsRun to write persistent datasets to your
disc space and for more complex analysis tasks
• Write PAT Tuples
• Use FWLite for testing/interactive analysis on a
complexity level comparable with complex
analysis tasks
• Write flat n-Tuples if you really think that you
need them. Use EDM-ntuples instead of
bare root TTrees!
• Rather have at look at:
SWGuideEDMNtuples
PAT helps you create a user-defined
EventContent
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
33
Difficulties with the EDM
The EDM bares two main problems from the
analysist's point of view:
• Calculation/retrieval of high level analysis information
(complicated pointer arithmetic! what do I need? Where
do I find it?)
• Reduction only to the relevant high level analysis
information (where is the dropped data used throughout
the event?)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
34
Analyst's Issues
• I want use muons only. I dropped everything else from the event. Now I cannot
access the 2/ndof?
• I found out that the GsfElectron collection consists of many objects but only 10%
can be real. There must be some electronID, how can I find and apply it?
• I found out that CaloJets are uncorrected. How can I apply the JetMET calibration
for the absolute JES to them? Do I need DB access for that?
• I want to match reco objects to generated particles. I want to write my own
matching. Is there an official recommendation for algorithms and parameters?
• I want to match trigger muons to offline reconstructed muons. How do I do this?
• How to reduce the event content?
• Skim events (keep dijects with pt > 50?), dropping objects (no muons,, keep jets),
reducing object size (keep 4-vector, drop the rest)
PAT comes in here and help you
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
35
What is the Physics Analysis Toolkit
PAT is a toolkit that is an integral part of the CMSSW framework
• It is an interface between the sometimes complicated EDM and the simple
mind of the common user.
• It serves as well tested and supported common ground for group and user
analyses
• It facilitates reproducibility and comprehensibility of analyses
• You can view it as a common language between CMS analysts
• If another CMS analyst describes you a PAT analysis you can easily know
what he/she is talking about
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
36
Three aspects of PAT
Interface
Common Tool
• b/w RECO expertise & Analysis Level
contacts
• simplifies access via DataFormats
• canalizes expertise (via POG &
PAG
• crossing point between POGs & PAGs
('vertical integration')
• approved algorithms & sensible defaults
• synergy (everybody can profit from
recent developments)
• quick start into analysis for the beginners
Common Format
• facilitates transfer & comparisons
• PAG common configurations
• sustained provenance
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
37
The PAT Data Formats
• All pat::Objects inherit from their corresponding
reco::RecoCandidates
• A PAT Candidate is a reco::RecoCandidate + more
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
38
Combine Flexibility and User Friendliness
• You can choose yourself whether you really need all the
extra information that the PAT Candidates provide
• Still you don't need to know, how EDM/PAT manages this
access for you under the hood
Flexibility
User Friendliness
Maximal
Configurability
• The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
39
Configuration of PAT DataFormats
You can configure the content of the DataFormats yourself (example: pat::Jet)!
Size: 14 kb/event ( for ttbar)
• This can be slimmed even further
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
40
The concept of Maximal Configuration
DataFormats
EventContent
object embedding
pat event content
• Configure your own
DataFormats via embedding
• Add any extra info
you need to the
EventContent
Maximal
Configurability
WorkFlows
Selections
workflow tools
pat selectors
• Configure your workflow via
tools that PAT provides
malik@fnal.gov
• Apply selections via the
StringCutParser
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
41
Particle Flow
• Particle flow (PF) provides a full unambiguous reconstructed particle
based event interpretation, more sophisticated ways to do objection
disambiguation
• PF is plugged in PAT as follows and is called PF2PAT
•
SWGuidePF2PAT/WorkBookPF2PAT
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
42
PF2PAT
• Can run PF2PAT and standard PAT in parallel
• Example, standard PAT muons will still be stored in
patMuons and the muons found by PF2PAT will be stored in
patMuonsPOSTFIX
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
43
The Code Location
• DataFormats/PatCandidates
• Definition of all PAT Candidates.
• pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, …
• PhysicsTools/PatAlgos
• Implementation and filling of all data formats.
• Definition of common workflow and PAT tools
• PhysicsTools/PatUtils
• Definition of common tools and helper functions used in
PatAlgos
• PhysicsTools/PatExamples
• Location of many examples e.g. all non-trivial examples used during this
Tutorial
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
44
Development
Have a look at: SWGuidePATRecipes
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
45
Support for AT
Check the the main entry page of PAT in the software guide: SWGuidePA
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
46
PAT Documentation
• SWGuidePAT/WorkBookPAT Main documentation pages
• WorkBookPATDataFormats Description of all PAT Candidate
• WorkBookPATWorkflow Description of the PAT workflow
• WorkBookPATConfiguration Description of the configuration of PAT
• SWGuidePATTools Description of all PAT tools
• WorkBookPATTutorial Tutorials and examples to get started
• SWGuidePATRecipes Installation recipes
• SWGuidePATEventSize Tools for event size estimate
• SWGuidePF2PAT/WorkBookPF2PAT Main documentation PF2PAT
• WorkBookPATTutorial
malik@fnal.gov
PAT Tutorial
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
47
Concluding remarks
• You have experienced a pre-view of these tools, concepts doing
CMSDAS pre-exercises
• Use the above to understand the physics objects and do physics
analysis during this school
• SHORT and LONG exercises
• Participate in:
• PAT tutorials to know about PAT and PF2PAT
• CRAB, Fireworks, DAS tutorials etc.
• CMS Data Analysis School(s)
• Contribute to physics tools and tutorials – service credit given
• https://twiki.cern.ch/twiki/bin/view/CMS/Tutorials
• hn-cms-physTools@cern.ch (analysis tools releated – PAT, PF2PAT,tag
and probe etc.)
• hn-cms-crabFeedback@cern.ch (crab related)
• Analysis tools meetings (announced in the hn above, link to all, below)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
• https://twiki.cern.ch/twiki/bin/view/CMS/AnalysisToolsMeetings
48
BACKUP
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
49
CMS Event Data Model (EDM) II
• The CMS software CMSSW is based on ROOT
• All CMS EDM output files, from RAW data files just from the HLT to analysis
PATtuples are proper ROOT files
• System is designed so that the same root file can be used in three contexts:
• cmsRun - input for a full framework application, could be used for a reprocessing pass or creation
of more refined analysis objects, like PAT objects.
• FWLite – a small set of CMS defined loadable libraries to allow more sophisticated use with root,
for the EDM this is a read-only application.
• Bare root.exe –allows browsing of simple objects (floats, ints, and composites of them) from
the root TBrowser GUI is the ONLY recommended use case
• Pros
• Single File Structure, Reconstruction of detector object, physics objects (POGs)
• Physics Analysis ( PAGs)
• Track event from initial recording to “final” paper - Provenance
• Challenge
• strong software requirements, flexible, easy to extend, not be space expensive
• More info can be found at
WorkBookCMSSWFramework
WorkBookAnalysisOverviewIntroduction
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
50
Concept of Framework (FW)
• Applications in CMSSW are built from special shared object libraries
called plugins. In practice this means:
• cmsRun <some-configuration-file>
• Configurations are written in the Python language.
• There are two types of plugins:
• Module Plugins
• EDAnalyzers: e.g., making histograms
• EDProducers: e.g., making tracks from hits and saving tracks in event
• EDFilters: e.g., Skimming - Deciding or not to keep a given event
• Others: ESSources, ESProducers.
• ED* process event data, ES* process event setup data
• Data Object Plugins – also known as “root dictionaries” as they can also be
loaded directly into the root application. These are most of the products of the
above work, and form the elements of the EDM
• cmsRun workflows can mix standard plugins with user defined ones
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
51
Workflow
MC Workflow
Scientific analysis, simulation requires the
processing and generation of millions events,
calibration, skims, express analyses
Collision data
The set of processes that both MC and collision data flow through
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
52
Integration Build (IB)
IB - Tests that CMSSW code compiles, standard jobs run without errors - do some code quality and run quality tests
cmsDriver.py is a utility used to generate a full configuration file, which can be run with cmsRun for the above tests and
production quality configuration files. If you generate MC events you will come across it too (WorkBookGeneration)
http://cmssdt.cern.ch/SDT/html/showIB.html
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
53
Framework Subtleties: Example
• I want to use muons only. I dropped everything else from the event.
Now I cannot access the 2/ndof?
• You dropped the generalTracks collection from the event content,
did not consider that reco::Muon internally points to the
inner track points in generalTracks collection it was created from
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
54
Facilitated Access to Event Information
• Do you know how to access this event information within the EDM?
• With PAT Candidates you get this just by calling member functions!
• Note:
Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more)
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
55
EventContent of the default PAT Tuple
Size: 14kb/event (for ttbar)
• A default patTemplate_cfg.py:
• But decide yourself how your PAT Tuple should look like (add reco::Tracks or
reco::GenParticles to the Event Content or BTag information to the jets, etc ... )
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
56
PAT Candidate Member Functions
Check the Documentation: WorkBookPATDataFormats
malik@fnal.gov
CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012
57
Download