Introduction to CMS Software, Computing Model and Analysis Tools Sudhir Malik malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 Outline Three parts • CMS Offline Software concepts and organization • Computing Model • Analysis Tools • Remarks CMSDAS pre-exercises • Taste of a bit of each of the above • Get you in CMSDAS-mode to understand physics objects and perform physics analysis during the school malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 2 CMS Offline Software concepts and organization malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 3 CMSSW • The full suite of CMS offline software is called CMSSW • Based on Event Data Model (EDM) • Used for online data taking, simulation, primary reconstruction, and physics analysis • The two thousand (or so) packages that make up the project are organized into subsystems with names that should be suggestive of their purpose malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 4 Documentation • WorkBook - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBook • source of one-stop shopping organization, update frequently • initial starting point for people new to CMS software and computing • intended for people to read through it, and work out the examples and tutorials it contains • CMS tutorials are found here • SWGuide - https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuide • “industrial” strength shopping needs, expert level information • The software architecture, detailed descriptions of the algorithms, instructions for analysis and validation • Not updated as frequently • Reference Manual - http://cmssdt.cern.ch/SDT/doxygen/ •Reference Manual - mixture of auto-generated, Doxygen and hand written pages, always check the last edit date… • Glossary of acronyms - https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookGlossary malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 5 CMS Event Data Model (EDM) • A CMS Event* starts as a collection of the RAW data from a detector or MC event • As the event data is processed, products are stored in the Event as reconstructed (RECO) data objects • Event contains = triggered physics event + derived data + metadata(software config) + condition and calibration data • Products are stored as C++ containers * single readout of the detector electronics and the signals that will (in general) have been generated by particles, tracks, energy deposits, present in a number of bunch crossings malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 6 Modular architecture of FW A Software Bus Model • Start from the raw or generated data, read from an input root file • Producers are scheduled to operate on the event data and produce their output which is written into the event • Because it’s modular you can inspect/debug the job at any point in the processing path. The contents of the event can be examined outside of the context of the process that made it. • The FW automatically tracks the provenance of what is produced. • Several instances of the same module can be run in the same application and you will still be able to uniquely identify their products, eg. JetProducer with different cone sizes. Identified by C++ type, producer label, instance name, process name malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 7 Concept of Provenance Why should you care? Provenance helps answer the question “how was this root file made?”, and “Why does my plot look different then Joe’s,” or simply “How is this file made?” All “tracked” parameters values used in the job that created the file are recorded in the file. • Any result should be reproducible starting from • the input data file, • the plugins, and • the output provenance Another source of provenance is the “Dataset Aggregation Service”,DAS, which stores the top level configuration for each file. Others are: The DB global tag combined with ORCOFF, CVS, cmsTC malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 8 Framework modules Modules are configurable and communicate via the Event • Source • Reads the event in a root file • OutputModule • Write events to a file. Can use filter decisions • EDAnalyzer (read) • Reading data only • Creating histograms • the standard use case • EDProducer (read/write) • You want to create new products • You want to share your reconstruction code with others • EDFilter (read/write) • you want to know if an object could be produced • you want to control the analysis flow or make skimming • Non-Event Data - Geometry, Calibration, MessageLogger – • ESSource, ESProducer • More info can be found at • https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookCMSSWFramework malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 9 What is stored in the event files? In CMSSW a set of standard data formats is defined, they are collections of several products managed centrally in CMSSW • RAW •Data like they come from the detector • RECO (Reconstruction): •Output of the event reconstruction • AOD (Analysis Object Data): • Subset of data needed for standard analysis • RAWSIM, RECOSIM, AODSIM: • with additional simulation information malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 10 Input - Source, Output - OutputModule process.source = cms.Source("PoolSource”, fileNames = cms.untracked.vstring("file:input.root”) ) INPUT process.out = cms.OutputModule("PoolOutputModule", fileName = cms.untracked.string("output.root") ) Process.out.outputCommands = cms.untracked.vstring( "keep *", "drop *_*_*_HLT", "keep FEDRawDataCollection_*_*_*” ) malik@fnal.gov OUTPUT Configurable Event Content: What to drop? What to keep? CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 11 Inspect configuration: the ConfigBrowser You can inspect your config file using a graphical tool as well: the ConfigBrowser Tree View Graphical representation Property View Box https://twiki.cern.ch/twiki/bin/view/CMS/SWGuideConfigBrowser malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 12 What are the stored products? edmDumpEventContent <filename> C++ class type product alias label process name vector<reco::MET> "tcMet" "" "RECO." vector<reco::Muon> "muons" "" "RECO." vector<reco::Muon> "muonsFromCosmics" "" "RECO." vector<reco::Muon> "muonsFromCosmics1Leg" "" "RECO." vector<reco::PFCandidate> "particleFlow" "" "RECO." vector<reco::PFCandidate> "particleFlow" "electrons” "RECO." vector<reco::PFJet> "ak5PFJets" "" "RECO." Handle<reco::MuonCollection> muons; Event.getByLabel(”muons”,muons ); Access the single Product in the framework module malik@fnal.gov reco::MuonCollection is a typedef for vector<reco::Muon> CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 13 Accessing Event Data We can access the products in the module using the Handle # by module and default product label Handle<reco::MuonCollection> muons; iEvent.getByLabel(”muons", muons ); # by module and product label Handle<vector<reco::PFCandidate> > particleFlow; iEvent.getByLabel("particleFlow", ”electrons" , particleFlow_electrons ); Framework modules are written in C++ , you can find a basic C++ guide at: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookBasicCPlusPlus malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 14 CMSSW Release CMSSW_4_2_X - Was the release used for prompt reconstruction for the majority of 2011. It struggled to process 2011B data with pileup of 16 or more. CMSSW_4_2_8_patch7 is the currently recommended analysis release for data and MC reconstructed with 4.2.x. CMSSW_4_4_X - Fixed all of the technical performance problems, while keeping the same physics performance as 4_2_X. It was used to reprocess all of the 2011 data and MC at the end of data taking and for the prompt reconstruction of the 2011 HI run. CMSSW_4_4_4 is the currently recommended analysis release for data and MC reconstructed with 4.4.x. CMSSW_5_2_X - is the high pileup data taking release for 2012. CMSSW_5_2_5 is the currently recommended Analysis Release for 2012 Data/MC. CMSSW_6_1_X - is the integration release for the software being developed for the upgraded detector. https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookWhichRelease malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 15 CMS Computing model malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 16 Computing and Analysis Facilities Analysis in CMS is performed on a globally distributed collection of computing facilities T3 T3 T3 Tier 0 (T0) at CERN (20% of all CMS computing resources) T3 T3 T3 T3 T3 T3 T3 T3 T3 • Primary reconstruction • First archival copy of the data • Only central processing, no user access Tier 1 (T1) - each in 7 countries (40% of all CMS computing resources) • Share of raw data for custodial storage • Data reporocessing • Skim data to reduce size make it more easily handleable • Data selection • Archive simulated data from Tier 2, no user access Tier 2 (T2) ~ 50 Tier2s • Half devoted MC production, half user analysis • Physics Group data storage • Primary Resource for Analysis T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 T3 CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 T3 malik@fnal.gov T3 • Entirely controlled by the providing institution • Used for analysis T3 Tier 3 (T3) 17 US Tier sites (example) The US has a Tier-1 Center at FNAL • FNAL is the largest Tier-1 center in CMS The US has 7 Tier-2 computing facilities • Caltech, Florida, MIT, Nebraska, Purdue, UCSD, Wisconsin • All have at least 300 TB of disk In addition FNAL has an analysis farm called the LPC CAF • Like a very large Tier-3 in terms of analysis • Not grid accessible malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 18 Data Storage Tier-2s • In CMS jobs go to the data. • The challenging part is making sure the right data is distributed broadly •30TB is identified for use by the local group • Local community controlled space 20TB is identified for storing user produced files and making them grid accessible Tier-3s • Important analysis resource, entirely controlled by the community that provides them, minimum to be a registered Tier-3 is a PhEDEx instance for data Transfers • Currently over 50 Tier-3s registered, 22 in the US • A number of Tier-3s have a full grid interface and accept analysis jobs • Currently the Tier-3s can receive data from anywhere large variations in sizes malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 19 Data Names/Management/Movement • Dataset convention is: /PrimaryDataset/AcquisitionEra-ProcVersion/DataTier • PrimaryDataset: Physics channel based on triggers • AcquisitionEra: A period of data taking (BeamCommisioning09) • ProcVersion: Describes processing conditions, _v1, _v2 • DataTier: What objects are in the dataset? RAW, RECO, AOD • CMS data is divided into • Datasets - A group of events that can be accessed together • Data Blocks - A group of files that is tracked by the data transfer system • Logical file name (LFN) – A single file in a block, uniform name • Physical file name (PFN) – Actual file name on a site • Search data - Dataset Aggregation Service (DAS) page - https://cmsweb.cern.ch/das/ • Find data (talk to your physics group which data is relevant for your analysis) • Wildcard searches, select on software releases • Data is also divided by Event Content (Tier) • RAW, RECO, AOD , may be skimmed and reduced • Data Movement between Tiers in CMS is handled by a tool called PhEDEx • http://cmsdoc.cern.ch/cms/aprom/phedex/prod/Activity::RatePlots?view=globa • Any user can request data transfer, site managers approve the request • For fetching a file or two, one can use Filemover • https://cmsweb.cern.ch/filemover, useful to develop analysis code, debug etc., malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 20 CMS, Grid and CRAB • The GRID • Interconnects world-wide computing resources (batch farms, storagesystems, ... ) in a transparent way •Provides every user with access to these resources •Authorization, job submission, ... • CMS uses the GRID to provide data access for all collaborators. • CMS uses GRID certificates and a dedicated Virtual Organization (VO) management to have better access control for specific tasks/groups • You need a certificate for making Phedex requests, submitting jobs, and can use to access Indico and the TWiki • Your GRID certificate is important, follow all rules, don’t let it expire! • CRAB enables the user to submit CMSSW jobs to all CMS datasets within the datalocation driven CMS computing structure • Aims to hide as much of the complexity of the GRID as possible from the end user • Provides a user front-end to • Create and manage proxies from your certificate • Split user jobs into manageable pieces • Transport user analysis code to the data location for execution (compiled on submitting node) • Execute user jobs, check status and retrieve output • CRAB Support – https://hypernews.cern.ch/HyperNews/CMS/get/crabFeedback • CRAB documentation – SWGuideCrab, SWGuideCrabFaq malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 21 Analysis Tools and Physics Analysis Toolkit malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 22 Analysis Workflow Development Comprehension • Configuration tools • Distribution kit Seek to Minimize steps in the loop Analysis Tools are there to help you Execution malik@fnal.gov • Full Framework • FWLite • Grid • PAT • Fireworks • FWLite • Validated Code PAT Selectors POG/PAG Software •Validate physics results CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 23 Getting Started With an Analysis What do you want to do? What do you really(!) have to do to achieve this? Physics is is not • Selecting events. • Understanding data. • Understanding corrections. • Convincing others of the results • Writing notes and papers! malik@fnal.gov • Writing your own histogram plotting tool • Writing/maintaining your own ntupelizer • Convincing others about your variable definitions CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 24 Analysis in CMSSW • Typical case – someone hands you code, you select, modify, use it • If you are an experienced HEP • person, you are used to TTree analyzer • At CMS we have the same concept, not more difficult • We have CMS specific way of reading and writing Ttrees • same semantics, different syntax CMSSW is exactly what you are used to, only geared towards CMS malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 25 Guideline for s/w code used in CMS We even have an official document containing some guidelines for you: https://cms-physics.web.cern.ch/cms-physics/Analysis-Code-guidelines.pdf https://twiki.cern.ch/twiki/bin/view/CMS/Internal/Publications • Stick to event provenance! • Leave the EDM as late as possible (if at all) • Use official tools/code where possible. • Use PAT (as this includes already the first three points) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 26 The Event • Each Event data file is made of several independent ROOT trees • one for each object class • Can be inspected with ROOT root -l [] TFile f(“AOD.root”) [] new TBrowser() • Data inside the event are called “Product” • moduleLabel : productInstanceLabel : processName • Example: recoTracks_generalTracks_RECO malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 27 The TTree Objects • Connected via 'SmartPointer' relations • edm::Ref, edm::Ptr, … • Strong hierarchy • There is no direct connection between basic objects and super structures • Composed object with minimal space overhead • Central structure: • No connection between basic objects themselves • TTrees have provenance • can tell what happened to the file without having access to configuration file • edmProvDump and edmProvDiff can save you days, weeks, months!! malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 28 Events can be visualized In CMS there are visualization tools, one of them is: Fireworks is the light weight event display for analysis. It can be installed on your laptop. You can find it at: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookFireworks malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 29 The Event Content • Tiered data structure: RECO / AOD • View the event content as a complicated (grown) organic structure • Add/keep/drop trees to adapt content and size • some decisions have already have been taken for you • via the data tier definitions The minimum I want to know AOD Tier The data tier guarantees what information will be available RECO Tier • Different levels of granularity (resemble the order of creation) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 30 FWLite: A light Version of EDM Sometimes people think the full EDM framework (FWFull) is a bit heavy…Even for those people CMS has something to offer: Typical FWLite config file FWLite: This is bare Root with known data formats (with the same performance!) • Python configuration, edm::Handle, TFileService, data access equivalent to EDM • Also PAT is fully compliant with (and even especially supports) FWLite. • NO writing to the event content! • Full framework ↔ FWLite: This is NOT an exclusive OR! • Have a look at: WorkBookFWLite • AT provides BasicAnalyzer class and a wrapper Classes to transform this class into an EDAnalyzer or a FWLiteAnalyzer executable ( see twiki WorkBookFWLiteExamples) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 31 Typical CMS Analysis Workflow • Prompt reconstruction at Tier-0. • Central skims at Tier-1's. • Users run cmsRun at Tier-2's: • Perform high level analysis steps. • Preselect events. • Write their own user defined • EventContent to private T2/T3 space • The latter step might be iterated. • Copy reduced datasets to your favorite machine. • Run your final analysis/produce plots malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 32 AT Recommendation • Use cmsRun with crab • Use cmsRun on batch systems or FWLite • Use cmsRun to write persistent datasets to your disc space and for more complex analysis tasks • Write PAT Tuples • Use FWLite for testing/interactive analysis on a complexity level comparable with complex analysis tasks • Write flat n-Tuples if you really think that you need them. Use EDM-ntuples instead of bare root TTrees! • Rather have at look at: SWGuideEDMNtuples PAT helps you create a user-defined EventContent malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 33 Difficulties with the EDM The EDM bares two main problems from the analysist's point of view: • Calculation/retrieval of high level analysis information (complicated pointer arithmetic! what do I need? Where do I find it?) • Reduction only to the relevant high level analysis information (where is the dropped data used throughout the event?) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 34 Analyst's Issues • I want use muons only. I dropped everything else from the event. Now I cannot access the 2/ndof? • I found out that the GsfElectron collection consists of many objects but only 10% can be real. There must be some electronID, how can I find and apply it? • I found out that CaloJets are uncorrected. How can I apply the JetMET calibration for the absolute JES to them? Do I need DB access for that? • I want to match reco objects to generated particles. I want to write my own matching. Is there an official recommendation for algorithms and parameters? • I want to match trigger muons to offline reconstructed muons. How do I do this? • How to reduce the event content? • Skim events (keep dijects with pt > 50?), dropping objects (no muons,, keep jets), reducing object size (keep 4-vector, drop the rest) PAT comes in here and help you malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 35 What is the Physics Analysis Toolkit PAT is a toolkit that is an integral part of the CMSSW framework • It is an interface between the sometimes complicated EDM and the simple mind of the common user. • It serves as well tested and supported common ground for group and user analyses • It facilitates reproducibility and comprehensibility of analyses • You can view it as a common language between CMS analysts • If another CMS analyst describes you a PAT analysis you can easily know what he/she is talking about malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 36 Three aspects of PAT Interface Common Tool • b/w RECO expertise & Analysis Level contacts • simplifies access via DataFormats • canalizes expertise (via POG & PAG • crossing point between POGs & PAGs ('vertical integration') • approved algorithms & sensible defaults • synergy (everybody can profit from recent developments) • quick start into analysis for the beginners Common Format • facilitates transfer & comparisons • PAG common configurations • sustained provenance malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 37 The PAT Data Formats • All pat::Objects inherit from their corresponding reco::RecoCandidates • A PAT Candidate is a reco::RecoCandidate + more malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 38 Combine Flexibility and User Friendliness • You can choose yourself whether you really need all the extra information that the PAT Candidates provide • Still you don't need to know, how EDM/PAT manages this access for you under the hood Flexibility User Friendliness Maximal Configurability • The key is: configuration of DataFormats by cfi file! (e.g. for pat::Jets) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 39 Configuration of PAT DataFormats You can configure the content of the DataFormats yourself (example: pat::Jet)! Size: 14 kb/event ( for ttbar) • This can be slimmed even further malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 40 The concept of Maximal Configuration DataFormats EventContent object embedding pat event content • Configure your own DataFormats via embedding • Add any extra info you need to the EventContent Maximal Configurability WorkFlows Selections workflow tools pat selectors • Configure your workflow via tools that PAT provides malik@fnal.gov • Apply selections via the StringCutParser CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 41 Particle Flow • Particle flow (PF) provides a full unambiguous reconstructed particle based event interpretation, more sophisticated ways to do objection disambiguation • PF is plugged in PAT as follows and is called PF2PAT • SWGuidePF2PAT/WorkBookPF2PAT malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 42 PF2PAT • Can run PF2PAT and standard PAT in parallel • Example, standard PAT muons will still be stored in patMuons and the muons found by PF2PAT will be stored in patMuonsPOSTFIX malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 43 The Code Location • DataFormats/PatCandidates • Definition of all PAT Candidates. • pat::Photon, pat::Electron, pat::Muon, pat::Tau, pat::Jet, pat::MET, … • PhysicsTools/PatAlgos • Implementation and filling of all data formats. • Definition of common workflow and PAT tools • PhysicsTools/PatUtils • Definition of common tools and helper functions used in PatAlgos • PhysicsTools/PatExamples • Location of many examples e.g. all non-trivial examples used during this Tutorial malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 44 Development Have a look at: SWGuidePATRecipes malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 45 Support for AT Check the the main entry page of PAT in the software guide: SWGuidePA malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 46 PAT Documentation • SWGuidePAT/WorkBookPAT Main documentation pages • WorkBookPATDataFormats Description of all PAT Candidate • WorkBookPATWorkflow Description of the PAT workflow • WorkBookPATConfiguration Description of the configuration of PAT • SWGuidePATTools Description of all PAT tools • WorkBookPATTutorial Tutorials and examples to get started • SWGuidePATRecipes Installation recipes • SWGuidePATEventSize Tools for event size estimate • SWGuidePF2PAT/WorkBookPF2PAT Main documentation PF2PAT • WorkBookPATTutorial malik@fnal.gov PAT Tutorial CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 47 Concluding remarks • You have experienced a pre-view of these tools, concepts doing CMSDAS pre-exercises • Use the above to understand the physics objects and do physics analysis during this school • SHORT and LONG exercises • Participate in: • PAT tutorials to know about PAT and PF2PAT • CRAB, Fireworks, DAS tutorials etc. • CMS Data Analysis School(s) • Contribute to physics tools and tutorials – service credit given • https://twiki.cern.ch/twiki/bin/view/CMS/Tutorials • hn-cms-physTools@cern.ch (analysis tools releated – PAT, PF2PAT,tag and probe etc.) • hn-cms-crabFeedback@cern.ch (crab related) • Analysis tools meetings (announced in the hn above, link to all, below) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 • https://twiki.cern.ch/twiki/bin/view/CMS/AnalysisToolsMeetings 48 BACKUP malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 49 CMS Event Data Model (EDM) II • The CMS software CMSSW is based on ROOT • All CMS EDM output files, from RAW data files just from the HLT to analysis PATtuples are proper ROOT files • System is designed so that the same root file can be used in three contexts: • cmsRun - input for a full framework application, could be used for a reprocessing pass or creation of more refined analysis objects, like PAT objects. • FWLite – a small set of CMS defined loadable libraries to allow more sophisticated use with root, for the EDM this is a read-only application. • Bare root.exe –allows browsing of simple objects (floats, ints, and composites of them) from the root TBrowser GUI is the ONLY recommended use case • Pros • Single File Structure, Reconstruction of detector object, physics objects (POGs) • Physics Analysis ( PAGs) • Track event from initial recording to “final” paper - Provenance • Challenge • strong software requirements, flexible, easy to extend, not be space expensive • More info can be found at WorkBookCMSSWFramework WorkBookAnalysisOverviewIntroduction malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 50 Concept of Framework (FW) • Applications in CMSSW are built from special shared object libraries called plugins. In practice this means: • cmsRun <some-configuration-file> • Configurations are written in the Python language. • There are two types of plugins: • Module Plugins • EDAnalyzers: e.g., making histograms • EDProducers: e.g., making tracks from hits and saving tracks in event • EDFilters: e.g., Skimming - Deciding or not to keep a given event • Others: ESSources, ESProducers. • ED* process event data, ES* process event setup data • Data Object Plugins – also known as “root dictionaries” as they can also be loaded directly into the root application. These are most of the products of the above work, and form the elements of the EDM • cmsRun workflows can mix standard plugins with user defined ones malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 51 Workflow MC Workflow Scientific analysis, simulation requires the processing and generation of millions events, calibration, skims, express analyses Collision data The set of processes that both MC and collision data flow through malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 52 Integration Build (IB) IB - Tests that CMSSW code compiles, standard jobs run without errors - do some code quality and run quality tests cmsDriver.py is a utility used to generate a full configuration file, which can be run with cmsRun for the above tests and production quality configuration files. If you generate MC events you will come across it too (WorkBookGeneration) http://cmssdt.cern.ch/SDT/html/showIB.html malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 53 Framework Subtleties: Example • I want to use muons only. I dropped everything else from the event. Now I cannot access the 2/ndof? • You dropped the generalTracks collection from the event content, did not consider that reco::Muon internally points to the inner track points in generalTracks collection it was created from malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 54 Facilitated Access to Event Information • Do you know how to access this event information within the EDM? • With PAT Candidates you get this just by calling member functions! • Note: Each PAT Candidate IS a corresponding reco::RecoCandidate (+ more) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 55 EventContent of the default PAT Tuple Size: 14kb/event (for ttbar) • A default patTemplate_cfg.py: • But decide yourself how your PAT Tuple should look like (add reco::Tracks or reco::GenParticles to the Event Content or BTag information to the jets, etc ... ) malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 56 PAT Candidate Member Functions Check the Documentation: WorkBookPATDataFormats malik@fnal.gov CMS Data Analysis School@NTU, Taiwan 11-15 Sep 2012 57