THE STANDARD ENVIRONMENT FOR THE ANALYSIS OF LAT DATA Seth Digel (Stanford University/HEPL) David Band (GLAST SSC—GSFC/UMBC) for the SSC-LAT Software Working Group and the Science Tools Working Group September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review OUTLINE • • • • • Introduction--designing the standard analysis environment Design Considerations Definition Process Components of the analysis environment Development concept September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—2 INTRODUCTION (§1.1-1.2, 1.4) • The standard analysis environment consists of the tools and databases needed for routine analysis of LAT data. • This environment will be used by both the LAT team and the general scientific community. • This environment was defined jointly by the LAT team and the SSC, but will be developed under the LAT team’s management with SSC participation. • The analysis environment does not support all possible analyses of LAT data. For example: – The tools the LAT team will develop for the Point Source Catalog will not be included – No tools for multi-gamma events or cosmic rays are defined either September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—3 DESIGN CONSIDERATIONS (§1.3) • The analysis of LAT data will be complex because of: – The LAT’s large FOV – Scanning will be the standard observing mode (photons from a source will be observed with different angles to the detector) – A large PSF at low energy – A large data space populated by a small number of photons • The environment will be compatible with HEASARC standards – Files will be in FITS format with HEASARC keywords and GLAST extensions – Data are extracted by utilities that isolate the analysis tools from databases whose format or architecture may change – Data and tools will eventually be transferred to the HEASARC • Accommodating the user community--the software must be usable by remote investigators with limited institutional knowledge September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—4 DEFINITION PROCESS (§1.5) • The process formally began in January, 2000, with a meeting at SLAC (before the SSC was constituted) • The data products the analysis environment will use were defined by the Data Products Working Group (mid-2001 to early 2002). Its report will be the basis of ICDs. • SSC-LAT Software Working Group (2002) established to define the analysis environment. 3 LAT and 3 SSC representatives, co-chaired by Seth Digel and David Band • June, 2002--Science Tools workshop at SLAC (~30 attendees LAT+SSC) to review the proposed analysis environment. • Definition of analysis environment guided by use cases, anticipation of science possible with LAT data, and expertise analyzing high energy astrophysics data. September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—5 USER ENVIRONMENT (§3.1) • Users will be able to run the tools from a command line or through an over-arching GUI system. Both interfaces common in high energy astrophysics. • Regarding the command line interface, there will be two tool categories: – Minimal interaction--these tools will be similar to FTOOLS – Interactive--tools such as the likelihood tool will require a great deal of interaction • The command line interface lends itself to scripting by the user to automate certain analyses. Scripting will be possible within and among tools. • The GUI system will bind together the tools. Users will interact with the tools through GUIs. • The GUIs will have a common look and feel: common terminology, standard placement of buttons, etc. September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—6 USER ENVIRONMENT—DATA DELIVERY(§3.2) • Level 1 data (D1, post-event reconstruction & classification) and corresponding Pointing/livetime history (D2) will be delivered by the LAT IOC to the SSC with defined (short) latency after Phase 1 • IRFs (D3) will be monitored by LAT team, and updated as required (infrequently, it is hoped) • Pulsar timing database (D4) will be maintained by SSC with ephemerides from Pulsar Working Group of S. Thorsett (IDS) • LAT point source catalog (D5) and updates will be released periodically (schedule TBD) • The SSC will supply data to a mirror site in Italy, and the LAT team to its foreign collaborators, to provide scientists in different countries with easier (faster) access to the data September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—7 USER ENV.—SOFTWARE DELIVERY AND SUPORT (§3.2-3.3) • Data extraction tool U1 for Level 1 database will be run on central servers, but others principally will run on clients • Source code and binaries will be delivered in zip, tar, and RPM formats • Documentation will be provided for all tools describing their use, their applicability and their algorithms • Also to be available online: – Analysis threads (like those described in §2) – Help desk (jointly supported by SSC and LAT) and FAQs – Contributed (and vetted but not supported) software September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—8 COMPONENTS OF THE ENVIRONMENT (§2) Pulsar ephem. (D4) Level 0.5 Level 1 (D1) Pointing/livetime history (D2) Alternative source for testing highlevel analysis Event display (UI1) Pulsar period search (A4) Ephemeris extract (U11) Arrival time correction (U10) Pulsar phase assign (U12) Data extract (U1) Pt.ing/livetime extractor (U3) Alternative for making additional cuts on alreadyretrieved event data Observation simulator (O2) Data subselection (U2) Pt.ing/livetime simulator (O1) Pt.ing/livetime extractor (U3) Exposure calc. (U4) Catalog Access (U9) Likelihood (A1) Astron. catalogs (D6) Map gen (U6) IRFs (D3) IRF visualization (U8) Src. ID (A2) GRB event binning (A5) GRB unbinned spectral analysis (A9) User Interface aspects of the standard analysis environment, such as Image/plot display (UI2), Command line interface & scripting (UI4), and GUI & Web access (UI5) are not shown explicitly. GRB spectral-temporal modeling (A10) GRB spectral analysis (A8)2 1 September 25, 2002 Source model def. tool (U7) Interstellar em. model (U5) GRB LAT DRM gen. (U14) This tool also performs periodicity tests and the results can be used to refine ephemerides 2 These tools can also take as input binned data from other instruments, e.g., GBM; the corresponding DRMs must also be available. LAT Point source catalog (D5) Pulsar profiles (A3)1 GRB rebinning (A6)2 LAT IOC GLAST Science Support Center GRB visualization (U13) GRB temporal analysis (A7)2 LAT Science Tools Review—9 DATABASES ID Name D1 Event Summary (level 1) D2 D3 D4 D5 D6 September 25, 2002 Description Summary information (i.e., everything needed for post-Level 1 analyses) for photons and cosmic rays, possibly segregated by event classification Pointing, livetime, Timeline of location, orientation, mode, and mode history and accumulated livetime Instrument response PSF, effective area, and energy functions redistribution functions in CALDB format Pulsar ephemerides Timing information (maintained during the mission) for a set of radio pulsars identified as potential LAT sources LAT Point source Summary information for LAT-detected catalog point sources, from which the LAT point source catalog is extracted Other high-level Astronomical catalogs useful for databases counterpart searches for LAT sources LAT IOC GLAST Science Support Center Size after 1 year 200 Gb (only 10% photons) 70 Mb TBD 1 Mb 10 Mb TBD LAT Science Tools Review—10 UTILITIES ID U1 U2 U3 U4 September 25, 2002 Name Event data extractor User-level data extraction environment Pointing/livetime history extractor Exposure calculator U5 Interstellar emission model U6 Map generator U7 Source model definition Description Basic front end to the event summary database D1 Allows the user to make further cuts on the photons extracted from D1, including energy-dependent cuts on angular distance from a pulsar position Basic front end to the pointing, livetime, and mode history database D2 Uses IRFs (D3) and pointing, livetime and mode history (D2) to calculate the exposure quantities for the likelihood analysis (A1) Not software per se, although the model itself will likely have parameters (or more complicated adjustments) that need to be optimized once and for all, in concert with an all sky analysis for point sources. Binning of photons, regridding of exposure, coordinate reprojection, calculation of intensity Interactive construction of a model for the region of the sky by specifying components such as the interstellar emission, point sources, and extended sources, along with their parameter values; the resulting models can be used in the likelihood analysis (A1) and the observation simulator (O2) LAT IOC GLAST Science Support Center LAT Science Tools Review—11 UTILITIES, CONT. U8 IRF visualization U9 Catalog access U10 Gamma-ray arrival time barycenter correction U11 Pulsar ephemeris extractor U12 Pulsar phase assignment U13 GRB visualization U14 LAT GRB DRM generator September 25, 2002 Extracting PSF, effective areas, and energy distributions from the CALDB (D3) for visualization Extracting sources from catalogs, LAT and otherwise (D5 & D6) For pulsar analysis Interface to the pulsar ephemeris database (D4) Using ephemeris information from U11 and barycenter time corrected gamma rays from U10, assigns phases to gamma rays Displays various types of GRB lightcurves and spectra Generates the detector response matrix appropriate for fitting binned LAT GRB spectral data. LAT IOC GLAST Science Support Center LAT Science Tools Review—12 ANALYSIS TOOLS ID A1 Name Likelihood analysis A2 Source identification Pulsar profiles A3 A4 Pulsar period search A5 GRB event binning A6 GRB rebinning A7 GRB temporal analysis A8 GRB binned spectral analysis A9 GRB unbinned spectral analysis A10 GRB spectraltemporal physical modelling September 25, 2002 Description Point source detection, characterization (position, flux, spectrum), generalized models for multiple/extended sources Quantitative evaluation of the probability of association of LAT sources with counterparts in other catalogs Phase folding of gamma-ray arrival times (after phases are assigned by A3) and periodicity tests Search for periodic emission from a point source Bins events into time and energy bins Rebins binned GRB data into larger time and energy bins Analyzes GRB lightcurves using various techniques such as FFTs, wavelets, etc. Fits binned spectra; may be XSPEC Fits event data with spectral models Fits a GRB physical model to LAT or LAT+GBM data LAT IOC GLAST Science Support Center LAT Science Tools Review—13 OBSERVATION SIMULATORS September 25, 2002 ID O1 Name Livetime/pointing simulator O2 High-level observation simulator Description Generates simulated pointing, livetime, and mode histories (analogue of D2) for a specified time range and observing strategy. This should be available as a tool, but probably most generally useful will be precomputed simulated D2 databases, which can be used subsequently for any study with O2. Generates gamma rays according to the instrument response functions (i.e., after detection, reconstruction, and background rejection), bypassing the instrument simulation step LAT IOC GLAST Science Support Center LAT Science Tools Review—14 USER INTERFACE ID UI1 Name Event display UI2 UI3 Image/Plot display User Support UI4 Command-line interface and scripting GUI interface and Web access UI5 September 25, 2002 Description Displays the tracks of an individual event, requires access to Level 0.5 data General facility for image and plot display Defines the scope of user support at the level of the components of the standard analysis environment Requirements for command line and scripting interfaces to the components of the standard analysis environment GUI interface to the standard analysis environment. For some tools, especially U1 and U4, which have databases stored on a remote server, Web access seems most practical. LAT IOC GLAST Science Support Center LAT Science Tools Review—15 COMPONENTS--MISC. COMMENTS • Common data types that can pass between tools (e.g., FITS files with events) have been defined but were not included in the report • Use cases are not discussed here, but are in the report (§4) • The burst analysis tools are designed to analyze data from the GBM and other missions in addition to the LAT. GBMspecific tools are not included • Multi-mission capabilities can be added to the likelihood tool; we will study whether such analysis is computationally feasible September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—16 DEVELOPMENT CONCEPT—MANAGEMENT (§5.1) • Difficulty: software will be developed by scientists and programmers at different institutions with varying proficiency in C++. This army will be difficult to command. • Managers will be established for 7 groups of tools and databases – – – – – – – Databases and related utilities: D1, U1, U2, D2, U3, U6 & U9 Likelihood analysis: D3, U4, U7, U8 & A1 Pulsars: D4, U10, U11, U12, A3 & A4 GRBs: U13, A5, A6, A7, U14, A8, A9 & A10 Catalog analysis: D5, U5, U6, U9 & A2 Observation simulators: O1 & O2 User interface: UI1, UI2, UI3, UI4 & UI5 • Managers will be responsible for monitoring the development, maintaining the schedule, tool-level testing, and interfaces with the other development groups. September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—17 MANAGEMENT, CONT. • Managers will be responsible to the LAT-SSC Software Working Group. • Tools will be delivered to the Working Group as they are completed. • We expect the tools to be under configuration control and in a testing regime before being submitted to the Working Group, but formal configuration control and testing will definitely begin at this point. • The Working Group will oversee functional testing, code reviews and documentation reviews before acceptance. September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—18 DEVELOPMENT CONCEPT--SOFTWARE DEVELOPMENT (§5.2-5.4) • The tools will be developed primarily with object-oriented programming using C++. • Code in other languages will be accepted if robust compilers exist for the supported operating systems, and the compiled code can be wrapped into C++ classes. • We will support Windows and Linux. • We will use development tools used by the LAT team: – CVS--a code repository with version tracking – CMT--build configuration – doxygen--documentation • Class diagrams will be developed for the tools. In addition to structuring the code, these diagrams will identify common classes that can be collected into a GLAST class library. September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—19 DEVELOPMENT CONCEPT—TESTING (§6) • The testing plan is based on that implemented for the development of the LAT instrument simulation • Throughout development, testing of software will be ongoing at several levels – ‘Package’ tests – defined for each package, required to be present, run nightly on then-current development versions – Application tests – i.e., testing at the analysis tool level; the test programs are to verify all requirements, with performance benchmarks also logged – Code reviews – verify compliance with coding and documentation rules • The development schedule includes 2 Mock Data Challenges – Equivalent to end-to-end tests for the high-level analysis – Tests high-level interaction between tools September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—20 DEVELOPMENT CONCEPT—SCHEDULE (§7.1-7.2) • Development priority is based on dependencies. For example, simulators are needed to create test data. • The milestones, after the LAT CDR in April, 2003, are the Mock Data Challenges – MDC1: June-October 2004 – MDC2: June-December 2005 – Each MDC includes development of the challenge datasets, analysis, reconciliation of the results with the input, and repair of the analysis tools • FTE estimates are conservative educated guesses which include code development, testing, and documentation • Time for prototyping is scheduled for tools with important open issues, e.g., the likelihood analysis tool (A1) • Completion of the standard analysis environment is scheduled for the end of MDC2, 9 months before launch – This margin will allow for recovery from unanticipated difficulties with development September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—21 SCHEDULE, CONT. (§7.3) Development area Databases and related utilities Analysis tools groups Likelihood analysis Pulsar analysis GRBs Catalog analysis Observation simulation User interface Total required Available FTEs1 2002 Q4 6.3 2003 Q1 6.3 Q2 6.3 Q3 3.3 Q4 3.3 2004 Q1 2 Q2 2 Q3 0 Q4 1.2 2005 Q1 1.2 Q2 1.2 Q3 1.2 Q4 1.2 Total 35.5 6.7 0 1.9 0 3 1.7 19.6 11 6.7 0 1.9 0.5 3 3 21.4 14 7.7 0 2.8 0.5 3 3.8 24.1 14 6.7 0 2.8 0.5 3 3.1 19.4 15 5.4 0 1.3 0.5 2 2.5 15 16 0 4 0 0.9 3.3 0 2.5 12.7 17 0 4 0 1.6 3.3 0 2.5 13.4 17 0 3 0.8 3.8 0 1 8.6 17 0 3 0.8 1.8 0 1 7.8 17 0 7 1 1.8 0 1 12 17 0 5 1 0.5 0 1 8.7 17 0 0 0.25 0 0 1 2.45 17 0 0 0.25 0 0 1 2.45 17 39.2 18 17.3 16.5 14 25.1 167.6 206 1 Approximate totals, from survey by institution among participants at the June 2002 Science Tools Workshop at SLAC. Draft 14 Sept 2002 Totals are in person-quarters • • • Note that numbers are person-quarters, divide by 4 to get FTEyears Summary shows adequate labor totals are anticipated, although available FTEs doesn’t appear to match needs early on Shortfall is less than indicated, because development effort for individual tools (assumed to be constant in the above table) can start at a lower-than-average level September 25, 2002 LAT IOC GLAST Science Support Center LAT Science Tools Review—22