The Engineering Database

Assumptions and Considerations About the Data

- The database does not contain ITAR data.
- The content of the database has no proprietary information.
- Data volume: about 1 GByte/day (from FOS).
- Different parameters have different types (integer, floating point (single or double precision?), string).
- Different parameters have different cadences.
- The same parameter may have different cadences.
- Required time stamp precision:
  o For SDP: seconds (requests are expected to add a small padding).
  o For Calibration: the highest sampling rate, i.e. enough to distinguish between two consecutive samples. Since the shortest sampling interval is 64 ms, I assume this is the required precision.
- Number of different parameters in the database: ~20,000.
- Typical time_range = 100 s … 1000 s (JV, PG), similar to the typical exposure time. The range may be longer when combining exposures, or when data is requested per visit or per observation. Other suggested values: up to 900, 7,200, or 10,000 s (MS, DS).
- Visit time limit = 50 hours = 180,000 s.
- Exception: if thermal history is important, calibration pipelines may need data streaming (probably no need to worry until it happens) (PG).
- For a given parameter and range, how many values are expected in the series? It depends on the cadence:
  o High-rate = 64 ms (about 15.6 samples/s, i.e. 112,500 samples in 7,200 s) (MS, DS); 4 samples/s (PG)
  o Low-rate = ??
- Out of the 20,000, what is the distribution of high-rate and low-rate parameters?
  o High-rate applies only to a few.
- Out of the 20,000, what is the distribution by type (integer, string, float, double)?
- Out of the 20,000, which parameters will usually be queried?
- Out of the 20,000, which parameters are usually queried using aggregate functions?
- Size of the parameter list for simultaneous retrieval [p1, p2, ..., pn]: typically n = 5 … 50.
- Out of the 20,000, only a few hundred will be retrieved frequently (mainly by the calibration software).
- Some parameters will be retrieved a few times over their lifetime.
- Most parameters will NEVER be retrieved.
- It may be necessary to add modeled parameters, based on long history, to the database to optimize access.
- Compression options may need to be considered.

Requirements on the Service Interface

- REST or similar; no SOAP.
- Internal (Science Processing & Calibration Pipeline) and external (External Calibration, Archive User Interface) access.
- Provides authentication/restriction to prevent abusive behavior that produces "denial of service".
- Optimized for calibration pipelines.

The 20 Queries

1. What are the engineering parameters, with their type, domain, and units?
   GetParameters(*, type=1, domain=1, units=1)
2. What are the engineering parameters associated with a given instrument, with their type, domain, and units?
   GetParameters(instrument='instrument', type=1, domain=1, units=1)
3. What are the engineering parameters whose names match a pattern such as '*blah*', with their type, domain, and units?
   GetParameters(regexpression='*blah*', type=1, domain=1, units=1)
4. Given a parameter, what are its values between time_start and time_end, ordered by time?
   GetValues_Parameter(p, [ts, te])
5. Given a list of parameters, what are their values between time_start and time_end, ordered by time?
   GetValues_ParameterList([p1, p2, …, pn], [ts, te])

   Notes on the result format:
   R = [(p1, (t1, v1), (t2, v2), …, (tz, vz)), (p2, (t1, v1), (t2, v2), …, (tz, vz)), …, (pn, (t1, v1), (t2, v2), …, (tz, vz))]
   When the parameters share the time stamps t, the result set could be requested in a more compact format:
   R = [(t1, ((p1, v1), (p2, v1), …, (pn, v1))), (t2, ((p1, v2), (p2, v2), …, (pn, v2))), …, (tz, ((p1, vz), (p2, vz), …, (pn, vz)))]
   Another possible compact format, for parameters sampled at a regular interval t_sample, is:
   R = [t_start, t_sample, ((p1, (v1, v2, …, vz)), (p2, (v1, v2, …, vz)), …, (pn, (v1, v2, …, vz)))]
   where each parameter has z = (te - ts)/t_sample values.
   Note: in all cases p1, p2, …, pn could be omitted, implying the order of the input parameter list.
6. Given a parameter, what are its average, median, minimum, maximum, and range values between time_start and time_end?
   GetAvgValue_Parameter(p, [ts, te])
   GetMedianValue_Parameter(p, [ts, te])
   GetMinValue_Parameter(p, [ts, te])
   GetMaxValue_Parameter(p, [ts, te])
   GetRangeValue_Parameter(p, [ts, te])
7. Given a list of parameters, what are their average, median, minimum, maximum, and range values between time_start and time_end?
   GetAvgValue_ParameterList([p1, p2, …, pn], [ts, te])
   GetMedianValue_ParameterList([p1, p2, …, pn], [ts, te])
   GetMinValue_ParameterList([p1, p2, …, pn], [ts, te])
   GetMaxValue_ParameterList([p1, p2, …, pn], [ts, te])
   GetRangeValue_ParameterList([p1, p2, …, pn], [ts, te])

   Note: requests for aggregates could also be packed as
   GetStatistics([Avg, Median, Min, Max, Range], [p1, p2, …, pn], [ts, te])
   R = [(Avg(p1), Median(p1), Min(p1), Max(p1), Range(p1)), (Avg(p2), Median(p2), Min(p2), Max(p2), Range(p2)), …, (Avg(pn), Median(pn), Min(pn), Max(pn), Range(pn))]
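The cadence figures in the assumptions (64 ms from MS/DS, 4 samples/s from PG, the 7,200 s exposure and the 180,000 s visit limit) determine how many values a single query can return. A minimal sketch, assuming evenly spaced samples over a half-open window [ts, ts + window); the helper names are hypothetical. Exact fractions are used so that 64 ms does not introduce floating-point rounding into the counts.

```python
from fractions import Fraction

# Sampling intervals quoted in the assumptions, as exact fractions of a second.
HIGH_RATE_INTERVAL_S = Fraction(64, 1000)  # 64 ms (MS, DS)
PG_HIGH_RATE_INTERVAL_S = Fraction(1, 4)   # 4 samples/s (PG)


def samples_per_second(interval_s: Fraction) -> Fraction:
    """Sampling rate implied by a fixed sampling interval."""
    return 1 / interval_s


def expected_samples(interval_s: Fraction, window_s: int) -> int:
    """Number of evenly spaced samples in [ts, ts + window_s) (hypothetical helper)."""
    # Fraction // Fraction returns a plain int (floor division).
    return Fraction(window_s) // interval_s


if __name__ == "__main__":
    print(float(samples_per_second(HIGH_RATE_INTERVAL_S)))   # 15.625
    print(expected_samples(HIGH_RATE_INTERVAL_S, 7_200))     # 112500
    print(expected_samples(HIGH_RATE_INTERVAL_S, 180_000))   # 2812500 (visit limit)
    print(expected_samples(PG_HIGH_RATE_INTERVAL_S, 7_200))  # 28800
```

Multiplying by the typical list size (n up to 50) gives a rough upper bound on the values returned by a single list query at the high rate.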
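Queries 1 to 3 are all filters over a parameter catalog. The following is a minimal in-memory sketch, not the service's actual interface: the record fields, the function name get_parameters, and the catalog entries are hypothetical, and the type=1, domain=1, units=1 flags are interpreted as "include this field in the result". It also assumes the pattern '*blah*' is a shell-style wildcard rather than a regular expression (a leading '*' is not valid regular-expression syntax), which Python's fnmatch module handles directly.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class ParameterInfo:
    """Hypothetical catalog record; field names are assumptions."""
    name: str
    instrument: str
    type: str
    domain: str
    units: str


def get_parameters(
    catalog: Iterable[ParameterInfo],
    instrument: Optional[str] = None,
    pattern: Optional[str] = None,
    include_type: bool = True,
    include_domain: bool = True,
    include_units: bool = True,
) -> List[Dict[str, str]]:
    """Sketch of GetParameters covering queries 1-3.

    instrument=None and pattern=None correspond to GetParameters(*, ...).
    pattern is a case-sensitive shell-style wildcard such as '*blah*'.
    """
    rows = []
    for p in catalog:
        if instrument is not None and p.instrument != instrument:
            continue
        if pattern is not None and not fnmatchcase(p.name, pattern):
            continue
        row = {"name": p.name}
        if include_type:
            row["type"] = p.type
        if include_domain:
            row["domain"] = p.domain
        if include_units:
            row["units"] = p.units
        rows.append(row)
    return rows


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    ParameterInfo("INST1_DETECTOR_TEMP", "INST1", "double", "thermal", "K"),
    ParameterInfo("INST1_HEATER_CURRENT", "INST1", "float", "power", "A"),
    ParameterInfo("INST2_DETECTOR_TEMP", "INST2", "double", "thermal", "K"),
]
```

For example, get_parameters(CATALOG, pattern="*TEMP*") returns the two detector-temperature rows. With ~20,000 parameters a linear scan like this is cheap, so the catalog queries are unlikely to be the performance concern.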
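The three result formats for queries 4 and 5 carry the same information, so a client or the service could convert between them. A minimal sketch, assuming the per-parameter result is held in memory as Python tuples of the form (p, (t1, v1), (t2, v2), …); the function names are hypothetical. Both compact formats are only valid when every parameter shares the same time stamps, and the regular-cadence format additionally requires constant spacing, so the sketch checks both conditions instead of silently misaligning values.

```python
import math
from typing import Any, List, Sequence, Tuple

# One entry of the per-parameter format: (p, (t1, v1), (t2, v2), ..., (tz, vz))
ParamSeries = Tuple[Any, ...]


def _shared_times(result: Sequence[ParamSeries]) -> List[float]:
    """Return the common time stamps, or raise if the parameters do not share them."""
    if not result:
        raise ValueError("empty result set")
    ref_name, *ref_samples = result[0]
    times = [t for t, _ in ref_samples]
    for name, *samples in result[1:]:
        if [t for t, _ in samples] != times:
            raise ValueError(f"{name} does not share the time stamps of {ref_name}")
    return times


def to_time_major(result: Sequence[ParamSeries]) -> list:
    """Per-parameter format -> [(t1, ((p1, v1), ..., (pn, v1))), ...]."""
    times = _shared_times(result)
    # entry[1 + i] is the (t_i, v_i) pair of that parameter; [1] selects the value.
    return [
        (t, tuple((entry[0], entry[1 + i][1]) for entry in result))
        for i, t in enumerate(times)
    ]


def to_regular_cadence(result: Sequence[ParamSeries], t_sample: float, tol: float = 1e-9) -> list:
    """Per-parameter format -> [t_start, t_sample, ((p1, (v1, ..., vz)), ...)]."""
    times = _shared_times(result)
    for a, b in zip(times, times[1:]):
        if not math.isclose(b - a, t_sample, rel_tol=0.0, abs_tol=tol):
            raise ValueError(f"samples at {a} and {b} are not {t_sample} s apart")
    return [
        times[0],
        t_sample,
        tuple((entry[0], tuple(v for _, v in entry[1:])) for entry in result),
    ]
```

Omitting the parameter names, as the note on ordering allows, would amount to dropping entry[0] from each output tuple.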
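The packed GetStatistics request from query 7 can be sketched as a table of aggregate functions applied to each parameter's values in the window. This is an illustration under stated assumptions, not a specified behavior: "Range" is taken to mean max minus min (the notes do not define it), the window is taken as inclusive [ts, te], the samples are passed in as a plain dictionary, and get_statistics is a hypothetical name. The single-parameter calls of query 6 are the one-parameter, one-statistic case.

```python
import statistics
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

Sample = Tuple[float, float]  # (t, v)

# Assumed definitions of the named aggregates; "Range" is taken as max - min.
AGGREGATES: Dict[str, Callable[[Sequence[float]], float]] = {
    "Avg": statistics.fmean,
    "Median": statistics.median,
    "Min": min,
    "Max": max,
    "Range": lambda values: max(values) - min(values),
}


def get_statistics(
    stats: Sequence[str],
    params: Sequence[str],
    ts: float,
    te: float,
    data: Mapping[str, Sequence[Sample]],
) -> List[Tuple[float, ...]]:
    """Sketch of GetStatistics: one tuple per parameter, in input order,
    holding the requested aggregates over samples with ts <= t <= te."""
    unknown = [s for s in stats if s not in AGGREGATES]
    if unknown:
        raise ValueError(f"unsupported statistics: {unknown}")
    rows = []
    for p in params:
        values = [v for t, v in data[p] if ts <= t <= te]
        if not values:
            raise ValueError(f"no samples for {p} in [{ts}, {te}]")
        rows.append(tuple(AGGREGATES[s](values) for s in stats))
    return rows
```

For instance, GetAvgValue_Parameter(p, [ts, te]) would correspond to get_statistics(["Avg"], [p], ts, te, data)[0][0]. Computing all requested aggregates in one pass over the window avoids fetching the same high-rate series several times.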