SERPent - Automated RFI software for e-MERLIN

Luke W. Peck∗ & Danielle M. Fenech†
University College London

August 31, 2012

Abstract

This memo summarizes the SERPent pypeline used to address the varying RFI environment present across the e-MERLIN array. SERPent is an automated RFI mitigation procedure utilizing the SumThreshold methodology (as used in the LOFAR pipeline) and is written in the Parseltongue language to interact with the AIPS program. In addition to the flagging of RFI-affected visibilities, the script also flags the Lovell stationary scans inherent to the e-MERLIN system. Both the flagging and computational performance of SERPent are presented here with e-MERLIN commissioning datasets for both L and C band observations. The refining of automated reduction and calibration procedures is essential for the e-MERLIN Legacy projects, where the vast data sizes (of order TB) mean that traditional astronomer interaction with the data is unfeasible.

∗ email: lwp@star.ucl.ac.uk
† email: dmf@star.ucl.ac.uk

1 Introduction

With the advent of new receivers, electronics, correlators and optical fibre networks, modern interferometers such as e-MERLIN, the EVLA and ALMA are becoming an ever more sensitive window on the radio universe. The wide receiver bandwidths are particularly important in increasing the sensitivity of an interferometer, by increasing the spectral range of radio frequencies observed and thus increasing the uv coverage. However, this increased bandwidth now incorporates more radio frequencies reserved for commercial purposes, such as mobile phones, satellites and radio stations, to name a few sources. Such Radio Frequency Interference (RFI) is an old nemesis of radio astronomers, who traditionally removed or 'flagged' RFI manually; with the increase in data sizes from the order of Megabytes to Terabytes due to recent improvements to arrays, this becomes an unreasonable task, highlighting the importance of automated procedures, particularly for RFI mitigation. The Scripted E-merlin Rfi-mitigation PypelinE for iNTerferometry (SERPent) was created to tackle this problem for the RFI environment affecting e-MERLIN, using Parseltongue, a Python-based language which calls upon AIPS tasks.

2 SERPent Requirements

SERPent has been run on a number of systems and seems to be fairly stable. The following are the program versions on which we run the code, and they should probably be considered the 'minimum' requirements for the code to work. For computational requirements and timings for the script to run on test data, please see Section 6.

AIPS release 31DEC11
Python 2.6.5
Parseltongue 2.0 (with Obit 1.1.0)
NumPy 1.6.1

3 Lovell Stationary Scans

A problem unique to the e-MERLIN array is the Lovell stationary scan. Due to the size of the Lovell telescope and the subsequent slew time, the Lovell telescope only participates in every alternate phase-cal scan, remaining stationary on the target for the other scans. The other antennas in the array are not affected. This results in visibilities from baselines containing the Lovell telescope having two different amplitude levels for the phase-cal. In most cases the phase-cal will be brighter than the target, so when the Lovell is observing the phase-cal the received flux will be greater than when the Lovell does not participate in the phase-cal scan and remains on the target source. This behaviour can be seen using the IBLED task within AIPS on the phase-cal source, as figure 1 clearly shows.
Figure 1: AIPS IBLED task window, displaying the phase-cal source 2007+404, Stokes LL (for greater clarity). The top panel shows all scans for the entire observation run, and the main central panel shows a small selection of scans for closer inspection, before running SERPent.

This figure also displays another problem with early e-MERLIN commissioning data: multiple amplitude levels for scans throughout the observation. This property has been traced to hardware issues within the receivers, and new filters appear to have resolved the issue for future observations. However, it was necessary to normalize these levels before flagging this dataset. In the main window each group of points represents one scan, for which there are three distinct amplitude levels. The highest two levels are scans where the Lovell telescope contributes to the observation (including the aforementioned filter issues affecting amplitude levels) and the lowest-level scans are where the Lovell does not contribute. Across the entire observation (top panel) the Lovell stationary scans are consistent in magnitude and alternate with every other scan, despite the varying amplitude levels of the Lovell on-source scans, indicating that the Lovell dropout scans are indeed the cause of the lowest-level scans in figure 1.

If the array is e-MERLIN, SERPent will run an extra piece of code, which firstly determines the Lovell baselines. It makes a first run through all the integration times, isolating each scan and evaluating the magnitude of each scan, the highest and lowest scan statistics, and the integration time step. A second run again isolates each individual scan and tests the following condition: if the mean of the scan lies within ±σ of the lowest mean found in the previous run, then the entire scan is flagged. The results are written to a text file via the cPickle Python module and are combined with the main SumThreshold flagging results later in the script. Figure 2 shows the IBLED task window on the same phase-cal source as in figure 1 after the Lovell stationary scans have been removed by SERPent.

Figure 2: AIPS IBLED task window, displaying the phase-cal source 2007+404, Stokes LL (for greater clarity). The top panel shows all scans for the entire observation run, and the main central panel shows a small selection of scans for closer inspection, after running SERPent. The lowest-level scans present in figure 1 have been removed.
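To make the scan test concrete, the following minimal Python sketch illustrates the logic described above. It is not the SERPent source: the scan-splitting gap, the choice of σ (taken here as the spread of the scan means) and all function and file names are illustrative assumptions.

import numpy as np
try:
    import cPickle as pickle   # Python 2, as used by SERPent
except ImportError:
    import pickle              # Python 3 fallback

def find_scans(times, gap=60.0):
    """Split integration times (seconds) into scans wherever the interval
    between consecutive integrations exceeds `gap`."""
    breaks = np.where(np.diff(times) > gap)[0] + 1
    return np.split(np.arange(len(times)), breaks)

def lovell_dropout_flags(times, amplitudes, gap=60.0):
    """Return a 0/1 array flagging scans judged to be Lovell stationary
    (dropout) scans on a Lovell baseline."""
    scans = find_scans(times, gap)
    scan_means = np.array([amplitudes[idx].mean() for idx in scans])
    lowest = scan_means.min()            # lowest scan mean from the first pass
    sigma = scan_means.std()             # assumed spread used for the +/- sigma test
    flags = np.zeros(len(times), dtype=int)
    for idx, mean in zip(scans, scan_means):
        if abs(mean - lowest) <= sigma:  # scan sits at the lowest amplitude level
            flags[idx] = 1               # flag the entire scan
    return flags

# The flag array would then be written to disk and merged with the
# SumThreshold flags later in the pipeline, e.g.:
# pickle.dump(flags, open('lovell_dropouts_baseline_1-2.p', 'wb'))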
4 RFI Mitigation for e-MERLIN

One of the toughest challenges in RFI mitigation is accounting for its variable intensity, morphology and unpredictable nature. There are numerous methods available to astronomers for both pre- and post-correlation mitigation, each having advantages and disadvantages. Given the facilities in place at Jodrell Bank, we decided post-correlation techniques would be most prudent. Early commissioning data from e-MERLIN contained RFI varying in both time and frequency, thus necessitating threshold detection methods. We now outline the RFI mitigation techniques deployed by SERPent.

4.1 SumThreshold Method

The most effective thresholding method was demonstrated by Offringa et al. (2010b) [3] to be the SumThreshold, and this is the adopted RFI detection method for SERPent. An overview of the method is given here; for a more in-depth analysis please see the aforementioned literature. Threshold methods work on the basis that RFI increases visibility amplitudes for the times and frequencies at which it is present. There will therefore be a considerable difference compared to RFI-free visibility amplitudes, so RFI appears as statistical outliers. If these outliers exceed a certain threshold condition then they are detected and flagged. The threshold level is dictated by the statistics of the sample population, which can be the entire observation (all time scans, frequency channels, baselines etc.) or a smaller portion, for example separate baselines and IFs. Separating the visibilities in this way increases the computational performance (Python is faster when operating on many smaller chunks of data rather than one big chunk), and also makes the statistics more reliable, as the RFI may be independent of baseline and distributed differently between IFs. This is particularly relevant for L band observations, where the RFI is more problematic.

The SumThreshold method works on data which is separated by baseline, IF and Stokes and is arranged in a 2D array with the individual time scans and frequency channels comprising the array axes, i.e. time-frequency space. The frequency channels are further split by IF for the reasons stated above. The idea is that peak RFI and broadband RFI will be easily detectable when the visibility amplitudes are arranged in time-frequency space. The e-MERLIN correlator outputs three numbers associated with any single visibility: the real part, the imaginary part and the weight of the visibility. When appending visibilities to the time-frequency array, if the weight is greater than 0.0, i.e. data exists for that time and frequency, then the magnitude of the complex visibility (formed from the real and imaginary parts) is taken to constitute the amplitude. If the weight is 0.0 or less, i.e. no data exists for this baseline, time scan etc., then the amplitude is set to 0.0. This visibility will thus have no effect on the statistics or threshold value, but acts as a placeholder for that element of the array. The Python module NumPy is employed to create and manipulate the 2D arrays, as its core is written in C (which is intrinsically faster than Python) and has been heavily optimized (note that how this module is compiled and called upon can have a significant effect on performance).

There are two concepts associated with the SumThreshold method: the threshold and the subset size, i.e. a small slice of the total elements (in this case visibility amplitudes) in a certain direction of the array (time or frequency). The difference between the SumThreshold method (a type of combinatorial thresholding) and simple thresholding is that after each individual element in the array has been tested against the first threshold level χ_1, the values of a group of elements can be averaged and tested against a smaller threshold level χ_i, where i is the subset size, i.e. the number of elements averaged and tested. Empirically a small set of subset sizes i = [1, 2, 4, 8, 16, 32, 64] works well (Offringa et al. 2010b) [3]. A window of size i cycles through the array in one direction (e.g. time), covering every possible position for the given array and subset size. After each subset cycle a flag array of identical dimensions records the positions of any elements which are flagged: 0.0 denotes a normal visibility and 1.0 signifies RFI found in the time direction (2.0 for the frequency direction, and higher values for any subsequent runs of the flagger). At the beginning of the next subset cycle, for any element of the flag array whose value is greater than 0.0, the corresponding amplitude in the visibility array is reduced to the threshold level χ_i, which progressively gets smaller with increasing subset size. If the average of the elements within a window of subset size i is found to be greater than the threshold level χ_i, then all elements within that window are flagged. This method is implemented in both array directions (i.e. time and frequency).
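The core of a single-direction SumThreshold pass, as described above, can be sketched in NumPy as follows. This is an illustrative implementation rather than the SERPent source; the function and parameter names are ours, and the threshold sequence χ_i anticipates equation 3 below.

import numpy as np

def sum_threshold_1d(amplitudes, flags, chi1, subsets=(1, 2, 4, 8, 16, 32),
                     rho=1.5, flag_value=1):
    """Flag along axis 0 (e.g. time) of a 2D time-frequency amplitude array.

    amplitudes : 2D array of visibility amplitudes (time x frequency)
    flags      : integer array of the same shape (0 = good data)
    chi1       : first threshold level (see equations 1 and 2)
    """
    work = amplitudes.copy()
    for i in subsets:
        chi_i = chi1 / rho ** np.log2(i)
        # Amplitudes already flagged are reduced to the current threshold so
        # that a single strong sample does not dominate later, wider windows.
        work[flags > 0] = np.minimum(work[flags > 0], chi_i)
        ntime, nchan = work.shape
        for f in range(nchan):
            column = work[:, f]
            # Slide a window of length i over every position in this column
            # and flag the whole window if its average exceeds chi_i.
            for t in range(ntime - i + 1):
                if column[t:t + i].mean() > chi_i:
                    flags[t:t + i, f] = flag_value
    return flags

A pass in the frequency direction applies the same routine to the transposed arrays, with flag_value = 2.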
4.2 SERPent's Implementation of the SumThreshold Method

In addition to the SumThreshold methodology, certain clauses have been added to prevent the algorithm from overflagging the dataset. If any threshold level drops to the mean + variance estimate, the flagging run for that direction stops. Before the full implementation of the SumThreshold method is deployed, an initial single-subset (i = 1) run is performed to remove any extremely strong RFI. The amplitudes of any RFI detected are subsequently set to zero and the full flagging run begins. The flagging process can be run multiple times at the cost of computational time; by default the code performs a second run if the maximum value within the array is a certain factor of the median and if there are flags from the previous run. On this second run all flagged visibilities from the first run are set to 0.0 in the visibility array so that the statistics are not skewed, and this run can then search for weaker RFI which may remain. This may be necessary, as some RFI in the early e-MERLIN commissioning data were found to be over 10,000 times stronger than the astronomical signal while weaker RFI was still present. Note that the first-run subsets increase in size in binary steps up to 32, and the second run goes deeper, to 256. These limits can easily be changed manually, to lower values to save time if there is not much RFI in the observations, or to greater subset sizes if necessary.
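This two-run strategy can be sketched as follows, re-using the sum_threshold_1d sketch from Section 4.1. Again this is illustrative rather than the SERPent source: first_threshold() stands for the χ_1 estimate defined by equations 1 and 2 below, and rerun_factor is a placeholder for the 'certain factor of the median' that triggers the second run.

import numpy as np

def flag_baseline_if(amps, first_threshold, rerun_factor=10.0):
    """Flag one baseline/IF/Stokes time-frequency array `amps`."""
    flags = np.zeros(amps.shape, dtype=int)

    # Initial i = 1 pass to remove extremely strong RFI, which is then
    # zeroed so it cannot skew the statistics of the full runs.
    flags = sum_threshold_1d(amps, flags, first_threshold(amps), subsets=(1,))
    amps = np.where(flags > 0, 0.0, amps)

    # First full run in both directions, subsets up to 32.
    chi1 = first_threshold(amps)
    subsets = (1, 2, 4, 8, 16, 32)
    flags = sum_threshold_1d(amps, flags, chi1, subsets)
    flags = sum_threshold_1d(amps.T, flags.T, chi1, subsets, flag_value=2).T

    # Optional second, deeper run (subsets up to 256) on data with the
    # first-run flags zeroed, to pick up weaker remaining RFI.
    # (Time direction shown; the frequency direction is treated the same way.)
    if flags.any() and amps.max() > rerun_factor * np.median(amps):
        amps = np.where(flags > 0, 0.0, amps)
        chi1 = first_threshold(amps)
        flags = sum_threshold_1d(amps, flags, chi1,
                                 subsets=(1, 2, 4, 8, 16, 32, 64, 128, 256))
    return flags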
The first threshold level can be calculated by a range of methods and statistics. The variance of the sample is an important component of this threshold, and various estimators are described and tested by Fridman (2008) [1]. The author concluded that Exponential Weighting is the best method from the point of view of LOSS: a measure of the difference in standard deviation between a robustly estimated variance and a simple estimate, in the absence of outliers. However, the Median Absolute Deviation (MAD) and the Median of Pairwise Averaged Squares are the most effective ways to remove outliers, although he comments that both are not particularly efficient and require more samples to produce the same power as other methods. Since the sample size in any given observation from e-MERLIN will be adequate, this is not such an issue. The breakdown point for the MAD is also very high (0.5), i.e. almost half the data may be contaminated by outliers (Fridman 2008) [1]. The MAD is adopted by this algorithm due to these robust properties. The author also stresses that the type and intensity of the RFI, the type of observation and the method of implementation are important factors when deciding which estimate to use for any given interferometer.

The MAD used as the variance estimate in the SERPent algorithm is defined by equation 1, where median_i(x_i) is the median of the original population. Each sample of the population is replaced by the absolute value of its deviation from this median, the median of this new population is taken, and the result is multiplied by the constant 1.4826 to make the estimate consistent with the standard deviation of an expected Gaussian distribution.

    MAD = 1.4826 · median_j{ |x_j − median_i(x_i)| }        (1)

The first threshold level χ_1 is thus determined by an estimate of the mean x̄, the variance σ and an aggressiveness parameter β (equation 2) (Niamsuwan, Johnson & Ellingson 2005) [2]. Since the median is less sensitive to outliers, it is preferred to the traditional mean in this equation (thus x̄ = median), and the MAD is preferred to the traditional standard deviation for the variance for similar reasons (σ = MAD). If the data are Gaussian in nature then the MAD will be similar to the standard deviation and the median to the mean. A range of values for β was tested until a value stable across multiple observations and frequencies was found, of around β = 25. Increasing the value of β reduces the aggressiveness of the threshold and decreasing the value increases the aggressiveness.

    χ_1 = x̄ + βσ        (2)

The subsequent threshold levels are determined by equation 3, where i is the subset size and ρ = 1.5; this value empirically works well for the SumThreshold method (Offringa et al. 2010b) [3] and defines how coarsely the threshold levels decrease with subset size.

    χ_i = χ_1 / ρ^(log_2 i)        (3)

In summary, SERPent first calculates the median and MAD for each IF, baseline and Stokes. An initial run cycles through the visibilities to remove any individual amplitudes which are over the first threshold, in case extremely strong RFI is present, and sets them to zero for the subsequent full flagging runs. The script then starts the first full run of the SumThreshold method in both the time and frequency directions. After this is completed it again sets any flagged visibility amplitudes to zero and recalculates the statistics. The second SumThreshold run is then performed to try to remove any weaker RFI. All the parameters described here can be changed manually by the user via the SERPent input file.
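For reference, equations 1–3 translate directly into NumPy as follows (illustrative code; the parameter values follow the text, β = 25 and ρ = 1.5).

import numpy as np

def mad(x):
    """Equation 1: median absolute deviation, scaled to be consistent with
    the standard deviation of a Gaussian distribution."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def first_threshold(x, beta=25.0):
    """Equation 2: chi_1 = median + beta * MAD."""
    return np.median(x) + beta * mad(x)

def threshold_level(chi1, i, rho=1.5):
    """Equation 3: chi_i = chi_1 / rho**log2(i)."""
    return chi1 / rho ** np.log2(i)

# Example: the threshold levels for the first-run subset sizes.
# chi1 = first_threshold(amplitudes)
# levels = [threshold_level(chi1, i) for i in (1, 2, 4, 8, 16, 32)]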
5 SERPent Outputs

Whilst SERPent is running it continuously outputs text files containing information on the flagging to a designated folder set by the user. The cPickle module in Python is used to store the NumPy arrays from the flagging, and these are all read back in once the flagging has finished. SERPent operates in this fashion because of the way it is parallelized for performance: each CPU does not need to retain any Python variables or information and is free to flag multiple jobs. Each of these files is automatically deleted by SERPent at the end of the script. SERPent combines all the flag arrays and formats them into the appropriate FG extension table required by AIPS. To maximize the FG table efficiency, the SERPent FG output is fed through the REFLG task in AIPS to condense the number of FG rows. This is important due to the limit imposed by certain calibration tasks in AIPS on the number of FG entries which can be applied. Multiple FG tables are created by SERPent: from the flagging, from the Lovell dropouts, from a combination of both, and after the REFLG task has been run (if the AIPS version is recent enough). Whilst the REFLG FG table is automatically attached to the input file, these tables remain available for user manipulation.

6 SERPent Performance

Here we document the performance of the early test runs of SERPent on old MERLIN data, early e-MERLIN commissioning data and RFI test data supplied by Rob Beswick (Jodrell Bank). Table 1 gives details of the datasets tested here. All tests used SERPent version 31/07/12.

Table 1: SERPent Performance Test Datasets

Telescope   Dataset Name                  Size     Band  Visibilities  Sources  Baselines  IFs  Channels  Stokes
MERLIN      M82V                          212 MB   L     82692         6        21         1    31        2
e-MERLIN    RFI Test Data: 1436+6336      1.63 GB  L     5812          1        10         12   512       4
e-MERLIN    COBRaS W1 2011: 0555+398      2.33 GB  C     99149         1        10         4    128       4
e-MERLIN    COBRaS W1 2011: All Sources   25.3 GB  C     1079150       6        10         4    128       4

6.1 Flagging Performance

SERPent has been tested on both L and C band observations and has been found to flag all of the C band RFI and the majority of the L band RFI. The remaining L band RFI is usually weak broadband RFI or very weak RFI close to the median value of the sample. Firstly we present some results from L band data. Figure 3 shows some RFI test data of 1436+6336 (data courtesy of Rob Beswick) with one baseline displayed via the AIPS task SPFLG in time-frequency space. The first IF is completely wiped out with noisy data, and some weak broadband RFI remains in the central IFs. Almost everything else has been flagged, including some very intricate RFI which cannot be removed as accurately by more simplistic RFI flagging routines. The L band results show that SERPent can flag complicated RFI in time-frequency space, and figure 4 shows this also applies to the C band, with the infamous 'wiggly' RFI found in commissioning data. Note that this was very poor quality data and SERPent even started to flag some of the noise. However, this is a good example of the thresholding method in action.

Figure 3: AIPS SPFLG image of 1436+6336, L band, baseline 7−8, Stokes RR, IFs 1−12, after SERPent flagging. The AIPS task REFLG was also deployed in this image. The vertical axis is time and the horizontal axis is frequency.

6.2 Computational Performance

We have conducted multiple runs on a range of datasets and computers to assess the flagging and computational performance of SERPent. As currently parallelized, the script distributes the data by baseline to different CPUs, thus running multiple flagging jobs in parallel. The speed will therefore increase with more CPUs, until the number of CPUs exceeds the number of baselines; for e-MERLIN this is 21 baselines for the full array. Performance increases in steps when the number of CPUs is a factor of the number of baselines: e.g. 10 baselines distributed over 4 CPUs means 2 CPUs run 2 baselines each and the other 2 CPUs run 3 baselines each, so running 12 baselines on 4 CPUs takes about the same time as running 9 baselines on 4 CPUs, since the limiting factor is the CPU running the most baselines. Equally, running 9 baselines on 3 CPUs will be roughly the same speed as running 9 baselines on 4 CPUs. We plan to increase the parallelization by further splitting the jobs by IF as well as by baseline. This will spread the workload more evenly across different CPUs, where before some CPUs would be idle, and computers with a large number of CPUs (> 20) will also benefit from this type of separation. We are also considering splitting the data in time, as currently all the time scans for a baseline and IF are passed through the flagger together; this should improve performance since Python performs faster on many smaller chunks of data rather than a few big chunks.
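As an illustration of the current baseline-level parallelization, the sketch below distributes per-baseline flagging jobs over a pool of worker processes, with each job pickling its flag array to disk rather than returning it (SERPent itself uses cPickle for this). The function names and file naming are illustrative assumptions, not the SERPent source.

import multiprocessing as mp
import pickle

def flag_job(args):
    """Flag a single baseline and write its flag array to disk."""
    baseline, amps, outdir = args
    flags = flag_baseline_if(amps, first_threshold)   # sketches from Section 4
    with open('%s/flags_baseline_%s.p' % (outdir, baseline), 'wb') as f:
        pickle.dump(flags, f)
    return baseline

def flag_all_baselines(per_baseline_amps, outdir, ncpu=4):
    """per_baseline_amps: dict mapping baseline name -> 2D amplitude array."""
    jobs = [(bl, amps, outdir) for bl, amps in per_baseline_amps.items()]
    pool = mp.Pool(processes=ncpu)
    done = pool.map(flag_job, jobs)     # baselines are distributed over the CPUs
    pool.close()
    pool.join()
    return done

With this scheme the speed-up saturates once the number of worker processes exceeds the number of baselines, which is the behaviour seen in the timings below.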
We have so far analysed two datasets, M82V and COBRaS W1 2011, for computational performance on two different computer systems. Table 2 gives details of the computer systems on which we have tested SERPent's performance.

Table 2: Computer Systems

Computer Name   Memory (GB)   NCPUs
Leviathan       100           16
Desktop         4             4

Figure 4: AIPS SPFLG image of 0555+398, C band, baseline 5−7, Stokes RR, IF 2, before (left) and after (right) SERPent flagging. The AIPS task REFLG was also deployed in this image. The vertical axis is time and the horizontal axis is frequency.

Firstly we compare the time taken to flag the MERLIN M82V and e-MERLIN COBRaS W1 2011 datasets with both the Desktop and Leviathan, using the entire range of CPUs available on each system. Figure 5 shows the average time taken over three separate runs for each number of CPUs on each computer. It is clear that increasing the number of CPUs, and thus splitting the workload, results in an increase in performance which levels off as the number of CPUs reaches the number of baselines in the dataset. To improve the performance further, additional parallelization will be needed, e.g. distributing tasks split by IF as well as by baseline to individual CPUs.

Increasing the amount of memory also increased the computational performance, albeit by a smaller amount than the parallelization. Leviathan has 25× the memory of the standard Desktop computer in our tests and is faster by a factor of 1.7, consistently across different numbers of CPUs for both computers and datasets. This shows that the limiting factor in running SERPent on interferometric datasets is the sheer volume of data that needs processing rather than the available RAM. We display both datasets next to each other to highlight the ∼ 11× increase in data size between the datasets and the seemingly linear increase in the time needed to process them, i.e. SERPent takes ∼ 11× longer to run using the same setup between datasets of 212 MB and 2.33 GB. Initial tests on the full 25.3 GB COBRaS W1 2011 dataset reveal that Leviathan needs ∼ 6 hours with 10 CPUs to process the data. Comparing this to the time needed for Leviathan with 10 CPUs to process 2.33 GB of data (a ∼ 11× increase in size) gives a time ratio of over 9. Thus, as a simple estimate, it is reasonable to assume that the computational time to run SERPent scales linearly with dataset size.

Lastly, in figure 6 we present the performance of the Desktop and Leviathan on multiple CPUs as a ratio to a single CPU. At a low number of CPUs the performance increase is roughly linear, and it then plateaus at two distinct levels. This is due to how SERPent distributes tasks to different CPUs, as explained above.

Figure 5: The time taken to flag the MERLIN M82V (212 MB, left) and the e-MERLIN COBRaS W1 2011 (2.33 GB, right) datasets using a common Desktop computer and Leviathan over a range of CPUs. Each point is an average of 3 runs using the same number of CPUs. Note that the COBRaS W1 2011 dataset only had 10 baselines and, due to the current parallelization, would only benefit from using up to 10 of Leviathan's CPUs.

7 Conclusion and Discussion

We have presented a simple script to flag RFI from radio interferometric data, utilizing common software packages and programs. The readily available scripts provide a simple and easy way for the astronomer to remove RFI from their data.
SERPent also addresses the Lovell stationary scan problem, which would otherwise affect the automated flagging methods within SERPent and perhaps other scripts. We have discussed the RFI mitigation techniques involved and demonstrated the flagging and computational performance of SERPent on a range of machines and setups. We now discuss some areas of RFI mitigation which may be of interest.

Achieving complete automation of any procedure is a challenging task, particularly when many variables and unexpected problems arise within arrays and in real datasets. The amplitude-splitting problem from receiver filter issues discussed in Section 3 provides one example of the unexpected problems that may occur in any individual dataset. These issues have to be resolved if successful calibration and analysis of the data is to happen, but they cannot always be predicted. Producing a complete pipeline (including reduction, flagging and calibration) which runs blind on an observation is perhaps something for the future, when the interferometer's systems are stable and their behaviour is well established.

This point carries into another discussion on the size of datasets from modern interferometers, since the data volume is the reason we strive to achieve complete pipelines. The computational performance demonstrated here shows that with old MERLIN data (212 MB, 6 sources) the entire dataset can be flagged in around 11 minutes on a modest desktop (4 GB memory and 4 CPUs), but a modest dataset from early commissioning e-MERLIN observations (25 GB in total, 6 sources with only half the full number of baselines) needs around 6 hours on a more powerful computer (100 GB and 10 CPUs). Full e-MERLIN Legacy project datasets are expected to be even larger, ranging from hundreds of GB up to a TB in size. This clearly shows the computational challenge involved in reducing these datasets, and also the necessity of automating procedures, as manually handling this amount of data becomes unfeasible. In the future, as interferometers get ever more powerful with arrays like the SKA, these issues will only magnify. The scripts and procedures written for the current crop of interferometers such as e-MERLIN provide valuable insight and act as a stepping stone to the next generation of radio interferometers.

Figure 6: The speed of running SERPent on multiple CPUs on the Desktop and Leviathan, relative to a single CPU on the same systems, for the MERLIN M82V dataset. The 'plateau' fluctuations will be due to minor differences in running conditions during the tests.

References

[1] P. A. Fridman. Statistically Stable Estimates of Variance in Radio-Astronomy Observations as Tools for Radio-Frequency Interference Mitigation. AJ, 135:1810–1824, May 2008.

[2] N. Niamsuwan, J. T. Johnson, and S. W. Ellingson. Examination of a simple pulse-blanking technique for radio frequency interference mitigation. Radio Science, 40:5, June 2005.

[3] A. R. Offringa, A. G. de Bruyn, M. Biehl, S. Zaroubi, G. Bernardi, and V. N. Pandey. Post-correlation radio frequency interference classification methods. MNRAS, 405:155–167, June 2010.