Introducing Radiation Tolerant Heterogeneous Computers for Small Satellites

Fredrik Bruhn
Mälardalen University, School of Innovation, Design and Engineering (IDT)
P.O Box 883, SE-721 23 Västerås, Sweden
+46 707833215
fredrik.bruhn@mdh.se

Kjell Brunberg
BAP
P.O. Box 3000, SE-753 30 Uppsala, Sweden
kjell@adv.bruhnspace.com

Lars Asplund
Mälardalen University (IDT)
lars.asplund@mdh.se

Magnus Norgren
BAP
magnus@adv.bruhnspace.com

John Hines
Independent Consultant
548 Market Street #98125, San Francisco, CA 94104-5401, USA
+1 408 419-9735
johnhines555@gmail.com

978-1-4799-5380-6/15/$31.00 ©2015 IEEE

Abstract: This paper presents results and conclusions from design, manufacturing, and benchmarking of a heterogeneous computing low power fault tolerant computer, realized on an industrial Qseven® small form factor (SFF) platform. A heterogeneous computer in this context features multi-core processors (CPU), a graphical processing unit (GPU), and a field programmable gate array (FPGA). The x86 compatible CPU enables the use of vast amounts of commonly available software and operating systems, which can be used for space and harsh environments. The developed heterogeneous computer shares the same core architecture as game consoles such as Microsoft Xbox One and Sony Playstation 4 and has an aggregated computational performance in the TFLOP range. The processing power can be used for on-board intelligent data processing and higher degrees of autonomy in general. The module features a quad core 1.5 GHz 64 bit CPU (24 GFLOPS), 160 GPU shader cores (127 GFLOPS), and a 12 Mgate equivalent FPGA fabric with a safety critical ARM® Cortex-M3 MCU.

TABLE OF CONTENTS

1. INTRODUCTION
2. FORM FACTOR ANALYSIS
3. ARCHITECTURE
4. HARDWARE DESIGN
5. PERFORMANCE TESTING SETUP
6. RESULTS
7. FUTURE WORK
8. CONCLUSIONS
REFERENCES

1. INTRODUCTION

This work presents the design, analysis, and benchmarking of a new space environment capable heterogeneous computer constructed with commercial off-the-shelf (COTS) components and hosted on an industrial standard form factor.

A 2012 survey by Ramon Chips of processors for high performance space missions describes the still commonly used BAE RAD750 as achieving 300 MIPS, containing 10.4 million transistors, and costing in excess of $200,000 [1]. Another example from the same paper describes the higher end state-of-the-art Proton 200k single board computer, which uses a TI 320C64xx DSP delivering 900 MFLOPS. These systems are expensive and their size is not compatible with the smallest satellites.

By contrast, most small space missions using satellites in the range of 1-10 kg today use low performance COTS parts, which are simple microcontroller devices compared to advanced industrial solutions for rugged computing. Devices such as 8 bit microcontrollers and lower end RISC architectures are frequently used [2]. More advanced FPGAs with built-in ARM, PowerPC, or Microblaze processors are also commonly used, which improves the performance significantly compared to the lower end microcontrollers [3]. Earlier space applications of x86 processors have not been safety critical and were susceptible to radiation.

Mälardalen Aerospace and Robotics Center (MARC) at Mälardalen University has pursued reliable optimized heterogeneous embedded computing architectures for advanced vision systems for many years. Several generations of demonstration hardware have been developed, starting with the General Image Multiview Manipulation Engine (GIMME)-1, which featured an Intel Atom processor and a Xilinx Spartan FPGA [4]. The second generation, GIMME-2, is still under evaluation and features dual core ARM Cortex A8 CPUs in a Xilinx FPGA. The third generation and fully heterogeneous embedded platform, GIMME-3, is presented in this paper.

We have chosen to partner with AMD and to explore the semiconductor industry's emerging initiative called Heterogeneous Computing under the framework Heterogeneous System Architecture (HSA), driven by the HSA Foundation [5, 6].

HSA is an industry-driven effort to maximize the computational performance of CPUs, GPUs, DSPs, and other programmable accelerators (FPGA) tightly integrated into a single System on Chip (SOC) or a Heterogeneous Computing Module (HCM) combining SOC and FPGAs. However, tight integration of these blocks is only one part of unlocking the computational performance of massively parallel systems; the second is to provide the infrastructure that gives the separate functions access to shared memory and data. HSA is a model that presents these features in a manner comprehensible to mainstream software developers, and supported by their development environments.

One important aspect for all modern processors is memory coherency, which has been taken for granted in homogeneous multiprocessor and multi-core systems for decades. Allowing heterogeneous processors with CPU, GPU, DSP, and FPGA to maintain coherency in a shared memory environment is, however, a revolutionary concept. HSA unlocks the power of coherency between heterogeneous processors and removes the need for copy operations by queuing up pointers and introducing Cache Coherent Shared Virtual Memory (CC-SVM), and is hence very attractive for a high-performance radiation tolerant processor.

One of the most troublesome problems, apart from the complex software associated with heterogeneous architectures, is the communication between the various processing parts. For a system where the partitioning can be done with low data rates between the computational units, the delays are not important. The other extreme is when GPUs and FPGAs have to be heavily interleaved with computation on the CPUs. This case requires a fast intercommunication link to hand off data quickly.

A heterogeneous computer in this paper is a computer design featuring a combination of multi-core CPU, graphical processing unit (GPU), and field programmable gate array (FPGA).

2. FORM FACTOR ANALYSIS

Different industrial standard form factors have been analyzed with respect to modularity, input/output (IO) pinning robustness, expandability, and heat sink applicability. The standards selected for evaluation were Qseven® [7], driven by the Standardization Group for Embedded Technologies, and the PICMG group's COM Express® type 2 and type 6 [8]. These are all industry ready, off-the-shelf, multi-vendor Computer-On-Module concepts that integrate all the core components of a common PC. Both the Qseven® and COM Express® modules provide the functional requirements for an embedded application. These functions include, but are not limited to, graphics, sound, mass storage, network, and multiple USB ports.

Qseven® modules are mounted onto an application specific carrier board and have a standardized form factor of 70 mm x 70 mm, with an alternate IO expansion board measuring 40 mm x 70 mm. The specified and standardized pinouts are based on the high speed MXM system connector and are vendor independent [9]. The ruggedized MXM connector takes the I/O signals to and from the Qseven® module to the carrier. This MXM connector is a well-known and proven high speed signal interface connector that is commonly used for high speed PCI Express graphics cards in notebooks.

COM Express® defines standardized form factors and pinouts for Computer-on-Modules. The standard includes the mini form factor (84 mm x 55 mm), the compact form factor, i.e., type 2 and 6 (95 mm x 95 mm), and the basic form factor (125 mm x 95 mm). COM Express® is unique in that it may be used in two ways:

  1. As a standalone single board computer; and/or
  2. As a processor mezzanine that can be plugged onto a base board, or “carrier” board, that contains the user’s application specific I/O.

The Qseven® has the smaller stacking height of the two concepts, while the IO is almost identical with only minor differences depending on standard revision. Because of the smaller stacking height, and because the authors interpret the small form factor (SFF) market tendency for new designs as moving toward Qseven®, Qseven® was selected as the candidate for the heterogeneous radiation tolerant computer.

3. ARCHITECTURE

Radiation tolerance is a significant driver for the architectural design, and affects the selection of components, electrical wiring, and de-rating of components. A vital part of radiation tolerance is single-event upset (SEU) mitigation through techniques such as multiple bit error detection and correction (EDAC or ECC). Hence it is important that all, or at least most, of the design supports EDAC.

All passive components are de-rated according to the guidelines set forth by the European Cooperation for Space Standardization (ECSS) Space Product Assurance standard for Electrical, Electronic, and Electromechanical (EEE) components [10].

The FPGA element of the designed heterogeneous computing module was selected based on known heritage and price performance. An initial starting reference was the Microsemi ProASIC3 FPGA, which has been shown to perform well in the space environment in industrial, military, and space packaging [11]. However, this circuit has reached end-of-life (EOL) and was not deemed suitable for a new design. Based on heritage from the ProASIC3, the assumption that Microsemi would keep the design principles in the new SmartFusion2 FPGA, and early radiation reports, the SmartFusion2 was selected as the candidate of choice for the hardware implementation [12]. Also of importance is the fact that the SmartFusion2 FPGA contains an ARM Cortex-M3 microcontroller which can be used as an advanced system watchdog and recovery processor.

Selection of a suitable high performance CPU and GPU proved to be more challenging. Based on radiation reports from NASA on CPUs from Advanced Micro Devices (AMD), attention was drawn to their offerings. 17 Mrad has been reported for the AMD Llano processor core on a 32 nm silicon-on-insulator (SOI) process using high-k metal gates (HKMGs) [13]. A good candidate manufactured on a very similar process was found in the AMD Embedded System-on-Chip (SOC) G-series based on the “Jaguar” architecture (eKabini). The benefit of the AMD G-series is the pin-upgradeable eKaveri “Steppe Eagle” revision which has partial HSA support. The G-series chips however are based on 28 nm using the same strained silicon process. The benefit of the AMD SOC is that it includes a multi-core x86 compatible CPU and a GPU on the same physical die as well as a wide range of familiar PC IO such as PCI Express, SATA, USB 2, 3 etc. [14]. In addition, the SOC includes new features for enhanced Universal Video Decode (UVD) and Video Encode (VCE) hardware acceleration, and enhanced clock gating and C6 ‘deep power down’ capabilities that lower overall power consumption.

In terms of raw computational performance the embedded GPU performs up to 256 GFLOPS per clock compute power and supports OpenGL™ 4.2 and OpenCL™ 1.2 full profile. AMD supports Heterogeneous System Architecture (HSA) in the G-series SOC eKaveri versions [15]. This allows, among many other things, important features such as unified addressing across all processors, virtual memory coherency, high level language support for GPU compute processors, and the HSA intermediate language (HSAIL). These functions help speed up the CPU and GPU interaction and are leveraged in combination with the FPGA in a heterogeneous computer module.

An optimized heterogeneous computer module must support a fast link between all integral computing parts. In the selected parts described above, both the AMD SOC and the FPGA support PCI Express generation 2.0 with 5 Giga Transfers per second (GT/s) per lane. In order to support redundant data pathways, at least two different communication channels should be used. The core design uses one lane of PCI Express generation 2.0, which provides 5 GT/s or an equivalent of approximately 4 Gbps considering the 8b/10b encoding applied on data transfers over PCI Express [16].

The redundant data paths are provided by the Low Pin Count (LPC) bus, which is a simplified PCI interface for legacy ISA bus devices [17]. The LPC bus uses 4 data pins and a 33 MHz clock and can hence transmit approximately 100 Mbps.

Figure 1 illustrates a block diagram of the developed heterogeneous computing module, including the redundant pathways for creating the heterogeneous system by connecting the CPU/GPU and the FPGA with its micro processing unit (ARM Cortex-M3).

[Figure 1 block diagram: AMD SOC (4 x CPUs, GPU, 2 GB RAM) connected over PCIE and LPC to the FPGA (MCU, 0.5 GB RAM) on the Qseven module.]

Figure 1. GIMME3 core architecture

4. HARDWARE DESIGN

The Qseven® standard provides 230 pins in the specified MXM connector for standard PC features such as HDMI, DisplayPort, SATA, PCI Express, CAN, Power etc. [7]. Furthermore, the standard requires a board voltage supply of 5 V, which is compatible with many small satellites. However, there are no flexible IO pins suitable for FPGA connection in the MXM connector. This is a significant problem, since a natural extension of the FPGA usage is IO translation in addition to acting as a co-processor to the CPU/GPU. In order to harness the full use of the FPGA, additional IO capability must be added to the Qseven®, or alternatively the standard must be broken by changing the IO definition in the MXM connector. The latter choice would cause the new heterogeneous module to be non-compliant with the installed systems already on the market and is not preferable.

As a consequence, this enhanced Qseven® module was equipped with a 120 pin extension IO capability through a board-to-board connector with a 5 mm total mated height on the bottom side of the Qseven®, as seen in Figure 2.

[Figure 2 labels: standard Qseven MXM connection; new 120 pin board-to-board connection.]

Figure 2. Illustration of an enhanced Qseven® module with an extra board-to-board connector mounted on the bottom side, outside the mechanical cooling area.

Using the extra 120 IO signals provided by the board-to-board connector, it is possible to carry over multiple FPGA signals to the user carrier board. A summary of the signals is shown in Table 1.

Table 1. Signals in 120-pin board-to-board extension

  Signal               Description
  2 x SERDES           Serializer-Deserializer, 5 Gbps
  16 x LVDS/GPIO       Differential, LVDS signal level
  8 x DDRIO/GPIO       Differential, 2.5/3.3 V signal level
  1 x1 PCI Express     PCI Express generation 2
  2 x I2C / GPIO       I2C bus or GPIO
  2 x SPI / GPIO       SPI bus or GPIO
  2 x UART / GPIO      Serial communication or GPIO
  FPGA JTAG            Debugging interface
  1 x ULPI             USB v2 FPGA interface
  1 x Power good       FPGA management
  1 x FPGA reset       FPGA management
  1 x 12 bit ADC       Analog signal input
  Ground               Ground

With the flexible GPIO it is possible to implement commonly available bus protocols not supported natively by the AMD SOC or the FPGA, such as RapidIO, Profibus, Modbus, 1553, SpaceWire etc.

Figure 3 shows a photograph of the top side of a commercialized heterogeneous Qseven® module from the Swedish company BAP AB, which is compatible with the architecture described in this paper. On the right hand side the AMD SOC is seen together with DDR3 memory. On the left side is the Microsemi FPGA together with 0.5 GB of DDR3 memory. Both the SOC and the FPGA memory have implemented support for error correction, and the total amount is 2 GB for the SOC and 0.5 GB for the FPGA.

Figure 3. Photograph showing the top side of the developed enhanced heterogeneous Qseven® (70 mm x 70 mm module).

A summary of the enhanced Qseven® features is listed in Table 2.

Table 2. Summary of the Enhanced Heterogeneous Qseven®

  Form factor             Qseven® enhanced (+)
  Power input             5 V SOC power net
                          5 V FPGA power net
  IO connectors           MXM-230 + 120 extension
  CPU/SOC                 AMD Embedded G series 415GA, Quad Core 1.5 GHz, 2 MB L2 cache
  GPU/SOC                 AMD Radeon HD 8330E
  FPGA                    Microsemi SmartFusion2 M2S050-T
  DRAM SOC                2 GB DDR3 with ECC
  DRAM FPGA               0.5 GB DDR3 with ECC
  Ethernet                1 Gigabit Ethernet
  SOC-FPGA interconnect   PCI Express x1 generation 2.0
                          LPC bus
  SOC IO/interfaces       2 x1 PCI Express lanes
                          1 x4 PCI Express lanes
                          2 x USB 3.0
                          4 x USB 2.0
                          2 x SATA version 3.0
                          LPC bus
                          SM-bus
                          I2C bus
                          SDIO/MMC
                          AMD Debugging
  FPGA interfaces IO      2 x SERDES
                          24 x differential/single ended IO (16 with LVDS)
                          2 x I2C / GPIO
                          2 x SPI / GPIO
                          2 x UART / GPIO
                          1 x Controller Area Network (CAN 2.0b)
                          1 x ULPI
                          JTAG

Figure 4 shows a photograph of the bottom side of the module shown in Figure 3, with the additional board-to-board connector clearly visible on the top right side. In total the number of IO from the module is 350.
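As a rough cross-check of the figures quoted in Sections 3 and 4 and in the abstract, the arithmetic behind the link rates, the pin count, and the aggregate peak performance can be sketched in a few lines of Python. This is only an illustrative calculation, not part of the authors' tooling: the function names are hypothetical, and the LPC value computed here is the raw line rate (pins times clock), which is higher than the approximately 100 Mbps quoted in the text; the difference is assumed here to reflect protocol overhead.

```python
"""Back-of-envelope check of link rates, pin count, and peak GFLOPS.

All function names are hypothetical; the input numbers are taken from the text.
"""


def pcie_payload_gbps(gt_per_s: float = 5.0, lanes: int = 1) -> float:
    """Usable bit rate of a PCI Express gen 2 link.

    8b/10b encoding carries 8 payload bits in every 10 line bits.
    """
    return gt_per_s * lanes * 8 / 10


def lpc_raw_mbps(data_pins: int = 4, clock_mhz: float = 33.0) -> float:
    """Raw LPC line rate: one bit per data pin per clock cycle.

    Assumption: protocol overhead is ignored, so this is an upper bound.
    """
    return data_pins * clock_mhz


def total_io_pins(mxm_pins: int = 230, extension_pins: int = 120) -> int:
    """Pins on the standard MXM connector plus the board-to-board extension."""
    return mxm_pins + extension_pins


def peak_gflops(cpu_gflops: float = 24.0, gpu_gflops: float = 127.0) -> float:
    """Sum of the theoretical CPU and GPU peak figures quoted in the abstract."""
    return cpu_gflops + gpu_gflops


if __name__ == "__main__":
    print(f"PCIe gen 2 x1 payload rate: {pcie_payload_gbps():.1f} Gbps")
    print(f"LPC raw line rate:          {lpc_raw_mbps():.0f} Mbps")
    print(f"Total module IO pins:       {total_io_pins()}")
    print(f"CPU + GPU theoretical peak: {peak_gflops():.0f} GFLOPS")
```

Under these assumptions the PCI Express lane offers roughly 40 times the raw LPC rate, which is consistent with treating PCI Express as the primary SOC-FPGA data path and LPC as the redundant, low-rate path.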
Figure 4. Photograph showing the bottom side of the developed enhanced heterogeneous Qseven® module. The extra IO expansion capability board-to-board connector can be seen in the upper right.

Qseven® modules require a carrier board to be functional. However, in this case it is not possible to verify all functionality by using an off-the-shelf Qseven® compatible carrier, since it will lack support for the 120 pin extension. In order to verify the design and perform benchmarking, a mini-ITX compatible form factor carrier was developed which can exercise all features and provide a simple development environment [18].

Figure 5 shows a photograph of the associated mini-ITX reference carrier with a mounted enhanced heterogeneous Qseven® covered with a heat sink and a cooling fan. The mini-ITX carrier supports dual gigabit Ethernet interfaces to the AMD SOC, one gigabit Ethernet interface with Power-over-Ethernet 802.3at support to the FPGA, 2 x USB 3.0, 4 x USB 2.0, 2 x SATA v3.0, SDIO, HDMI, PCI Express x4, COM ports, sound input/output, an FPGA development extension area etc.

Figure 5. Photograph showing the associated mini-ITX compatible development board for the heterogeneous Qseven module.

5. PERFORMANCE TESTING SETUP

Testing of the performance is difficult as there are many different tests and benchmarks, as well as many different methods of defining performance. In this case it was decided to base the benchmark on two default, non-tweaked operating systems, Ubuntu Linux 14.04.1 LTS 64 bit desktop version and Microsoft Windows 8.1 OEM 64 bit, and on image processing algorithms. In the case of Ubuntu, the enterprise error correction features of the AMD SOC were enabled. No optimizations have been made for embedded use for either operating system.

The enhanced Qseven® device under test (DUT) sample supplied by BAP used the AMD G-series SOC “GX415GA” featuring quad core 1.5 GHz CPUs, a Radeon HD 8330E GPU core clocked at 500 MHz, and a SmartFusion2 M2S050-T FPGA. The theoretical floating point performance of the CPU set is 24 GFLOPS and the GPU floating point performance is 127 GFLOPS. The AMD memory configuration was set to run at DDR3-1333 which is equivalent to 85.6 Gbps.

The following drivers were used for testing:

  For Windows 8.1, AMD Catalyst Driver 14.8
  For Ubuntu, AMD Catalyst Driver 14.8

A commonly used software suite to stress test CPU/GPUs is the Open Source Computer Vision Library (OpenCV) [19]. OpenCV is an open-source BSD-licensed library that includes several hundred computer vision algorithms. OpenCV version 2.4.8 was used in Windows 8.1 and version 2.4.9 was used in Ubuntu. Both versions were compiled with instruction optimizations enabled for the AMD Jaguar platform. These optimizations include Streaming SIMD Extensions (SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSE 4A), advanced vector extensions (AVX), Multimedia extensions (MMX, EMMX), AMD64, and fast float save and restore (FXSR). OpenCL support for GPU acceleration was turned on (which is the default in OpenCV 2.4). OpenCV for Windows was compiled with Visual Studio Express 2013, and for Ubuntu with the standard GCC.

Benchmarking was done using a subset of OpenCV functions from three different groups: Geometric Image Transformations, Feature Detection, and Motion Analysis and Object Tracking.

The following functions are used for benchmarking:

  From the Geometric Image Transformations group
    WarpPerspective, applies a perspective transformation to an image [20].
    WarpAffine, applies an affine transformation to an image [21].

  From the Feature Detection group
    GoodFeaturesToTrack, GoodFeaturesToTrackDetector_OCL, determines strong corners on an image [22].

  From the Motion Analysis and Object Tracking group
    CalcOpticalFlowPyrLK, PyrLKOpticalFlow, calculates an optical flow for a sparse feature set using the iterative Lucas-Kanade method with pyramids [23].

A test image in PAL resolution (720 x 576 pixels) was used in the benchmark. Figure 6 shows the test image.

Figure 6. Test image for benchmarking (PAL resolution 720 x 576 pixels).

Figures 7 and 8 show screen shots from Windows® 8.1 and Ubuntu 14.04 running on the DUT with minimal or nominal load. Figure 7 reveals that the clock frequency scaling is working, with 0.9 GHz average use compared to the nominal 1.5 GHz, and that 512 MB is devoted to GPU memory, leaving 1.5 GB RAM for the CPU.

Figure 7. Screen shot from task manager in Windows 8.1 showing the load at minimal load.

Figure 8. Screen shot from system manager in Ubuntu 14.04 showing the load at nominal desktop load.

6. RESULTS

Testing of the GX415GA module was performed with AMD Cool'n'Quiet technology enabled. Figure 9 presents a summary of the benchmarks run on the enhanced Qseven® DUT. Values are presented in frames per second (FPS) for both Ubuntu 14.04 and Windows® 8.1, running on the CPU only and with hardware acceleration using OpenCL (OCL). The first and primary observation is that the platform is stable under load and successfully runs the latest available code. Secondly, it is clear that the GPU acceleration makes a very big difference in the case of Ubuntu. The reason for this is unclear and must be further investigated. The DUT in this case has Cool'n'Quiet power saving enabled, which limits the performance, but it still shows a maximum of 787 frames per second for the WarpAffine test and 102 frames per second for the complex Pyramid Optical Flow test. At PAL resolution, 787 fps with a black and white test image with 8 bit pixel depth corresponds to a continuous data flow of approximately 2.5 Gbps, which is continuously evaluated and sustained in this test. The approximate power consumption during testing was 6 W. It is clear that even with only CPU/GPU power, new capabilities in terms of advanced on-board data processing can be achieved.

Further testing has been performed to find the upper limit for the current generation hardware. Table 3 shows benchmarking results using the fastest available AMD G-series SOC eKabini family device, the GX420CA SOC, with Cool'n'Quiet turned off. The GX420CA supports CPU speeds of 2.0 GHz and a GPU clock of 600 MHz. This represents a 33% improvement in CPU speed and a 20% improvement in GPU clock compared to the GX415GA. In theory this would represent a total 50% improvement in terms of computational performance. However, the measured increases are within 36% to 92%, which cannot be fully accounted for by the pure speed increase. The remainder is explained by the performance gained by switching off the power saving features, allowing the system to run uninterrupted.

The Lucas-Kanade optical flow calculation is limited by the system bandwidth in terms of loading images in and out between the CPU and the GPU. There are two ways to improve the optical flow: first, to enable use of the HSA architecture, allowing the CPU and GPU to share the same memory and removing the need for copying the image, and secondly, to offload these calculations to the FPGA with a parallelized hardware implementation of the function. This has however not been done for this study.

Table 3. Benchmarked performance of AMD G-Series SOC GX420CA with Cool'n'Quiet turned off running the described test suite with GPU acceleration.

  Function                      Ubuntu OpenCV + OCL [FPS]   Improvement [%]
  WarpPerspective               1064                        85
  WarpAffine                    1250                        59
  GoodFeaturesToTrackDetector   481                         92
  PyrLKOpticalFlow              139                         36

7. FUTURE WORK

Future work will include performing detailed radiation analysis of system level single event upset (SEU) and single event latch-up (SEL) thresholds as well as extensive environmental testing. Future work will also pursue testing that includes heavy FPGA utilization to maximize the possible throughput.

Figure 9. Summary of the performed benchmarks run on the presented fault tolerant heterogeneous computer using four different image processing functions from the Open Source Computer Vision Library.
Benchmarking is made on CPU only and with CPU + GPU acceleration using OpenCL (OCL) for both Ubuntu 14.04 and Windows 8.1.

Future work will also be performed to optimize kernel drivers to define and operate “system mutex” handling, thus allowing all three computing elements (CPU, GPU, FPGA) to work seamlessly on the same data without inherent knowledge of the data location, using the HSA enabled architecture.

Finally, new ways of simplifying in-orbit computing will be explored using tailored and standard software, especially focusing on in-orbit use of well-known scientific tools such as Matlab.

8. CONCLUSIONS

It has been shown that a high performance heterogeneous fault and radiation tolerant computer can be realized on the industrial small form factor Qseven. Significant Input/Output extension has been enabled by the addition of a 120 pin board-to-board connection on the bottom side of the Qseven.

State-of-the-art processors for space perform 900 MFLOPS, while the proposed, developed, and tested heterogeneous architecture from COTS components performs at least 151,000 MFLOPS (151 GFLOPS) at a comparable power consumption.

There is a significant speed improvement from GPU accelerated image analysis using OpenCL, which enables the demonstrated hardware to reach up to 1000 PAL resolution frames per second for certain functions. The platform has TFLOP range computational performance and can utilize the heterogeneous architecture fully using fast PCI Express enabled interconnects.

This increased capability, the small industrially derived form factor, and the use of industry standard reliable embedded processor architectures and radiation tolerance methods, interfaces, and components provide a unique capability for realizing high performance, radiation-tolerant small satellite avionic systems.

REFERENCES

[1] R. Ginosar, “Survey Space Processors”, Data Systems in Aerospace Conference (DASIA), 14-16 May 2012, Dubrovnik, Croatia.

[2] T. Rajkowski et al., “Low Cost and High Performance On-board Computer for Picosatellite”, Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2012, edited by Ryszard S. Romaniuk, Proc. of SPIE Vol. 8454, 84540J, © 2012 SPIE, doi: 10.1117/12.200023.

[3] http://www.cubesatshop.com/index.php?page=shop.product_details&flypage=flypage.tpl&product_id=94&category_id=8&option=com_virtuemart&Itemid=75, accessed 2015-01-12.

[4] C. Ahlberg et al., “GIMME - A General Image Multiview Manipulation Engine”, Reconfigurable Computing and FPGAs (ReConFig), 2011 International Conference on, pp. 129-134, Nov. 30 2011-Dec. 2 2011, doi: 10.1109/ReConFig.2011.44.

[5] http://www.hsafoundation.com/, accessed 2015-01-12.

[6] HSA Foundation, “HSA Platform System Architecture Specification 1.0 Provisional”, 2015.

[7] Qseven standard, Standardization Group for Embedded Technologies, http://www.sget.org/standards/qseven.html, accessed 2014-10-21.

[8] COM Express standard, PICMG Open Modular Standards, http://www.picmg.org/openstandards/com-express/, accessed 2014-10-21.

[9] MXM standard, http://www.mxm-sig.org/, accessed 2014-10-21.

[10] ECSS Q-ST-60 rev 2 standard, https://escies.org/download/webDocumentFile?id=60888, accessed 2014-10-22.

[11] S. Habinc et al., “Using a Flash Based FPGA in a Miniaturized Motion Control Chip”, Sept. 1, Military and Aerospace Programmable Logic Devices (MAPLD) Conference 2009, Washington, USA.

[12] Microsemi internal interim report on radiation testing of SmartFusion2 FPGA, http://www.microsemi.com/document-portal/doc_view/134103-igloo2-and-smartfusion2-65nm-commercial-flash-fpgas-interim-summary-of-radiation-test-results, accessed 2014-10-21.

[13] NASA EPP Electronics Technology Workshop, “Advanced Micro Devices (AMD) Processor: Radiation Test Results”, June 11-12, 2013, NASA GSFC, Greenbelt, MD.

[14] AMD G-series SOC, eKabini, http://www.amd.com/documents/amdgseriessocproductbrief.pdf, accessed 2014-10-22.

[15] ISCA 2014 | Heterogeneous System Architecture (HSA): Architecture and Algorithms Tutorial.

[16] PCI-SIG, PCI Express standard, https://www.pcisig.com/specifications/pciexpress/, accessed 2014-10-21.

[17] LPC bus specification from Intel, http://www.intel.com/design/chipsets/industry/lpc.htm, accessed 2014-10-21.

[18] Mini-ITX standard, Intel Corporation, http://cache-www.intel.com/cd/00/00/47/97/479761_479761.pdf, accessed 2014-10-21.

[19] OpenCV library development website, http://opencv.org/, accessed 2014-10-21.

[20] OpenCV, WarpPerspective function, http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#warpperspective, accessed 2014-10-21.

[21] OpenCV, WarpAffine function, http://docs.opencv.org/modules/imgproc/doc/geometric_transformations.html#warpaffine, accessed 2014-10-21.

[22] OpenCV, GoodFeaturesToTrack function, http://docs.opencv.org/modules/imgproc/doc/feature_detection.html#goodfeaturestotrack, accessed 2014-10-21.

[23] OpenCV, CalcOpticalFlowPyrLK function, http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html, accessed 2014-10-21.

BIOGRAPHY

Fredrik Bruhn received a Ph.D. in Microsystems Technologies from Uppsala University, Uppsala, Sweden in 2005 and a Master of Science in Atomic and Molecular Physics from Uppsala University in 2000. He has been with Mälardalen University since 2013 as adjunct Professor in Robotics & Avionics. He has been a guest researcher at JPL and an entrepreneur starting several high technology companies in robotics and space applications. He has been involved as senior designer in bi-lateral small satellite programs between NASA and the Swedish National Space Board, and between the US Air Force Research Laboratory and the Swedish Defence Material Administration (FMV).

Kjell Brunberg received an M.Sc. in Theoretical Physics from Uppsala University in 1961. He has been with Hectronic AB for 20 years and is currently CEO of BAP and Upwis AB. He is a system engineer of industrial PC systems and internet-of-things solutions with a history of designing avionics hardware for UAVs, fighter jets, and marine communication systems. He is involved in three EU research programs and has been a board member of the WISENET excellence center for wireless systems.

John Hines received an M.Sc. in Electrical Engineering from Stanford University in 1975 and a B.S. in Electrical Engineering from Tuskegee University in 1972. He has been with NASA Ames Research Center for 37 years in various capacities including the center’s Chief Technologist and Chief Technologist for the Small Spacecraft Division. During his time as the center’s Nanosatellite Mission Office manager he directed biological nanosatellite missions and projects including PharmaSat, O/OREOS, GeneSat/GeneBox, PreSat, and Nanosail-D.

Lars Asplund received a PhD in Physics from Uppsala University in 1977 and a BSc in Physics from Uppsala University in 1973. He achieved the degree of Docent in Physics at Uppsala University in 1981. He has been Professor in Computer Science at Mälardalen University since 1981 and is now emeritus. He has written ten textbooks in Electronics and Robotics. He has created two five-year engineering programs, one at Uppsala University (IT) and one at Mälardalen University (Robotics).

Magnus Norgren has been with BAP since 2013 and has studied engineering sciences at Uppsala University, Uppsala, Sweden. He is also a part time research engineer at Mälardalen University. He has been involved in studies of safety critical multi-core implementations, cache coherency, heterogeneous system architecture definition, and chip selection trade-offs.