here - Bruhnspace

advertisement
Introducing Radiation Tolerant Heterogeneous Computers
for Small Satellites
Fredrik Bruhn
Mälardalen University, School
of Innovation, Design and
Engineering (IDT)
P.O Box 883, SE-721 23
Västerås, Sweden
+46 707833215
fredrik.bruhn@mdh.se
Kjell Brunberg
BAP
P.O. Box 3000, SE-753 30 Uppsala,
Sweden
kjell@adv.bruhnspace.com
Lars Asplund
Mälardalen University (IDT)
lars.asplund@mdh.se
Magnus Norgren
BAP
magnus@adv.bruhnspace.com
John Hines
Independent Consultant
548 Market Street #98125
San Francisco, CA 941045401, USA
+1 408 419-9735
johnhines555@gmail.com
shelf (COTS) components and hosted on industrial standard
form factor.
Abstract—This paper presents results and conclusions from
design, manufacturing, and benchmarking of a heterogeneous
computing low power fault tolerant computer, realized on an
industrial Qseven® small form factor (SFF) platform. A
heterogeneous computer in this context features multi-core
processors (CPU), a graphical processing unit (GPU), and a
field programmable gate array (FPGA). The x86 compatible
CPU enables the use of vast amounts of commonly available
software and operating systems, which can be used for space
and harsh environments.
A 2012 survey by Ramon Chips of processors for high
performance space missions space from 2012 describes the
still commonly used BAE RAD750 achieving 300 MIPS,
sporting 10.4 million transistors and costing in excess of
$200,000 [1]. Another example from the same paper
describes the higher end state-of-the-art Proton 200k single
board computer using a TI 320C64xx DSP sporting 900
MFLOP. These systems are expensive and of size not
compatible with the smallest satellites.
The developed heterogeneous computer shares the same core
architecture as game consoles such as Microsoft Xbox One and
Sony Playstation 4 and has an aggregated computational
performance in the TFLOP range. The processing power can
be used for on-board intelligent data processing and higher
degrees of autonomy in general. The module feature quad core
1.5 GHz 64 bit CPU (24 GFLOPs), 160 GPU shader cores (127
GFLOPs), and a 12 Mgate equivalent FPGA fabric with a
safety critical ARM® Cortex-M3 MCU.
By contrast, most small space missions using satellites in
the range of 1-10 kg today use low performance COTS
parts, simple microcontroller devices compared to industrial
advanced solutions for rugged computing. Devices such as 8
bit microcontrollers and lower end RISC architectures are
frequently used [2]. More advanced FPGAs with built in
ARM, PowerPC, Microblaze processors are also commonly
used which improves the performance significantly
compared to the lower end microcontrollers [3].
Earlier space use applications of x86 processors have not been
safety critical and were susceptible to radiation.
TABLE OF CONTENTS
Mälardalen Aerospace and Robotics Center (MARC) at
Mälardalen University have pursued reliable optimized
heterogeneous embedded computing architectures for
advanced vision systems for many years. Several
generations of demonstration hardware has been developed
starting with the General Image Multiview Manipulation
Engine (GIMME)-1, which featured an Intel Atom
processor and Xilinx Spartan FPGA [4]. The second
generation GIMME-2 is still under evaluation and features
dual core ARM Cortex A8 CPUs in a Xilinx FPGA and the
third generation and fully heterogeneous embedded
platform, GIMME-3, is presented in this paper.
1. INTRODUCTION .................................................1
2. FORM FACTOR ANALYSIS..................................2
3. ARCHITECTURE ................................................2
4. HARDWARE DESIGN ...........................................3
5. PERFORMANCE TESTING SETUP .......................5
6. RESULTS ............................................................6
7. FUTURE WORK ..................................................7
8. CONCLUSIONS ...................................................8
REFERENCES .........................................................8
We have chosen to partner with AMD and to explore the
emerging semiconductor industry new initiative called
Heterogeneous Computing under the framework
Heterogeneous System Architecture (HSA) driven by the
HSA Foundation [5, 6].
1. INTRODUCTION
This work presents the design, analysis, and benchmarking
of a new space environment capable heterogeneous
computing computer constructed with commercial-of-the978-1-4799-5380-6/15/$31.00 ©2015 IEEE
1
The HSA is industry driven to maximize the computational
performance in tightly integrated CPUs, GPUs, DSPs and
other programmable accelerators (FPGA) into a single
System on Chip (SoC) or Heterogeneous Computing
Module (HCM) combining SOC and FPGAs. However,
tight integration of these blocks is only one part of
unlocking the computational performance of massively
parallel systems, the second is to provide the infrastructure
with the means of giving the separate functions access to
shared memory and data. HSA is a model that presents these
features in a manner comprehensible to mainstream
software developers, and supported by their development
environments.
mm x 70 mm. The specified and standardized pinouts are
based on the high speed MXM system connector and are
vendor independent [9]. The ruggedized MXM connector
takes the I/O signals to and from the Qseven® module to the
carrier. This MXM connector is a well-known and proven
high speed signal interface connector that is commonly used
for high speed PCI Express graphics cards in notebooks.
COM Express® defines standardized form factors and pinouts for Computer-on-Modules. The standard includes the
mini form factor (84 x 55mm), the compact form factor i.e.,
type 2 and 6 (95 mm x 95 mm) and the basic form factor
(125 mm x 95 mm). COM Express® is unique in that it may
be used in two ways:
One important aspect for all modern processors is memory
coherency which has been taken for granted in
homogeneous multiprocessor and multi-core systems for
decades, but allowing heterogeneous processors with CPU,
GPU, DSP, FPGA to maintain coherency in a shared
memory environment is a revolutionary concept. HSA
unlocks the power of coherency between heterogeneous
processors and removes the need of copy operations by
queuing up pointers and introducing Cache Coherent Shared
Virtual Memory (CC-SVM) and is hence very attractive for
a high-performance radiation tolerant processor.
1.
As a standalone single board computer; and/or
2.
As a processor mezzanine that can be plugged onto
a base board, or “carrier” board, that contains the
user’s application specific I/O.
The Qseven® has the smallest stacking height of the two
concepts while the IO is almost identical with only minor
differences depending on standard revision. The smaller
stacking height and the interpretation by the authors that the
small form factor (SFF) market tendency for new designs is
toward Qseven® implementation it was selected as the
candidate for the heterogeneous radiation tolerant computer.
One of the most troublesome problems, apart from the
complex software associated with heterogeneous
architectures is the communication between the various
processing parts. For a system where the partitioning can be
done with low data rates between the computational units
the delays is not important. The other extreme is when
GPUs and FPGAs have to be heavily interleaved with
computation on the CPUs. This case requires a fast intercommunication link to hand off data fast.
3. ARCHITECTURE
Radiation tolerance is a significant driver for the
architectural design, and affects the selection of
components, electrical wiring, and de-rating of components.
A vital part of radiation tolerance is single-event upset
(SEU) mitigation through techniques such as multiple bit
error detection and correction (EDAC or ECC). Hence it is
important that all, or at least most of the design supports
EDAC.
A heterogeneous computing computer in this paper is a
computer design featuring a combination of multi-core
CPU, graphical processing unit (GPU), and field
programmable gate array (FPGA).
All passive components are de-rated according to the
guidelines set forth by the European Cooperation for Space
Standardization (ECSS) Space Product Assurance standard
for Electrical, Electronic, and Electromechanical (EEE)
components [10].
2. FORM FACTOR ANALYSIS
Different industrial standard form factors have been
analyzed with respect of modularity, input/output (IO)
pinning robustness, expandability, and heat sink
applicability. The selected standards to evaluate were the
Standardization Group for Embedded Technologies driven
Qseven® [7], and the PICMG group COMexpress® type 2
and type 6 [8]. These are all concepts that are industry
ready, off-the-shelf, multi vendor, Computer-On-Modules
that integrates all the core components of a common PC.
Both the Qseven® and COMexpress® modules provide the
functional requirements for an embedded application. These
functions include, but are not limited to, graphics, sound,
mass storage, network and multiple USB ports.
The FPGA element of the designed heterogeneous
computing module was selected based on known heritage
and price performance. An initial starting reference was the
Microsemi ProAsic3 FPGA which has been shown to
perform well in space environment in industrial, military,
and space packaging [11]. However, this circuit has end-oflife (EOL) and was not deemed suitable for a new design.
Based on heritage from the ProASIC3 and the assumption
that Microsemi would keep the design principles on the new
SmartFusion2 FPGA and early radiation reports, the
SmartFusion2 was selected as the candidate of choice for
the hardware implementation [12]. Also of importance is the
fact the SmartFusion2 FPGA contains an ARM Cortex-M3
Qseven® modules are mounted onto an application specific
carrier board and have a standardized form factor of 70 mm
x 70 mm with an alternate IO expansion board measuring 40
2
microcontroller which can be used as an advanced system
watchdog and recovery processor.
The redundant data paths are provided by the Low Pin
Count bus (LPC) which is a simplified PCI interface for
legacy ISA bus devices [17]. LPC bus uses 4 data pins and a
33 MHz clock and can hence transmit approximately 100
Mbps.
Selection of a suitable high performance CPU and GPU
proved to more challenging. Based on radiation reports from
NASA on CPUs from Advanced Micro Devices (AMD)
attention was drawn to their offerings. 17 Mrad has been
reported for the AMD Liano processor core on a 32 nm
silicon-on-insulator (SOI) using hi-Kmetal gates (HKMGs)
[13]. A good candidate manufactured on a very similar
process was found in the AMD Embedded System-on-Chip
(SOC) G-series based on the “Jaguar” architecture
(eKabini). The benefit of the AMD G-series is the pinupgradeable eKaveri “Steppe Eagle” revision which has
partial HSA support. The G-series chips however are based
on 28 nm using the same strained silicon process.
The benefit of the AMD SOC is that it include multi-core
x86 compatible CPU and GPU in the same physical die as
well as a wide range of familiar PC IO such as PCI Express,
SATA, USB 2, 3 etc. [14]. In addition the SOC includes
new features for enhanced Universal Video Decode (UVD)
and Video Encode (VCE) hardware acceleration and
enhanced clock gating and C6 ‘deep power down’
capabilities that lower overall power consumption. In terms
of raw computational performance the embedded GPU
performs up to 256 GFLOPS per clock compute power and
supports OpenGL™ 4.2 and OpenCL™ 1.2 full profile.
AMD supports Heterogeneous System Architecture (HSA)
in the G-series SOC eKaveri versions [15]. This allow
amongst many things important features such as unified
addressing across all processors, virtual memory coherency,
high level language support for GPU compute processors,
and HSA intermediate language (HSAIL). These functions
help speed up the CPU and GPU interaction and are
leveraged in combination with the FPGA in a heterogeneous
computer module.
Figure 1 illustrates a block diagram of the developed
heterogeneous computing module, including the redundant
pathways for creating the heterogeneous system by
connecting the CPU/GPU and the FPGA with its micro
processing unit (ARM Cortex-M3).
4 x CPUs
LPC
AMD SOC
MCU
FPGA
PCIE
GPU
0.5 GB RAM
2 GB RAM
Qseven
Figure 1. GIMME3 core Architecture
4. HARDWARE DESIGN
The Qseven® standard provides 230 pins in the specified
MXM connector for standard PC features such as HDMI,
Displayport, SATA, PCI Express, CAN, Power etc. [7].
Furthermore, the standard requires a board voltage supply of
5 V which is compatible with many small satellites.
However there are no flexible IO pins suitable for FPGA
connection in the MXM which is a significant problem
since a natural extension of the FPGA usage is for IO
translation in addition to acting as a co-processor to the
CPU/GPU. In order to harness the full use of the FPGA
additional IO capability must be added to the Qseven® or
alternatively breaking the standard by changing the IO
definition in the MXM connector. The later choice would
cause the new heterogeneous module to be non-compliant to
the installed systems already on the market and is not
preferable. As a consequence, this enhanced Qseven®
module was equipped with a 120 pin extension IO
capability through a total mated 5 mm board-to-board
connector on the bottom side of the Qseven® as seen in
Figure 2.
An optimized heterogeneous computer module must support
a fast link between all integral computing parts. In the
selected parts described above, both the AMD SOC and the
FPGA supports PCI Express generation 2.0 with 5 Giga
Transfers per second (GT/s) per lane. In order to support
redundant data pathways at least two different
communication channels should be used. The core design
uses one lane PCI Express gen 2.0 which provides 5 GT/s or
an equivalent of approximately 4 Gbps considering the 8/10
encoding applied on data transfers over PCI Express [16].
3
Standard Qseven
MXM connection
Figure 3 shows a photograph of the top side of a
commercialized heterogeneous Qseven® module from the
Swedish company BAP AB, which is compatible with the
architecture described in this paper. On the right hand side
the AMD SOC is seen together with DDR3 memory. On the
left side is the Microsemi FPGA together with 0.5 GB of
DDR3 memory. Both the SOC and the FPGA memory have
implemented support of error correction and the total
amount is 2 GB for the SOC and 0.5 GB for the FPGA.
New 120 pin boardto-board connection
Figure 2. Illustration of an enhanced Qseven® module with an
extra board-to-board connector mounted on the bottom side,
outside the mechanical cooling area.
A summary of the enhanced Qseven® features are listed in
Table 2.
Table 2. Summary of the Enhanced Heterogeneous
Qseven®
Using the extra 120 IO signals provided by the board-toboard connector, it is possible to carry over multiple FPGA
signals to the user carrier board. A summary of the signals is
shown in Table 1.
Form factor
Power input
Table 1. Signals in 120-pin board-to-board extension
2 x SERDES
16 x LVDS/GPIO
8 x DDRIO/GPIO
1 x1 PCI Express
2 x I2C / GPIO
2 x SPI / GPIO
2 x UART / GPIO
FPGA JTAG
1 x ULPI
1 x Power good
1 x FPGA reset
1 x 12 bit ADC
Ground
IO connectors
CPU/SOC
Serializer-Deserializer, 5 Gbps
Differential, LVDS signal level
Differential, 2.5/3.3 V signal level
PCI Express generation 2
I2C bus or GPIO
SPI Bus or GPIO
Serial communication or GPIO
Debugging interface
USB v2 FPGA interface
FPGA management
FPGA management
Analog signal input
Ground
GPU/SOC
FPGA
DRAM SOC
DRAM FPGA
Ethernet
SOC-FPGA
interconnect
SOC
IO/interfaces
With the flexible GPIO it is possible to implement
commonly available bus protocols not supported natively by
the AMD or the FPGA such as RapidIO, Profibus, Modbus,
1553, SpaceWire etc.
FPGA
interfaces
IO
Qseven® enhanced (+)
5 V SOC power net
5 V FPGA power net
MXM-230 + 120 extension
AMD Embedded G series 415GA
Quad Core 1.5 GHz, 2 MB L2 cache
AMD Radeon HD 8330E
Microsemi SmartFusion2 M2S050-T
2 GB DDR3 with ECC
0.5 GB DDR3 with ECC
1 Gigabit Ethernet
PCI Express x1 generation 2.0
LPC bus
2 x1 PCI Express lanes
1 x4 PCI Express lanes
2 x USB 3.0
4 x USB 2.0
2 x SATA version 3.0
LPC bus
SM-bus
I2C bus
SDIO/MMC
AMD Debugging
2 x SERDES
24 x differential/single ended IO (16 with
LVDS)
2 x I2C / GPIO
2 x SPI / GPIO
2 x UART / GPIO
1 x Controller Area Network (CAN 2.0b)
1 x ULPI
JTAG
Figure 4 shows a photograph of the bottom side of the
module shown in Figure 3 with the additional board-toboard connector clearly visible on the top right side. In total
the number of IO from the module is 350.
Figure 3. Photograph showing the top side of the developed
enhanced heterogeneous Qseven® (70 x 70 mm2 module).
4
mini-ITX carrier supports dual gigabit Ethernet interfaces to
the AMD SOC, one gigabit Ethernet interface with Powerover-Ethernet 803.02at support to the FPGA, 2 x USB 3.0, 4
x USB 2.0, 2 x SATA v3.0, SDIO, HDMI, PCI Express x4,
COM ports, sound input/output, a FPGA development
extension area etc.
5. PERFORMANCE TESTING SETUP
Testing of the performance is difficult as there are many
different tests and benchmarks, as well as many different
methods of defining performance. In this case it was
decided to base the benchmark on two default non-tweaked
operating systems Ubuntu Linux 14.04.1 LTS 64 bit desktop
version and, Microsoft Windows 8.1 OEM 64 bit and image
processing algorithms. In the case of Ubuntu the enterprise
error correction features of the AMD SOC was enabled. No
optimizations have been made for embedded use for either
operating system. The enhanced Qseven® device under test
(DUT) sample supplied by BAP used the AMD G-series
SOC “GX415GA” featuring quad core 1.5 GHz CPUs,
Radeon HD 8330E GPU core clocked at 500 MHz, and a
SmartFusion 2 M2S050T FPGA. The theoretical floating
point performance of the CPU set is 24 GFLOPS and GPU
floating point performance is 127 GFLOPS. The AMD
memory configuration was set to run at DDR3-1333 which
is equivalent to 85.6 Gbps.
Figure 4. Photograph showing the bottom side of the developed
enhanced heterogeneous Qseven® module. The extra IO
expansion capability board-to-board connector can be seen in
the upper right.
Qseven® modules require a carrier board to be functional.
However in this case it is not possible to verify all
functionality by using an off-the-shelf Qseven® compatible
carrier since it will lack support for the 120 pin extension. In
order to verify the design and perform benchmarking a
mini-ITX compatible form factor carrier was developed
which can exercise all features and provide a simple
development environment [18].
The following drivers were used for testing:


For Windows 8.1, AMD Catalyst Driver 14.8
For Ubuntu, AMD Catalyst Driver 14.8
A commonly used software suite to stress test CPU/GPUs is
the Open Computer Vision Library (OpenCV) [19].
OpenCV is an open-source BSD-licensed library that
includes several hundreds of computer vision algorithms.
OpenCV version 2.4.8 was used in Windows 8.1 and
version 2.4.9 was used in Ubuntu. Both versions were
compiled with enabled instruction optimizations for the
AMD Jaguar platform. These optimizations include
Streaming SIMD extentions (SSE, SSE2, SSE3, SSE4.1,
SSE4.2, SSE 4A), advanced vector extensions (AVE),
Multimedia extensions (MMX, EMMX), AMD64, and fast
float save and restore (FXSR). OpenCL support for GPU
acceleration was turned on (which is default in 2.4
OpenCV). OpenCV for Windows was compiled with Visual
Studio Express 2013, and for Ubuntu the standard GCC.
Benchmarking was done using a subset of OpenCV
functions from three different groups; Geometric Image
Transformations, Feature Detection, and Motion Analysis
and Object Tracking group.
Figure 5. Photograph showing the associated mini-ITX
compatible development board for the heterogeneous Qseven
module.
The following features are used for benchmarking:
Figure 5 shows a photograph of the associated mini-ITX
reference carrier with a mounted enhanced heterogeneous
Qseven® covered with a heat sink and a cooling fan. The
From Geometric Image Transformations group
5

WarpPerspective,
Applies
a
transformation to an image, [20]

WarpAffine, Applies an affine transformation to an
image [21].
perspective
From Feature Detection group

GoodFeaturesToTrack,
GoodFeaturesToTrackDetector_OCL, Determines
strong corners on an image [22].
From Motion Analysis and Object Tracking group

CalcOpticalFlowPyrLK,
PyrLKOpticalFlow,
Calculates an optical flow for a sparse feature set
using the iterative Lucas-Kanade method with
pyramids [23].
Figure 7. Screen shot from task manager in Windows 8.1
showing the load at minimal load.
A test image in PAL resolution (720 x 576 pixels) was used
in the benchmark. Figure 6 shows the test image.
Figure 8. Screen shot from system manager in Ubuntu 14.04
showing the load at nominal desktop load.
Figure 6. Test image for benchmarking (PAL resolution
720x576 pixels).
6. RESULTS
Testing using the GX415GA module was using AMD Cool
n Quiet technology.
Figure 9 presents a summary of run benchmarks on the
enhanced Qseven® DUT. Values are presented in frames
per second (FPS) for both Ubuntu 14.04 and Windows® 8.1
running on only CPU and with hardware acceleration using
OpenCL (OCL). The first and primary observation is that
the platform is stable on load and successfully runs the latest
available code successfully. Secondly it is clear that the
GPU acceleration makes a very big difference in case of
Ubuntu. The reason for this is unclear and must be further
investigated. The DUT in this case has Cool n’ Quiet power
saving enabled which limits the performance but still shows
a maximum 787 frames per second for WarpAffine test and
102 frames per second for the complex Pyramid Optical
Flow test. At PAL resolution 787 fps and a black and white
test image with 8 bit pixel color depth corresponds to a
continuous data flow of approximately 2.5 Gbps which is
continuously evaluated and sustained in this test. The
Figure 7 and 8 shows screen shots from Windows® 8.1 and
Ubuntu 14.04 running on the DUT with minimal or nominal
load. Figure 7 reveals that the clock frequency scaling is
working with 0.9 GHz average use compared to nominal 1.5
GHz and that 512 MB is devoted to GPU memory leaving
1.5 GB RAM for CPU.
6
approximate power consumption during testing was 6 W.
It is clear that even with only CPU/GPU power new
capabilities in terms of advanced on-board data processing
can be achieved.
calculations in the FPGA with a parallelized hardware
implementation of the function. This has however not been
done for this study.
Table 3. Benchmarked performance of AMD G-Series SOC
GX420CA with Cool n’ Quiet turned off running the described
test suite with GPU acceleration.
Further testing has been performed to find the upper limit
for the current generation hardware. Table 1 shows
benchmarking results using the fastest available AMD Gseries SOC eKabini family device, GX420CA SOC with
Cool n Quiet turned off. The GX420CA supports CPU
speeds of 2.0 GHz and a GPU clock at 600 MHz. This
represents a 33% improvement in CPU speed and 20%
improvement compared to the GX415GA. In theory this
would represent a total 50% improvement in terms of
computational performance. However, the measured
increases are within 36% to 92% which cannot be fully
accounted for by the pure speed increase. The reason is
hidden in the performance gained by switching off the
power saving features allowing the system to run
uninterrupted.
WarpPerspective
WarpAffine
GoodFeaturesToTrackDetector
PyLKOpticalFlow
Ubuntu
OpenCV
+OCL
[FPS]
1064
1250
481
139
Improvem
ent [%]
85
59
92
36
7. FUTURE WORK
Future work will include performing detailed radiation
analysis of system level single event upset (SEU) and single
event latch-up (SEL) thresholds as well as extensive
environmental testing.
Future work will pursue testing including also heavy FPGA
utilization to maximize the possible throughput.
The Lucas-Kanade optical flow calculation is limited by the
system bandwidth in terms of loading images in and out
between the CPU and the GPU.
There are two ways to improve the optical flow, first to
enable use of the HSA architecture allowing the CPU and
GPU to share the same memory and remove the need for
copying the image and secondly to offload these
Figure 9. Summary of the performed benchmarks run on the presented fault tolerant heterogeneous computer using four different
image processing functions from the Open Computer Vision Library. Benchmarking is made on CPU only and with CPU + GPU
acceleration using OpenCL (OCL) for both Ubuntu 14.04 and Windows 8.1.
7
Future work will also be performed to optimize kernel
drivers to define and operate “system mutex” handling, thus
allowing seamless operation of all three computing elements
(CPU, GPU, FPGA) to work on the data same data without
inherent knowledge of the data location using the HSA
enabled architecture.
Finally, new ways of simply in-orbit computing will be
explored using tailored and standard software, especially
focusing on applying in-orbit use of well-known scientific
tools such as Matlab.
REFERENCES
[1] R. Ginosar, “Survey Space Processors”, Data Systems in
Aerospace Conference (DASIA), 14-16 May 2012,
Dubrovnik, Croatia.
[2] T. Rajkowski et al, “Low Cost and High Performance
On-board Computer for Picosatellite”, Photonics
Applications in Astronomy, Communications, Industry,
and High-Energy Physics Experiments 2012, edited by
Ryszard S. Romaniuk, Proc. of SPIE Vol. 8454, 84540J ·
© 2012 SPIE, doi: 10.1117/12.200023.
8. CONCLUSIONS
[3]
It has been shown that a high performance heterogeneous
fault and radiation tolerant computer can be realized on the
industrial small form factor Qseven. Significant Input/Ouput
extension has been enabled by the addition of an additional
120 pin board-to-board connection on the bottom side of the
Qseven.
http://www.cubesatshop.com/index.php?page=shop.produ
ct_details&flypage=flypage.tpl&product_id=94&category
_id=8&option=com_virtuemart&Itemid=75
(accessed,
2015-01-12)
[4] C. Ahlberg et al, "GIMME - A General Image Multiview
Manipulation Engine", Reconfigurable Computing and
FPGAs (ReConFig), 2011 International Conference on,
vol., no., pp.129,134, Nov. 30 2011-Dec. 2 2011, doi:
10.1109/ReConFig.2011.44
State-of-the-art processors for space performs 900 MFLOP
while the proposed, developed, and tested heterogeneous
architecture from COTS components performs at least 1510
MFLOP (151 GFLOP) at a comparable power consumption.
[5] http://www.hsafoundation.com/, accessed 2015-01-12.
There is a significant speed improvement using GPU
accelerated image analysis using OpenCL and enables the
demonstrated hardware to reach up to 1000 PAL resolution
frames per seconds for certain functions. The platform has
TFLOP range computational performance and can utilize
the heterogeneous architecture fully using fast PCI Express
enabled interconnects.
[6] HSA Foundation, “HSA Platform System Architecture
Specification 1.0 Provisional”, 2015.
[7] Qseven standard, Standardization Group for Embedded
Technologiges,
http://www.sget.org/standards/qseven.html, accessed
2014-10-21.
This increased capability, small industrial derived form
factor, ad use of industry standard reliable embedded
processor architectures and radiation tolerance methods,
interfaces, and components provides a unique capability for
realizing high performance, radiation-tolerant small satellite
avionic systems.
[8] COMexpress standard, PICMG Open Modular Standards,
http://www.picmg.org/openstandards/com-express/,
accessed 2014-10-21
[9] MXM standard, http://www.mxm-sig.org/, accessed 201410-21
[10] ECSS Q-ST-60 rev 2 standard,
https://escies.org/download/webDocumentFile?id=60888
accessed 2014-10-22
[11] S. Habinc et al., “Using a Flash Based FPGA in a
Miniaturized Motion Control Chip”, Sept. 1, Military and
Aerospace Programmable Logic Devices (MAPLD)
Conference 2009, Washington, USA.
[12] Microsemi internal interim report on radiation testing of
SmartFusion2 FPGA,
http://www.microsemi.com/documentportal/doc_view/134103-igloo2-and-smartfusion2-65nmcommercial-flash-fpgas-interim-summary-of-radiationtest-results accessed 2014-10-21.
8
[13] NASA EPP Electronics Technology Workshop,
“Advanced Micro Devices (AMD) Processor: Radiation
Test Results”, June 11-12, 2013, NASA GSFC,
Greenbelt, MD
BIOGRAPHY
Fredrik Bruhn received a Ph.D. in
Microsystems Technologies from
Uppsala
University,
Uppsala,
Sweden in 2005 and a Masters of
Science in Atomic and Molecular
Physics from Uppsala University in
2000. He has been with Mälardalen
University since 2013 as adjunct
Professor in Robotics & Avionics.
He has been a guest researcher at JPL and entrepreneur
starting several high technology companies in robotics
and space applications. He has been involved as senior
designer in bi-lateral small satellite programs between
NASA and the Swedish National Space Board and US Air
Force Research Laboratory and the Swedish Defence
Material Administration (FMV).
[14] AMD G-series SOC, eKabini,
http://www.amd.com/documents/amdgseriessocproductbr
ief.pdf accessed 2014-10-22, accessed 2014-10-22
[15] ISCA 2014 | Heterogeneous System Architecture (HSA):
Architecture and Algorithms Tutorial
[16]PCI-SIG,
PCI
Express
standard,
https://www.pcisig.com/specifications/pciexpress/,
accessed 2014-10-21
[17] LPC bus specification from Intel,
http://www.intel.com/design/chipsets/industry/lpc.htm,
accessed 2014-10-21
Kjell Brunberg received a M.Sc. in
Theoretical Physics from Uppsala
University in 1961. He has been
with Hectronic AB for 20 years and
is currently CEO of BAP and Upwis
AB. He is a system engineer of
industrial PC systems and internetof-things solutions with history of
designing avionics hardware for
UAV, fighter jets, marine communication systems. He is
involved in three EU research programs and has been a
board member of WISENET excellence center for
wireless systems.
[18] Mini-ITX standard, Intel Corporation, http://cachewww.intel.com/cd/00/00/47/97/479761_479761.pdf,
accessed 2014-10-21
[19] OpenCV library development website,
http://opencv.org/, accessed 2014-10-21.
[20] OpenCV, WarpPerspective function.
http://docs.opencv.org/modules/imgproc/doc/geometric_tr
ansformations.html#warpperspective, accessed 2014-1021.
[21] OpenCV, WarpAffine function.
http://docs.opencv.org/modules/imgproc/doc/geometric_tr
ansformations.html#warpaffine, accessed 2014-10-21.
John Hines received a M.Sc. in
Electrical
Engineering
from
Stanford University in 1975 and a
B.S. in Electrical Engineering from
Tuskegee University in 1972. He has
been with NASA Ames Research
Center for 37 years in various
capacities including the center’s
Chief Technologist and Chief
Technologist for the Small Spacecraft Division. During
the time as the center’s Nanosatellite Mission Office
manager he directed Biological nanosatellite missions
and projects including PharmaSat, O/OREOS,
GeneSat/GeneBox, PreSat, and Nanosail-D.
[22] OpenCV, GoodFeaturesToTrack function.
http://docs.opencv.org/modules/imgproc/doc/feature_dete
ction.html#goodfeaturestotrack, accessed 2014-10-21.
[23] OpenCV, CalcOpticalFlowPyrLK function.
http://docs.opencv.org/modules/video/doc/motion_analysi
s_and_object_tracking.html, accessed 2014-10-21.
Lars Asplund received a PhD in
Physics from Uppsala University
1977, a BSc in Physics from
Uppsala University 1973. Professor
in Computer Science at Mälardalen
University since 1981. Now as
emeritus. He has written ten
textbooks in Electronics and Robotics, Achieved the
degree of Docent in Physics at Uppsala University 1981.
Has created two five-year engineering programs, one at
9
Uppsala University (IT) and one at Mälardalen
University (Robotics).
Magnus Norgren has been with BAP
since 2013 and has studied
engineering sciences at Uppsala
University, Uppsala, Sweden. He is
also part time research engineer at
Mälardalen University. He has been
involved in studies of safety critical
multi-core implementations, cache
coherency, heterogeneous system
architecture definition, and chip selection trade-offs.
10
Download