Eric M. Dashofy
Computer Systems Research Department
The Aerospace Corporation
CSRD/CSTS/ETG
March 8, 2011
© The Aerospace Corporation 2011
I’ll give you the answer up front…
• It comes in many different sizes:
– Small
– Medium
– Large
– Super-Size Me!
• It’s cheap and it doesn’t last long
• Nearly any alternative is healthier
Outline
• I’ll talk about three areas of interest at different scales
– Small: Wireless sensor networks
– Medium: Tiny embedded computers, clustered
– Large: High-performance and cloud computing
• And identify some unique challenges in each area
• And describe some Aerospace research or investigations in each one
High-Performance Computing
• Critical domain for the work of The Aerospace Corporation
• Solving computationally-intensive problems, generally with large data sets
• Sometimes data sets are from “real” sensor sources:
– Signal processing, image processing
• Sometimes they are synthesized
– e.g., finding an optimal satellite configuration through exhaustive search or Monte Carlo methods
• …using a variety of different, interconnected computing architectures, techniques, and resources beyond a single core/serial implementation.
• HPC-style resources are everywhere: laptops, desktops, gaming consoles, embedded processors…
Major trends in HPC
• Increasing the speed of individual processing elements
– i.e., more megahertz!
• Increasing the “smarts” within processors
– i.e., longer pipelines, branch prediction, speculative execution
• Parallelism is the name of the game: dwarfs megahertz and smarts
At the level of… | Through…
Instructions | Multi-issue processors
Data elements | Vector processing, SIMD instructions
Threads | “Hyperthreading”
Processor elements | Multicore processors, Heterogeneous Multicore
Processors | Symmetric Multiprocessing (SMP), NUMA
Machines | Cluster-based computing
Clusters | Grid computing
Increasing Complexity in HPC
• Put another way…
[Table: chips (486, Pentium, PII/III, P4, Core…, GPUs, Power6, Itanium, Cell) versus the forms of parallelism each provides (Multi-issue, SIMD Inst., SIMD Units, SMT, Multicore, VLIW/EPIC, Multi-Proc., H.M.P.)]
Not shown: FPGAs, tile-based computing…
A Challenge Problem
• What is the fastest way to transform a large amount of data through three transformations (“Red”, “Green”, and “Blue”)?
– I’ll give you two options (assume both will result in the same output)
A Challenge Problem: Option 1
• Do all the red, then all the green, then all the blue.
A Challenge Problem: Option 2
• Chunk up the data and do red, then green, then blue on each chunk.
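The two options can be sketched as follows. This is a minimal illustration, not the talk's implementation: the `red`, `green`, and `blue` functions and the `chunk_size` value are hypothetical stand-ins, and the sketch only shows that both orderings yield the same output. Which one is faster depends on the platform (cache sizes, memory bandwidth, available parallel units) and has to be measured.

```python
from typing import Callable, List

Transform = Callable[[float], float]


# Hypothetical stand-ins for the "Red", "Green", and "Blue" transformations.
def red(x: float) -> float:
    return x + 1.0


def green(x: float) -> float:
    return x * 2.0


def blue(x: float) -> float:
    return x - 3.0


STAGES: List[Transform] = [red, green, blue]


def option1(data: List[float]) -> List[float]:
    """Option 1: run each transformation over the whole data set in turn."""
    out = list(data)
    for stage in STAGES:
        out = [stage(x) for x in out]  # one full pass over the data per stage
    return out


def option2(data: List[float], chunk_size: int = 4) -> List[float]:
    """Option 2: split the data into chunks and run all stages on each chunk."""
    out: List[float] = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        for stage in STAGES:
            chunk = [stage(x) for x in chunk]  # all stages reuse one small chunk
        out.extend(chunk)
    return out


if __name__ == "__main__":
    sample = [float(i) for i in range(10)]
    # Both orderings must agree, as the slide assumes.
    assert option1(sample) == option2(sample)
    print(option1(sample))
```

In a compiled implementation, Option 2 is the shape that lets a chunk stay in fast memory across all three stages; Option 1 is the shape that is usually easiest to write first.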
Ordinary parallelism is only one side of the story…
• Moore’s Law: the number of transistors on a chip will double every ~18 months
Credit: User Wgsimon; used under Creative Commons ShareAlike 3.0 License
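To make the doubling arithmetic concrete, here is a small sketch that takes the slide's ~18-month figure as given (other statements of Moore's Law use 24 months, so the period is a parameter). The function name `transistor_growth` is hypothetical.

```python
def transistor_growth(years: float, doubling_months: float = 18.0) -> float:
    """Growth factor after `years` if counts double every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


if __name__ == "__main__":
    for years in (1.5, 3.0, 15.0):
        factor = transistor_growth(years)
        print(f"after {years} years: about {factor:.0f}x the transistors")
```

Under the 18-month assumption, 15 years is ten doublings, a factor of roughly a thousand.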
…but what are we doing with those transistors?
[Figure annotations on the transistor-count chart: Multicore + SMT; Multicore VLIW/EPIC; VLIW/EPIC; SMT, multi-issue; GPUs; Multicore; Heterogeneous Multiprocessor; SMT, in-order; Superscalar, multi-issue]
Key Challenges
• Parallelism is still hard for people to understand and master
– Some techniques (OpenMP, the actor model) ease things
• Trying out different parallelization strategies is still expensive and labor-intensive
• Good tooling is hard to get, because the platforms are evolving faster than the tools can be built
A Little Research
“Kinda-Embedded” Computing
• New computing platforms and processors emerging in a niche we haven’t seen before
– The power, programmability, and connectivity of an (old) desktop computer
– In extremely small form factors
• Largely driven by developments in cell phone technology
• Meet Gumstix Overo
– 600 MHz ARM-based processor w/extensions
– 256 MB RAM, 256 MB Flash, plus SD card (up to 8 GB)
– 10/100 Ethernet, 54 Mbps WiFi, Bluetooth
– USB, HDMI
Applications
• Cell phones, tablets, PDAs, other handhelds
• “Plug Computers”
• Car-puters
• Walltops
• Location-aware applications
• Situational Awareness
• Micro web-servers
• Spaceborne(?)
Research: Gumstix for HPC-style Processing
• Worked with a UCSB student team and a summer intern
• Developed two clusters
– One homemade
– One on Gumstix “Stagecoach” backplane
• Ported Range-Doppler SAR algorithm to Gumstix and parallelized it
• Compared performance and power usage to a small-form-factor desktop (Mac Mini)
Performance of various implementations
[Figure: SAR Image Processing Time, Comparing Different Compilation Tools with Several Computers and Platforms. X-axis: Number of Computers. Series: Native FFTW Compilation, BitBake FFTW Compilation, FFMPEG Compilation, Mac Mini.]
Energy Use
[Figure: Energy Cost of Processing SAR Data Using Several Computers. X-axis: Number of Computers. Series: Gumstix Energy Cost (Computation Only), Gumstix Energy Cost (Including Cluster Hardware), Mac Mini.]
Gumstix extremely close to Mac Mini in terms of power consumption
Key Challenges
• Optimization can make performance vary widely
– Software engineering gives us few tools to do that optimization
– Biggest performance gains at very low levels of abstraction
• Likely to see increasing diversity in processors at this scale
– Will what we learn on one apply to another?
• True mobility requires batteries
– Power-aware computing probably increasingly important
– But the lure of these platforms is how similar they are to platforms where we can blithely ignore power use.
Wireless Sensor Networks
• A wireless network of physically distributed small computing devices (colloquially: “motes”) equipped with tiny sensors
• Additional characteristics:
– Network topology may be fixed, variable, or ad-hoc
• Motes may be mobile, though infrequently
– Devices expected to run for months or years unattended
– Data collected generally forwarded to a central collection point
– Network should survive the loss of numerous motes
• Applications:
– Environmental monitoring
– Factory/industrial monitoring
– Target detection and tracking
Meet Mica
• MicaZ Mote
– 8 MHz Atmel ATmega128 microcontroller (MIPS-like assembly language)
– 4 KB of RAM
– 512 KB of Flash (usually partitioned into 4 x 128 KB program blocks)
– CC2420 802.15.4 “ZigBee”-compliant radio
  • 7 power usage modes
– Several analog inputs, several digital inputs, I2C
• Sensor boards stack on top
– Light, temperature, humidity, orientation (magnetometer), acceleration, location (GPS), microphone (levels only)
[Figure: Crossbow MicaZ Mote]
How Motes Work
• Drop a bunch of motes in an area
• Distance between them: 50-100 meters
• One distinguished mote is the “gateway”; it is connected to an ordinary PC (the base station) by a serial connection or Ethernet transceiver for data collection
• Software on the motes allows them to self-organize, collect data, and report that data to their ‘parent’ in the network
– Base station ultimately gets all data
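The parent-forwarding idea above can be sketched in a few lines. This is an illustration only, not XMesh's actual protocol: the topology in `PARENT` is a fixed, hypothetical example (the self-organizing step that picks parents is not modeled), and the function names are invented for this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topology: each mote id maps to its parent; the gateway has none.
PARENT: Dict[str, Optional[str]] = {
    "GW": None,
    "A": "GW",
    "B": "GW",
    "C": "A",
    "D": "C",
}


def route_to_gateway(mote: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the hop-by-hop path a reading takes from `mote` to the gateway."""
    path = [mote]
    nxt = parent[mote]
    while nxt is not None:  # keep handing the reading to the next parent
        path.append(nxt)
        nxt = parent[nxt]
    return path


def collect(
    readings: Dict[str, float], parent: Dict[str, Optional[str]]
) -> List[Tuple[str, float]]:
    """Forward every reading up the tree; return what the base station sees."""
    received = []
    for mote, value in readings.items():
        if route_to_gateway(mote, parent)[-1] == "GW":
            received.append((mote, value))
    return received


if __name__ == "__main__":
    print(route_to_gateway("D", PARENT))
    print(collect({"D": 21.5, "B": 19.0}, PARENT))
```

Every hop in a path is a radio transmission, which is why deep trees cost more energy per reading.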
A sample application we developed
• Signal transmitter tracking application
• Cell-phone tracking application
The Software Stacks
[Diagram labels: User App; Socket; XServe; Serial, USB, Ethernet; TinyOS]
• XServe receives data from base station mote and makes it available to userspace apps
• XMesh forms and maintains ad-hoc network; also provides interface for other TinyOS components to send and receive messages.
• Motes communicate over 802.15.4 protocol (used in ZigBee devices). Long-range, low data rate.
• Mote software based on open-source TinyOS. A thin component model is built on top of that.
Software on the Motes
• TinyOS developed concurrently with the Mica family motes
– Mica motes still probably the #1 platform for TinyOS
– Other platforms supported
• Amazingly, there is a thin but useful component model
• Components written in nesC (a dialect of C) with well-defined interfaces
• Interfaces connected through a configuration diagram that resembles early module interconnection languages (e.g., Polylith)

    configuration Blink {
    }
    implementation {
      components Main, BlinkM, SingleTimer, LedsC;
      Main.StdControl -> BlinkM.StdControl;
      Main.StdControl -> SingleTimer.StdControl;
      BlinkM.Timer -> SingleTimer.Timer;
      BlinkM.Leds -> LedsC;
    }
    …
    interface Timer {
      command result_t start(char type, uint32_t interval);
      command result_t stop();
      event result_t fired();
    }
Key Challenges
• Extreme power management is key
– How can we better integrate higher-level power management principles into the sensor architecture?
• Control flow is low-level and painful
– Can we alleviate this with higher-level models that are “compiled down” to implementations without sacrificing power?
• Are there architectural styles appropriate for these applications?
– How do you build applications that use computation and aggregation in-the-mesh to reduce data transmission?
• How do you build debuggable systems?
• How do you transition to next-generation systems?
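As a rough illustration of why in-the-mesh aggregation reduces data transmission, the sketch below counts radio transmissions for one round of readings with and without aggregation. Everything here is an assumption for illustration: the `PARENT` topology is hypothetical, the aggregate is a simple maximum, and one message per hop is counted with no retries or acknowledgements.

```python
from typing import Dict, Optional

# Hypothetical routing tree: mote id -> parent id (the gateway has no parent).
PARENT: Dict[str, Optional[str]] = {
    "GW": None, "A": "GW", "B": "GW", "C": "A", "D": "C",
}


def depth(mote: str, parent: Dict[str, Optional[str]]) -> int:
    """Number of hops from `mote` up to the gateway."""
    hops = 0
    nxt = parent[mote]
    while nxt is not None:
        hops += 1
        nxt = parent[nxt]
    return hops


def transmissions_without_aggregation(parent: Dict[str, Optional[str]]) -> int:
    """Each mote's raw reading is relayed unchanged over every hop."""
    return sum(depth(m, parent) for m in parent if parent[m] is not None)


def transmissions_with_aggregation(parent: Dict[str, Optional[str]]) -> int:
    """Each mote merges its children's summaries and sends one message up."""
    return sum(1 for m in parent if parent[m] is not None)


def in_mesh_max(
    readings: Dict[str, float], parent: Dict[str, Optional[str]]
) -> float:
    """Compute the network-wide maximum by merging summaries hop by hop."""
    summary = dict(readings)  # each mote starts with its own reading
    # Deepest motes send first so each parent has heard from all its children.
    for mote in sorted(parent, key=lambda m: depth(m, parent), reverse=True):
        up = parent[mote]
        if up is None or mote not in summary:
            continue
        summary[up] = max(summary.get(up, float("-inf")), summary[mote])
    return summary["GW"]


if __name__ == "__main__":
    readings = {"A": 1.0, "B": 5.0, "C": 2.0, "D": 9.0}
    print(transmissions_without_aggregation(PARENT))
    print(transmissions_with_aggregation(PARENT))
    print(in_mesh_max(readings, PARENT))
```

The saving grows with tree depth, but it only applies to aggregates that can be merged along the way (max, min, count, sum); anything that needs every raw reading at the base station does not benefit.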
Conclusion
• We are faced with yet another “Cambrian Explosion” of
– Scales
– Platforms
• Software engineering knowledge and insight lacking in these domains
– For the smaller domains, abstraction (the key to software engineering) is the enemy of performance and battery life
• We are building platforms faster than we can build tools and far faster than we can build skills
– Domain experts often have little formal training in software engineering
• Need lightweight “force multipliers” with a very low cost:benefit ratio
All trademarks, service marks, and trade names are the property of their respective owners.