Case studies: Seti@home

SETI@Home
Sunny Gleason
COM S 717
November 29, 2001
(Based on the article, “SETI@Home: Massively Distributed
Computing for SETI.”)
In This Presentation
• What is SETI?
• Partitioning the Job
• The SETI@Home Client
• Server Post-processing
• Project Status
SETI@Home
• SETI: Search for Extra-Terrestrial Intelligence
– Private / Academic efforts
– NASA
– SETI Institute
– SETI@Home
• SETI@Home : Project led by researchers at
University of California - Berkeley (1997)
• “Piggyback SETI” receiver at Arecibo radio
telescope
SETI: The Task
• What is the complexity of detecting
signals sent by an extra-terrestrial
civilization?
• Category: massively difficult
– Signal parameters unknown
– Sensitivity of analysis depends on
available computing power
SETI: Task Assumptions
• Aliens would broadcast a signal that is
easily detectable, distinguishable from
natural radio emission
• Narrowband signals stand out from
natural broadband sources of noise
• Thus, SETI efforts concentrate on
narrowband signals
• The hydrogen line: 1420 MHz
Narrowband Signals
• Use a narrow search window (channel)
around a given frequency
• Earlier systems:
– Analog narrow bandpass filters
• Newer systems:
– Dedicated banks of Fast Fourier Transform (FFT)
processors
– Separate signal into up to 1 billion 1-Hz channels
Signal Problems
• Signals are unlikely to be stable in
frequency
– Example:
• A listener on Earth’s surface observing 1.4 GHz
signals undergoes acceleration of up to 3.4 cm/s²
due to Earth’s rotation
• Corresponding Doppler drift rate: 0.16 Hz/s
• Alien transmission would drift out of a 1 Hz
channel in about 6 seconds
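The figures above follow from the Doppler relation df/dt = f·a/c. A minimal sketch, assuming c = 2.998 × 10⁸ m/s and taking the observing frequency as the 1,420 MHz hydrogen line; the function names are hypothetical:

```python
# Doppler drift of a tone seen by a receiver accelerating along the line of sight.
# Assumptions: c = 2.998e8 m/s; observing frequency = 1,420 MHz hydrogen line.
C_M_PER_S = 2.998e8


def drift_rate_hz_per_s(freq_hz: float, accel_m_per_s2: float) -> float:
    """Drift rate df/dt = f * a / c."""
    return freq_hz * accel_m_per_s2 / C_M_PER_S


def seconds_to_cross_channel(channel_hz: float, drift_hz_per_s: float) -> float:
    """Time for an uncorrected drifting tone to move across one channel."""
    return channel_hz / abs(drift_hz_per_s)


rate = drift_rate_hz_per_s(1.42e9, 0.034)      # 3.4 cm/s^2 from Earth's rotation
t_cross = seconds_to_cross_channel(1.0, rate)  # 1 Hz channel
print(f"drift = {rate:.2f} Hz/s, crosses a 1 Hz channel in {t_cross:.1f} s")
# prints "drift = 0.16 Hz/s, crosses a 1 Hz channel in 6.2 s"
```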
Signal Problems
• We can compensate for Earth’s rotation, but
what about the motion of the remote planet?
• Solution:
– Correct for Doppler drift at the receiving end
– Search for signals at multiple Doppler drift rates
• Computation-intensive!
• Drift rates searched: -10 Hz/s to +10 Hz/s at fine
resolution, extended to -50 Hz/s to +50 Hz/s at coarser resolution
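One common way to correct for drift at the receiving end is to de-chirp: multiply the samples by exp(−iπrt²) for a trial drift rate r, which turns a linearly drifting tone back into a fixed-frequency one, then check whether the spectrum now has a sharp peak. The sketch below tries a handful of trial rates on a synthetic signal. It illustrates the technique under that assumption rather than reproducing the SETI@Home client, and it uses a naive DFT to stay self-contained; all names are hypothetical.

```python
import cmath
import math


def dft_magnitudes(x):
    """Naive O(n^2) DFT magnitudes |X[k]|, k = 0..n-1 (fine for short demos)."""
    n = len(x)
    return [abs(sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n)))
            for k in range(n)]


def dechirp(samples, sample_rate, drift_rate):
    """Remove a linear drift of drift_rate Hz/s by multiplying by exp(-i*pi*r*t^2)."""
    return [s * cmath.exp(-1j * math.pi * drift_rate * (m / sample_rate) ** 2)
            for m, s in enumerate(samples)]


def best_drift_rate(samples, sample_rate, trial_rates):
    """Try each trial rate; keep the one whose de-chirped spectrum peaks highest."""
    best_rate, best_peak = None, -1.0
    for r in trial_rates:
        peak = max(dft_magnitudes(dechirp(samples, sample_rate, r)))
        if peak > best_peak:
            best_rate, best_peak = r, peak
    return best_rate, best_peak


# Synthetic signal: tone starting at 8 Hz, drifting at +20 Hz/s, 1 s at 64 samples/s.
fs, n, f0, true_rate = 64.0, 64, 8.0, 20.0
chirp = [cmath.exp(1j * (2 * math.pi * f0 * t + math.pi * true_rate * t * t))
         for t in (m / fs for m in range(n))]
rate, peak = best_drift_rate(chirp, fs, [-20.0, -10.0, 0.0, 10.0, 20.0])
print(rate, round(peak, 1))
```

Without de-chirping, the tone's energy is smeared over roughly 20 frequency bins; at the correct trial rate it collapses into a single bin, which is why each extra drift rate searched costs a full re-analysis.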
Other Parameters
• Signal frequency / bandwidth?
• Is it pulsed? If so, what period?
• Solving over the full range of
parameters is beyond even the world’s
most powerful supercomputers
• Fortunately, the task is easily
partitioned
Distributing the Load
• Break the data up into separate
frequency bands
• Observations of different portions of the
sky are essentially independent
• Partition the huge dataset into smaller
chunks that ordinary PCs can handle
Data Collection
• Observations come from the 305-meter radio
telescope in Arecibo, Puerto Rico
• Dedicated instrumentation within the telescope
• Passively monitors the telescope’s field of
view (0.1 degrees)
• Stationary telescope: objects pass through the
beam in about 24 seconds
• When the telescope is tracking: 12 s
Data Collection
• Over the course of the project,
SETI@Home will see visible portions of
the sky 3 or more times
• Covers stars with declinations from -2
to 38 degrees
• Approximately 25% of the sky
Data Collection
• System records a 2.5 MHz band,
centered at the 1,420 MHz hydrogen line
• Records 2-bit samples onto 35 GB DLT
tapes (Recall: Nyquist rate)
• Each tape: 15.5 h of data
• 39 TB of data total
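A quick check that the tape figures hang together, assuming the 2.5 MHz band is quadrature-sampled at 2.5 million complex samples per second (a complex signal of bandwidth B needs B samples per second, which is where the Nyquist reminder comes in) and that each complex sample occupies 2 bits:

```python
# Back-of-the-envelope check of the tape figures.
# Assumptions: the 2.5 MHz band is quadrature-sampled at 2.5 million complex
# samples per second, and each complex sample is stored in 2 bits.
BANDWIDTH_HZ = 2.5e6
BITS_PER_SAMPLE = 2
TAPE_HOURS = 15.5

bytes_per_second = BANDWIDTH_HZ * BITS_PER_SAMPLE / 8   # 625,000 bytes/s
tape_bytes = bytes_per_second * TAPE_HOURS * 3600        # bytes on one tape
print(f"{bytes_per_second / 1e3:.0f} kB/s, {tape_bytes / 1e9:.1f} GB per tape")
# prints "625 kB/s, 34.9 GB per tape"
```

The result, about 35 GB per 15.5-hour tape, matches the DLT capacity quoted above.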
Data Collection
• Data tapes shipped to Berkeley
• Split into work units using 4 splitter
workstations
– Divide 2.5MHz data into 256 subbands using
2048-point FFT followed by 256 8-point inverse
transforms
– Subbands are 9,766 Hz wide
– 2²⁰ (1,048,576) samples, thus each work unit is
~10 kHz wide and 107 s long
– Work units overlap so that signals near a
boundary are not missed
• Work units are stored on separate server for
distribution
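The splitting step can be sketched as follows. With a 2048-point FFT and 256 subbands, each subband receives 8 adjacent bins, which an 8-point inverse FFT turns back into 8 time samples. The subband sample rate is therefore 2.5 MHz / 256 = 9,765.625 samples/s, and 2²⁰ samples span about 107 s. The function `split_into_subbands` is hypothetical: it omits windowing, the overlap between work units, and whatever packing the real splitters used.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def split_into_subbands(samples, n_fft=2048, n_sub=256):
    """Hypothetical splitter: FFT each block of n_fft complex samples, cut the
    spectrum into n_sub groups of adjacent bins, and inverse-FFT each group
    into a short time series for that subband. Subband index follows FFT bin
    order, so the upper half of the indices holds negative frequencies."""
    bins_per_sub = n_fft // n_sub                 # 8 bins for 2048 / 256
    subbands = [[] for _ in range(n_sub)]
    for start in range(0, len(samples) - n_fft + 1, n_fft):
        spectrum = fft(samples[start:start + n_fft])
        for s in range(n_sub):
            bins = spectrum[s * bins_per_sub:(s + 1) * bins_per_sub]
            # Normalize the short inverse transform by its length.
            subbands[s].extend(v / bins_per_sub for v in fft(bins, inverse=True))
    return subbands


# Subband sample rate and work-unit duration for the figures on the slide.
subband_rate = 2.5e6 / 256                  # 9,765.625 complex samples/s
work_unit_seconds = 2 ** 20 / subband_rate
print(f"{subband_rate:.3f} Hz, {work_unit_seconds:.1f} s")
# prints "9765.625 Hz, 107.4 s"
```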
Data Collection
• Main SETI@Home Server
– 3 Sun Enterprise 450 Series Computers
• User Database
– Contains account information for each of the 2.4 million
users
– Also aggregates statistics by platform
• Science Database
– Contains information about each work unit
» Time, sky coords, frequency range
» How many times each work unit has been downloaded
– Stores parameters of candidate signals
» Signal power, frequency, arrival time, sky coords
» 1.1 billion candidates (Oct. 2000)
• Work unit storage
Data Collection
• Work unit storage server
– Distribution of work units, storage of results
• Client communications via HTTP
– Important to get through firewalls
– Request to download new work unit
• Work units that have not been downloaded yet have
priority
• Then, work units for which no results have been
returned
– Request to post results
• Data contains signal characteristics
• Updates user statistics
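The download priority described above can be expressed as a small selection function. The `WorkUnit` record and its fields are hypothetical, and the tie-break among units still awaiting results (fewest downloads first) is an assumption not stated in the slides:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkUnit:
    # Hypothetical record; field names are illustrative only.
    unit_id: int
    times_downloaded: int = 0
    results_returned: int = 0


def next_work_unit(units: List[WorkUnit]) -> Optional[WorkUnit]:
    """Choose the unit to send for a download request."""
    # 1. Units never downloaded have priority.
    for u in units:
        if u.times_downloaded == 0:
            return u
    # 2. Then units sent out but with no results back yet
    #    (assumed tie-break: fewest downloads so far).
    pending = [u for u in units if u.results_returned == 0]
    if pending:
        return min(pending, key=lambda u: u.times_downloaded)
    # 3. Nothing left under these two rules.
    return None


units = [WorkUnit(1, times_downloaded=2, results_returned=1),
         WorkUnit(2, times_downloaded=1),
         WorkUnit(3)]
first = next_work_unit(units)       # unit 3: never downloaded
first.times_downloaded += 1
second = next_work_unit(units)      # unit 2: tied with unit 3, listed first
print(first.unit_id, second.unit_id)
```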
The SETI@Home Client
• Available for 47 different combinations
of CPU and OS
• Dominant platforms: Windows, Mac
– Feature graphical “screensaver” display
– UNIX version runs as a daemon
(display program available for X)
The SETI@Home Client
• Downloads work unit from server
• Performs “baseline smoothing” to
eliminate wideband features, help
reduce false signals
• Performs main data analysis loop
(shown on next page)
Main Data Analysis Loop
for each Doppler drift rate from -50 to +50 Hz/s {
    for each bandwidth from 0.075 to 1,221 Hz, in 2x steps {
        Generate time-ordered power spectra
        Search for short-duration signals
            above a constant threshold (spikes)
        for each frequency {
            Search for faint signals matching
                beam parameters (Gaussians)
            Search for groups of 3 evenly
                spaced signals (triplets)
            Search for faint repeating pulses (pulses)
        }
    }
}
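The triplet search in the loop above looks for three strong samples equally spaced in time. A brute-force sketch, assuming the input is one frequency bin's power over time; the threshold and the spacing tolerance (in samples) are hypothetical parameters:

```python
from itertools import combinations
from typing import List, Tuple


def find_triplets(powers: List[float], threshold: float,
                  tolerance: int = 0) -> List[Tuple[int, int, int]]:
    """Return index triples (a, b, c), a < b < c, of above-threshold samples
    whose spacings b - a and c - b differ by at most `tolerance` samples."""
    peaks = [i for i, p in enumerate(powers) if p > threshold]
    return [(a, b, c) for a, b, c in combinations(peaks, 3)
            if abs((b - a) - (c - b)) <= tolerance]


powers = [1.0] * 30
for i in (3, 10, 17, 25):
    powers[i] = 9.0                                   # four strong samples
print(find_triplets(powers, threshold=5.0))               # [(3, 10, 17)]
print(find_triplets(powers, threshold=5.0, tolerance=1))  # [(3, 10, 17), (10, 17, 25)]
```

Checking every combination is cubic in the number of peaks, which is tolerable only because few samples exceed the threshold.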
The SETI@Home Client
• Client examines signal at various drift
rates
– -10 to +10 Hz/s (fine-grained)
– -50 to +50 Hz/s (~twice as coarse)
• Although drift rates are most likely
negative, examine both sides
– For statistical comparison
– To detect deliberately chirped signals
The SETI@Home Client
• For each drift rate, examines the signal
at different bandwidths between 0.075
and 1,221 Hz
– Using a variety of FFT lengths
– Not all bandwidths are examined at every
drift rate (a bandwidth is re-examined only when the
change in drift rate becomes significant compared to
its frequency resolution)
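The bandwidth ladder is consistent with running FFTs of doubling length over a 9,765.625 samples/s subband, since the frequency resolution of an n-point FFT is the sample rate divided by n. Assuming FFT lengths from 8 to 131,072 points (an assumption chosen because it reproduces the 0.075 Hz and 1,221 Hz end points), the ladder is:

```python
# Assumption: FFT lengths are powers of two from 8 to 131,072 points.
SUBBAND_RATE = 2.5e6 / 256   # complex samples per second in one subband


def resolution_ladder(min_len: int = 8, max_len: int = 131072):
    """(FFT length, frequency resolution in Hz), finest resolution first."""
    ladder = []
    n = max_len
    while n >= min_len:
        ladder.append((n, SUBBAND_RATE / n))   # resolution = sample rate / length
        n //= 2
    return ladder


for length, res_hz in resolution_ladder():
    print(f"{length:>7}-point FFT -> {res_hz:9.4f} Hz")
```

This yields 15 resolutions, from about 0.0745 Hz to about 1,220.7 Hz, each a factor of 2 from its neighbor.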
The SETI@Home Client
• Transformed signals are examined for
spikes exceeding 22 times the mean
noise power
• Threshold: 7.2 × 10⁻²⁵ W/m² (at the
finest frequency resolution)
• “Detecting a cell phone on one of the
moons of Saturn”
• These spikes are what the client reports
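A spike test in its simplest form compares each bin of a power spectrum with 22 times the mean power. This sketch uses the spectrum's own mean as the noise estimate, which is an assumption; the function name is hypothetical:

```python
from typing import List, Tuple

SPIKE_FACTOR = 22.0   # from the slide: 22 x mean noise power


def find_spikes(power: List[float],
                factor: float = SPIKE_FACTOR) -> List[Tuple[int, float]]:
    """Return (bin, power / mean power) for bins above factor x mean power.
    Assumption: the spectrum's own mean stands in for the noise level."""
    mean = sum(power) / len(power)
    return [(k, p / mean) for k, p in enumerate(power) if p > factor * mean]


spectrum = [1.0] * 1000
spectrum[123] = 40.0    # strong enough to report
spectrum[5] = 20.0      # below 22 x mean, so ignored
print(find_spikes(spectrum))
```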
The SETI@Home Client
• Other transformations to detect
Gaussians and pulse patterns
• Specialized algorithms (fast-folding
algorithms) for detecting pulses
efficiently
• They work by “folding” (summing) portions of the
signal together at a trial pulse period, so that
faint repeating pulses add up to a detectable gain
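Folding can be illustrated with a brute-force version: sum every sample that falls at the same phase of a trial period, then see whether one phase stands out. This is plain folding at each trial period, not the fast folding algorithm itself, which reuses partial sums across periods to cut the cost; names are hypothetical.

```python
from typing import List, Optional, Tuple


def fold(series: List[float], period: int) -> List[float]:
    """Sum samples that share the same phase (index mod period)."""
    folded = [0.0] * period
    for i, value in enumerate(series):
        folded[i % period] += value
    return folded


def best_period(series: List[float],
                periods: List[int]) -> Tuple[Optional[int], Optional[int], float]:
    """Return (period, phase, score), where score is the strongest folded bin
    divided by the folded mean; a pure-noise fold scores close to 1."""
    best: Tuple[Optional[int], Optional[int], float] = (None, None, 0.0)
    for p in periods:
        folded = fold(series, p)
        mean = sum(folded) / p
        phase = max(range(p), key=folded.__getitem__)
        score = folded[phase] / mean
        if score > best[2]:
            best = (p, phase, score)
    return best


# Faint pulses: every 7th sample (starting at index 3) is 50% above baseline.
series = [1.0] * 700
for i in range(3, 700, 7):
    series[i] += 0.5
print(best_period(series, [5, 6, 7, 8, 9]))
```

Each pulse is only 50% above the baseline, but after folding 100 of them the matching phase sits 40% above the folded mean, while the non-matching periods stay near 1.0.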
The SETI@Home Client
• Typical workload:
– 2.4 to 3.8 trillion floating-point operations
per work unit
– Typical 500 MHz PC takes 10 to 12 hours to
complete a work unit
– Within the average work unit:
• 4 spikes, 1 Gaussian, 1 pulsed signal, 1 triplet
signal
Postprocessing
• Client uploads candidate signal data to
server
(exact data formats are not published)
• Server examines results for errors
• Keeps track of user statistics
Error detection
• SETI@Home uses thousands of CPU years
every day
• When CPUs overheat, floating-point units are
the first to give incorrect results
• High error rates are offset by easy error
detection
• Replication of work units is the primary error
detection mechanism
• 60% of work unit results must agree in order
to be considered for further analysis
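The 60% agreement rule amounts to a vote among the replicas of a work unit. A minimal sketch, assuming each replica's result can be reduced to a hashable summary and compared exactly (a real validator would more likely compare floating-point signal parameters within a tolerance); the function name is hypothetical:

```python
from collections import Counter
from typing import Hashable, List, Optional

AGREEMENT = 0.6   # from the slide: 60% of results must agree


def accepted_result(results: List[Hashable],
                    agreement: float = AGREEMENT) -> Optional[Hashable]:
    """Return the most common result if at least `agreement` of the replicas
    reported it; otherwise None (the work unit needs more replicas)."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count / len(results) >= agreement else None


print(accepted_result(["sig-A", "sig-A", "sig-B"]))   # sig-A (2 of 3)
print(accepted_result(["sig-A", "sig-B"]))            # None (1 of 2)
```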
Candidate Signals
• Vast majority of detected signals
correspond to terrestrial RFI
– Extra-terrestrial signals cannot last more
than 12 s
– Also, signals should repeat when viewing
the same portion of the sky at a later time
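The two RFI tests above (duration and repetition) can be combined into a single filter. The `Candidate` record, the sky-grid cells, and the 1 Hz frequency tolerance are hypothetical simplifications; in particular, matching frequencies this way ignores the different Doppler offsets of separate observations.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_DURATION_S = 12.0   # from the slide: longer signals are treated as terrestrial


@dataclass
class Candidate:
    # Hypothetical record; the sky position is reduced to a grid cell.
    sky_cell: Tuple[int, int]
    freq_hz: float
    duration_s: float
    observation: int   # which pass over this part of the sky


def plausible(cands: List[Candidate], freq_tol_hz: float = 1.0) -> List[Candidate]:
    """Keep short candidates that recur in the same cell, at about the same
    frequency, during a different observation."""
    short = [c for c in cands if c.duration_s <= MAX_DURATION_S]
    return [c for c in short
            if any(o.sky_cell == c.sky_cell and o.observation != c.observation
                   and abs(o.freq_hz - c.freq_hz) <= freq_tol_hz
                   for o in short)]


a = Candidate((10, 4), 1420.0e6, 8.0, observation=0)
b = Candidate((10, 4), 1420.0e6 + 0.5, 9.0, observation=1)   # repeat of a
c = Candidate((3, 7), 1420.2e6, 10.0, observation=0)         # never repeats
d = Candidate((10, 4), 1420.0e6, 300.0, observation=2)       # too long: RFI
print(plausible([a, b, c, d]))   # keeps a and b
```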
Project Status
• October 2000
– 2.4 million users
– 520,000 active clients donating 437,000
years of CPU time (4.3 × 10²⁰ flop)
– Average processing rate: 15.7 Tflops
• “Largest supercomputer in existence”
• “Largest computation ever performed”
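The October 2000 figures are roughly self-consistent, which a little arithmetic confirms. A sketch assuming a 365.25-day year and that every active client computes continuously at the long-run average rate:

```python
# Consistency check of the October 2000 figures.
# Assumptions: 365.25-day year; all active clients run at the average rate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
total_flop = 4.3e20
cpu_years = 437_000
active_clients = 520_000

per_cpu_flops = total_flop / (cpu_years * SECONDS_PER_YEAR)  # average per CPU
aggregate_flops = per_cpu_flops * active_clients             # whole project
print(f"{per_cpu_flops / 1e6:.1f} Mflop/s per CPU, "
      f"{aggregate_flops / 1e12:.1f} Tflop/s aggregate")
```

This gives roughly 31 Mflop/s per CPU and about 16 Tflop/s in aggregate, close to the quoted 15.7 Tflops.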
Project Status
• 1.1 Billion signals in SETI@Home
database
• Candidate signals being submitted
faster than the server can confirm them
• So far, no extra-terrestrial signals have been found
Future Work
• Expand coverage by adding a new
telescope in the southern hemisphere
• Expand frequency bandwidth
(up to double the data rate)
• Expand number of volunteers, increase
SETI education efforts
Summary
• Seemingly impossible problem
• Easily partitioned
• Good publicity, marketing
• Achieves incredible performance
– But, high latency
– High redundancy/replication of
computation
Related Work
• Distributed.net
– Cracking of encryption keys (DES, RC5)
– Search for optimal Golomb rulers
• Folding@Home
– Stanford project - distributed protein folding
• PiHex
– Distributed effort to calculate individual
binary digits of Pi far into its expansion
• GIMPS
– Great Internet Mersenne Prime Search
Discussion
• Potential comments on:
– System architecture
– Fault-tolerance
– Security
References
• SETI@Home Web Site
– http://setiathome.ssl.berkeley.edu/
• NASA Science Newsletter
– http://science.nasa.gov/newhome/headlines/ast23may99_1.htm
• Papers
– Korpela, et al. “SETI@Home: Massively Distributed
Computing for SETI.”
– Sullivan, et al. “A new major SETI project based
on Project Serendip data and 100,000 personal
computers.”
– “The SETI@Home Sky Survey.” Available from the
SETI@Home web site.