SETI@Home
Sunny Gleason
COM S 717
November 29, 2001
(Based on the article, “SETI@Home: Massively Distributed Computing for SETI.”)

In This Presentation
• What is SETI?
• Partitioning the Job
• The SETI@Home Client
• Server Post-processing
• Project Status

SETI@Home
• SETI: Search for Extra-Terrestrial Intelligence
  – Private / Academic efforts
  – NASA
  – SETI Institute
  – SETI@Home
• SETI@Home: Project led by researchers at University of California - Berkeley (1997)
• “Piggyback SETI” receiver at Arecibo radio telescope

SETI: The Task
• What is the complexity of detecting signals sent by an extra-terrestrial civilization?
• Category: massively difficult
  – Signal parameters unknown
  – Sensitivity of analysis depends on available computing power

SETI: Task Assumptions
• Aliens would broadcast a signal that is easily detectable and distinguishable from natural radio emission
• Narrowband signals stand out from natural broadband sources of noise
• Thus, SETI efforts concentrate on narrowband signals
• The hydrogen line: 1420 MHz

Narrowband Signals
• Use a narrow search window (channel) around a given frequency
• Earlier systems:
  – Analog narrow bandpass filters
• Newer systems:
  – Dedicated banks of Fast Fourier Transform (FFT) processors
  – Separate signal into up to 1 billion 1-Hz channels

Signal Problems
• Signals are unlikely to be stable in frequency
  – Example:
    • A listener on Earth’s surface for 1.4 GHz signals undergoes acceleration of up to 3.4 cm/s² due to Earth’s rotation
    • Corresponding Doppler drift rate: 0.16 Hz/s
    • Alien transmission would drift out of a 1-Hz channel in about 6 seconds

Signal Problems
• We can compensate for Earth’s rotation, but what about the remote planet?
• Solution:
  – Correct for Doppler drift at the receiving end
  – Search for signals at multiple Doppler drift rates
• Computation-intensive!
• Allowed remote drift rates are between -10 Hz/s and +10 Hz/s (extended to ±50 Hz/s at coarser resolution)

Other Parameters
• Signal frequency / bandwidth?
• Is it pulsed? If so, what period?
• Solving over the full range of parameters is beyond even the world’s most powerful supercomputers
• Fortunately, the task is easily partitioned

Distributing the Load
• Break the data up into separate frequency bands
• Observations of different portions of the sky are essentially independent
• Partition the huge dataset into smaller chunks that ordinary PCs can handle

Data Collection
• Observations come from the 305-meter radio telescope in Arecibo, Puerto Rico
• Dedicated instrumentation within telescope
• Passively monitors the telescope’s field of view (0.1 degrees)
• Stationary telescope: objects pass through in 24 seconds
• When telescope is tracking: 12 s

Data Collection
• Over the course of the project, SETI@Home will see visible portions of the sky 3 or more times
• Covers stars with declinations from -2 to 38 degrees
• Approximately 25% of the sky

Data Collection
• System records a 2.5 MHz band, centered at the 1,420 MHz hydrogen line
• Records 2-bit samples onto 35 GB DLT tapes (Recall: Nyquist Rate)
• Each tape: 15.5 h of data
• 39 TB of data total

Data Collection
• Data tapes shipped to Berkeley
• Split into work units using 4 splitter workstations
  – Divide 2.5 MHz data into 256 subbands using 2048-point FFT followed by 256 8-point inverse transforms
  – Subbands are 9,766 Hz wide
  – 2²⁰ samples, thus each work unit is ~10 kHz wide and 107 s long
  – Work units overlap in time so that signals spanning a unit boundary are still detected
• Work units are stored on separate server for distribution

Data Collection
• Main SETI@Home Server
  – 3 Sun Enterprise 450 Series Computers
• User Database
  – Contains account information for each of the 2.4 million users
  – Also aggregates statistics by platform
• Science Database
  – Contains information about each work unit
    » Time, sky coords, frequency range
    » How many times each work unit has been downloaded
  – Stores parameters of candidate signals
    » Signal power, frequency, arrival time, sky coords
    » 1.1 billion candidates (Oct.
2000)
• Work unit storage

Data Collection
• Work unit storage server
  – Distribution of work units, storage of results
• Client communications via HTTP
  – Important to get through firewalls
  – Request to download new work unit
    • Work units that have not been downloaded yet have priority
    • Then, work units for which no results have been returned
  – Request to post results
    • Data contains signal characteristics
    • Updates user statistics

The SETI@Home Client
• Available for 47 different combinations of CPU and OS
• Dominant platforms: Windows, Mac
  – Feature graphical “screensaver” display
  – UNIX works as daemon (display program available for X)

The SETI@Home Client
• Downloads work unit from server
• Performs “baseline smoothing” to eliminate wideband features and help reduce false signals
• Performs main data analysis loop (shown on next page)

Main Data Analysis Loop
for Doppler drift rates from -50 to +50 Hz/s {
    for bandwidths from 0.075 to 1,221 Hz in 2x steps {
        Generate time-ordered power spectra
        Search for short-duration signals above a constant threshold (spikes)
        for each frequency {
            Search for faint signals matching beam parameters (Gaussians)
            Search for groups of 3 evenly spaced signals (triplets)
            Search for faint repeating pulses (pulses)
        }
    }
}

The SETI@Home Client
• Client examines signal at various drift rates
  – -10 to +10 Hz/s (fine-grained)
  – -50 to +50 Hz/s (~twice as coarse)
• Although drift rates are most likely negative, examine both sides
  – For statistical comparison
  – To detect deliberately chirped signals

The SETI@Home Client
• For each drift rate, examines the signal at different bandwidths between 0.075 and 1,221 Hz
  – Using a variety of FFT lengths
  – Not all bandwidths are examined at every drift rate (only when drift rate becomes significant compared to the frequency resolution)

The SETI@Home Client
• Transformed signals are examined for spikes exceeding 22 times the mean noise power
• Threshold: 7.2 × 10⁻²⁵ W/m² (at the finest frequency resolutions)
• “Detecting a cell phone on one of the moons of
Saturn”
• These spikes are what the client reports

The SETI@Home Client
• Other transformations to detect Gaussians and pulse patterns
• Specialized algorithms (fast-folding algorithms) for detecting pulses efficiently
• Work by “folding” portions of the signal together in time, to detect gain over the pulse period

The SETI@Home Client
• Typical workload:
  – 2.4 to 3.8 trillion floating-point operations (Tflop)
  – Typical 500 MHz PC takes 10 to 12 hours to complete a work unit
  – Within the average work unit:
    • 4 spikes, 1 Gaussian, 1 pulsed signal, 1 triplet signal
• <Insert Demonstration Here>

Postprocessing
• Client uploads candidate signal data to server (exact data formats are kept quiet)
• Server examines results for errors
• Keeps track of user statistics

Error detection
• SETI@Home uses thousands of CPU years every day
• With heat, floating-point units are the first to give incorrect results
• High error rates are offset by easy error detection
• Replication of work units is the primary error detection mechanism
• 60% of work unit results must agree in order to be considered for further analysis

Candidate Signals
• Vast majority of detected signals correspond to terrestrial RFI
  – Extra-terrestrial signals cannot last more than 12 s
  – Also, signals should repeat when viewing the same portion of the sky at a later time

Project Status
• October 2000
  – 2.4 million users
  – 520,000 active clients donating 437,000 years of CPU time (4.3 × 10²⁰ flop)
  – Average processing rate: 15.7 Tflops
• “Largest supercomputer in existence”
• “Largest computation ever performed”

Project Status
• 1.1 billion signals in SETI@Home database
• Candidate signals being submitted faster than the server can confirm them
• So far, no extra-terrestrial signals

Future Work
• Expand coverage by adding new telescope in southern hemisphere
• Expand frequency bandwidth (up to double the data rate)
• Expand number of volunteers, increase SETI education efforts

Summary
• Seemingly
impossible problem
• Easily partitioned
• Good publicity, marketing
• Achieves incredible performance
  – But, high latency
  – High redundancy/replication of computation

Related Work
• Distributed.net
  – Cracking of encryption keys (DES, RC5)
  – Search for optimal Golomb rulers
• Folding@Home
  – Stanford project - distributed protein folding
• PiHex
  – Distributed effort to calculate Pi
• GIMPS
  – Great Internet Mersenne Prime Search

Discussion
• Potential comments on:
  – System architecture
  – Fault-tolerance
  – Security

References
• SETI@Home Web Site
  – http://setiathome.ssl.berkeley.edu/
• NASA Science Newsletter
  – http://science.nasa.gov/newhome/headlines/ast23may99_1.htm
• Papers
  – Korpela, et al. “SETI@Home: Massively Distributed Computing for SETI.”
  – Sullivan, et al. “A new major SETI project based on Project Serendip data and 100,000 personal computers.”
  – “The SETI@Home Sky Survey.” Available from the SETI@Home web site.
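
Appendix: Checking the Arithmetic (sketch)
Several figures quoted in the Signal Problems and Data Collection slides follow from simple arithmetic. The Python sketch below reproduces them under assumptions that are not stated in the slides: the non-relativistic Doppler relation drift = a·f/c, a 1 Hz channel, and complex sampling, so that a subband's sample rate equals its bandwidth. All function and variable names are illustrative.

```python
# Back-of-the-envelope check of figures quoted in the slides.
# Assumptions (not from the slides): drift = a * f / c, a 1 Hz channel,
# and complex sampling (sample rate == subband bandwidth).
# All names here are illustrative.

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_drift_rate(accel_m_s2: float, freq_hz: float) -> float:
    """Frequency drift (Hz/s) for a receiver accelerating along the line of sight."""
    return accel_m_s2 * freq_hz / SPEED_OF_LIGHT


def time_in_channel(channel_width_hz: float, drift_hz_s: float) -> float:
    """Seconds until a drifting tone has moved by one channel width."""
    return channel_width_hz / drift_hz_s


def subband_width(total_bandwidth_hz: float, n_subbands: int) -> float:
    """Width (Hz) of each subband when the recorded band is split evenly."""
    return total_bandwidth_hz / n_subbands


def work_unit_duration(n_samples: int, sample_rate_hz: float) -> float:
    """Seconds of data covered by n_samples at the given sample rate."""
    return n_samples / sample_rate_hz


if __name__ == "__main__":
    # 3.4 cm/s^2 at the 1,420 MHz hydrogen line
    drift = doppler_drift_rate(0.034, 1.42e9)
    print(f"Doppler drift rate:   {drift:.3f} Hz/s")
    print(f"Time in 1 Hz channel: {time_in_channel(1.0, drift):.1f} s")

    # 2.5 MHz band split into 256 subbands, 2**20 samples per work unit
    width = subband_width(2.5e6, 256)
    print(f"Subband width:        {width:.0f} Hz")
    print(f"Work unit duration:   {work_unit_duration(2**20, width):.0f} s")
```

Under these assumptions the results line up with the slide figures of 0.16 Hz/s, about 6 seconds, 9,766 Hz and 107 s.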