Petabyte Scale Data Mining: Dream or Reality?
August 1, 2002

Authors: Alexander S. Szalay, Jim Gray, Jan Vandenberg
Published in: SPIE Astronomy Telescopes and Instruments
Publication type: Inproceedings
Pages: 7
Number: MSR-TR-2002-84

Abstract

Science is becoming very data intensive [1]. Today’s astronomy datasets with tens of millions of galaxies already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s, and database tools were immature. Such a task today is trivial, almost manageable with a laptop. We think that the issue of a PB database will be very similar in six years. In this paper we scale our current experiments in data archiving and analysis on the Sloan Digital Sky Survey [2,3] data six years into the future.
We analyze these projections and look at the requirements of performing data mining on such data sets. We conclude that the task scales rather well: we could do the job today, although it would be expensive. There do not seem to be any show-stoppers that would prevent us from storing and using a Petabyte dataset six years from today.

Related files: tr-2002-84.pdf