Store Devices Microsoft Surface PCs & tablets Xbox Virtual reality Accessories Windows phone Microsoft Band Software Office Windows Additional software Apps All apps Windows apps Windows phone apps Games Xbox One games Xbox 360 games PC games Windows games Windows phone games Entertainment All Entertainment Movies & TV Music Business & Education Business Store Education Store Developer Sale Back-to-school essentials Sale Products Software & services Windows Office Free downloads & security Internet Explorer Microsoft Edge Skype OneNote OneDrive Microsoft Health MSN Bing Microsoft Groove Microsoft Movies & TV Devices & Xbox All Microsoft devices Microsoft Surface All Windows PCs & tablets PC accessories Xbox & games Microsoft Band Microsoft Lumia All Windows phones Microsoft HoloLens For business Cloud Platform Microsoft Azure Microsoft Dynamics Windows for business Office for business Skype for business Surface for business Enterprise solutions Small business solutions Find a solutions provider Volume Licensing For developers & IT pros Develop Windows apps Microsoft Azure MSDN TechNet Visual Studio For students & educators Office for students OneNote in classroom Shop PCs & tablets perfect for students Microsoft in Education Support Sign in Research Research o Research Home o Research areas Algorithms Artificial intelligence and machine learning Computer systems and networking Computer vision Data visualization, analytics, and platform Ecology and environment Economics Graphics and multimedia Hardware, devices, and quantum computing Human-centered computing Mathematics o o o o o Medical, health, and genomics Natural language processing and speech Programming languages and software engineering Search and information retrieval Security, privacy, and cryptography Social Sciences Technology for emerging markets Products & Downloads Programs & Events Academic Programs Events & Conferences People Careers About About Microsoft Research blog Asia Lab Cambridge Lab India Lab New England Lab New York City Lab Redmond Lab Applied Sciences Lab Research areas o Algorithms o Artificial intelligence and machine learning o Computer systems and networking o Computer vision o Data visualization, analytics, and platform o Ecology and environment o Economics o Graphics and multimedia o Hardware, devices, and quantum computing o Human-centered computing o Mathematics o Medical, health, and genomics o Natural language processing and speech o Programming languages and software engineering o Search and information retrieval o Security, privacy, and cryptography o Social Sciences o Technology for emerging markets Products & Downloads Programs & Events o Academic Programs o Events & Conferences People Careers About o About o Microsoft Research blog o Asia Lab o Cambridge Lab o India Lab o New England Lab o New York City Lab o Redmond Lab o Applied Sciences Lab Quickly Generating Billion-Record Synthetic Databases January 1, 1994 Download PDF View Link BibTex Authors Jim Gray Ken Baclawski Prakash Sundaresan Susanne Englert Publication Type Inproceedings Publisher Association for Computing Machinery, Inc. Copyright © 1994 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept, ACM Inc., fax +1 (212) 869-0481, or permissions@acm.org. The definitive version of this paper can be found at ACM’s Digital Library –http://www.acm.org/dl/. Abstract Related Info Abstract Evaluating database system performance often requires generating synthetic databases – ones having certain statistical properties but filled with dummy information. When evaluating different database designs, it is often necessary to generate several databases and evaluate each design. As database sizes grow to terabytes, generation often takes longer than evaluation. This paper presents several database generation techniques. In particular it discusses: (1) Parallelism to get generation speedup and scaleup. (2) Congruential generators to get dense unique uniform distributions. (3) Special-case discrete logarithms to generate indices concurrent to the base table generation. (4) Modification of (2) to get exponential, normal, and self-similar distributions. The discussion is in terms of generating billion-record SQL databases using C programs running on a shared-nothing computer system consisting of a hundred processors, with a thousand discs. The ideas apply to smaller databases, but large databases present the more difficult problems. Related Info Related Files syntheticdatagen.doc Follow Microsoft Research Follow @MSFTResearch Share this page Tweet Learn Windows Office Skype Outlook OneDrive MSN Devices Microsoft Surface Xbox PC and laptops Microsoft Lumia Microsoft Band Microsoft HoloLens Microsoft Store View account Order tracking Retail store locations Returns Sales & support Downloads Download Center Windows downloads Windows 10 Apps Office Apps Microsoft Lumia Apps Internet Explorer Values Diversity and inclusion Accessibility Environment Microsoft Philanthropies Corporate Social Responsibility Privacy at Microsoft Company Careers About Microsoft Company news Investors Research Site map English (United States) Contact us Privacy & cookies Terms of use Trademarks About our ads © 2016 Microsoft ​