Creating the Physical Model Designing the Physical Model Phase IV: Defining the physical model Database Object Naming Conventions • Keep the logical and physical names similar and descriptive. • Capitalize table and attribute names. • Use underscores instead of spaces to delineate separate words in an object’s name. • Use a suffix of _PK to indicate primary keys. • Use a suffix of _ID to indicate production keys. • Find a good balance between using very specific and very vague names. Database Object Naming Conventions • Develop a reasonable list of abbreviations. • List all the objects’ names, and work with the user community to define them. • Resolve name disputes. • Document your naming standards in the metadata document. • Plan for the naming standards to be a living document. Translating the Dimensional Model into a Physical Model • Apply the naming standards to the tables and attributes of the dimensional model. • List table columns with primary keys listed first. • Label primary keys consistently. • Identify the format and length of columns. • Label unique keys with a (#). • Label column optionality with NULL (o) or NOT NULL (*) constraints. • Label foreign keys with _FK. • Use synonyms for user tables. Physical Model Product # *Product _PK # *Channel_PK # *Promotion_PK n n n * * * * * * * * * * * * * * * v(11) v(125) v(35) v(20) v(25) v(20) v(10) n v(20) v(20) v(10) v(10) v(10) d d PRODUCT_ID PRODUCT_DESC PRODUCT_NAME CATEGORY_ID CATEGORY_DESC SUPPLIER_ID PRODUCT_STATUS LIST_PRICE CATALOG_ID PRODCUT_TYPE PRODUCT_CODE PROMOTION_CODE WHSE_LOCATION VALID_FROM_DATE VALID_TO_DATE Defining the Hardware Transforming the base dimensional data model into the physical model includes some of the following: • Defining naming and database standards • Performing an initial sizing • Designing tablespaces • Defining an initial indexing strategy • Using partitioning to split table and index data into smaller, more manageable chunks • Determining where to place database objects on disk (RAID, striping, disk mapping) • Using parallel processing Architectural Requirements Scalability Manageability Availability Extensibility Flexibility Integrated Accessibility Reliability User Business Budget Technology Architecture Characteristics • Robust • Available • VLM (very large memory) • Reliable • 64-bit • Extensible • Connective • Scalable • Open • Supportable • Recoverable • Parallel Hardware Requirements • SMP (Symmetric multiprocessing) • Cluster and MPP (massively parallel processing) • Hybrids using SMP and MPP Evaluation Criteria Determine the platform for your needs: SMP Clusters MPP High Low Scalability High Maturity Low Parallel Processing • Parallel daily operations • Shared resources – Memory – Disk – Nothing • Loosely or tightly coupled Operating system Hardware Application Database Making the Right Choice • Requirements differ from operational systems • Benchmark – Available from vendors – Develop your own – Use realistic queries • Scalability important Symmetric Multiprocessing (SMP) • Communication by shared memory • Disk controllers accessible to all CPUs • Proven technology CPU CPU CPU CPU Common bus Shared memory Shared disks SMP CPU CPU CPU CPU Benefits: • High concurrency • Workload balancing Shared memory • Moderate scalability • Easy administration Limitations: • Memory (cluster for improvements) • Bandwidth Clusters Node 1 Node 2 Node 3 CPU CPU CPU CPU CPU CPU CPU CPU CPU Shared memory Shared memory Common high-speed bus Shared memory Common high-speed bus Shared disks Clusters • • • • • Shared disk, loosely coupled Dedicated memory High-speed bus Shared resources SMP node Massively Parallel Processing (MPP) CPU CPU CPU CPU Memory Memory Memory Memory Disk Disk Disk Disk MPP nCube Arrangements • • • • • • • A shared nothing architecture Many nodes Fast access Exclusive memory on a node Low cost per node Scalable nCUBE configuration MPP Benefits • • • • • Unlimited incremental growth Very scalable Fast access Low cost per node Good for DSS MPP Limitations • • • • • • Rigid partitioning Cache consistency Restricted disk access High memory cost per node High management burden Careful data placement Architectural Tiers Tiered structures: • Modular • Logical separation DB server Distributed structures: • Two-tier • Three-tier • Four-tier (and more) Apps server Workstations Web server Internet Sample System Architecture Middleware Technologies for integration Gateway Database Server Requirements • • • • • • • • Robust Available Reliable Extensible Scalable Supportable Recoverable Parallel Parallelism • • • • • • • Database Query Load Index Sort Backup Recovery Further Considerations • • • • • • • Optimization strategy Partitioning strategy Summarization strategy Indexing techniques Hardware and software scalability Availability Administration Parallel Processing A large task broken into smaller tasks: • Concurrent execution • One or more processors Elapsed time Not parallel Processor 1 Parallel Processor 1 Processor 2 Processor 3 Processor 4 Parallel Database • Increased speed • Improved scalability • Performance gains – Availability – Flexibility – More users Parallel Processor 1 Processor 2 Processor 3 Processor 4 Parallel Query SQL code split among server processes Subquery Subquery Query Subquery Parallel Load Bypass SQL processing to speed throughput Jan 98 Feb 98 Order table Mar 98 Parallel Processing • Index: reduces the time to create • Sort: allocates memory in cache efficiently Parallel Processing • Backup: runs simultaneously from any node (online and offline) • Recovery: runs simultaneously from redo logs • Summaries: uses the CREATE TABLE AS SELECT statement