Conceptual Architecture of PostgreSQL Overview • • • • • • • • • • • • • • • What is Postgres? Research Methods Considered Alternatives Reference Architecture Conceptual Architecture Inside Subsystems – Query Processor Inside Subsystems – Storage Manager Inside Subsystems – Utilities Use Case Concurrency Control Design Trade-offs Limitations of Research Lessons Learned Summary Q&A What is PostgreSQL? • • • • • • Open-Source database management system ‘Ingres Project’ at UC Berkeley First Postgres version released in 1997 Cross-Platform Written in C Used by organisations such as:– Yahoo – MySpace – Skype Research Methods General understanding of PostgreSQL – – – – Developers guide PostgreSQL wiki page PostgreSQL manual Wikipedia Reference architecture for Database Management System - Backbone of conceptual architecture Conceptual architecture for PostgreSQL - Various available online documentation of Conceptual Architectures of PostgreSQL Considered Alternatives 1. Client – Server 2. Client – Server w/ Pipe & Filter 3. Client – Server w/ Pipeline & Repository Reference Architecture Figure. 1 Conceptual Architecture Client Communications Manager Utilities & Shared Components Legend Server (Query Processor) Dependencies Storage Manager Figure 2. Query Processor Figure 3. Inside Subsystems Query Processor • Consists of :– – – – – – Parser: Traffic Cop : Utility Command: Rewriter: Planner/Optimizer: Executor: syntax simple/complex simple queries rule augmentation optimal plan execute optimal plan • Models a Pipe & Filter style Architecture • Uses storage management & shared utilities Inside Subsystems Storage Manager Provides Shared memory for buffers & access to database. Suggests repository style Legend Figure 4. Inside Subsystems Utilities Legend Consists of : – – – – Utilities Catalog Access Methods Nodes/Lists Utilities are used by all sub-components of the query processor Figure 5. Use Case – Select Query Figure 6. Concurrency Control Postmaster spawns multiple server threads (process per request) Problem - overwriting or modifying data Solution… - MVCC – Multi-version concurrency control - Point in time DB snapshot - Locks – locks entire table from being altered/deleted Design Trade-offs Reliability vs Performance Scalability vs Maintainability Security vs Performance Limitations of Research Personal Knowledge as well as experience with architectures & databases Determining depth of research Sources are incomplete Lessons Learned Cannot rely on one source for information, will have to go through several sources to build a complete picture Hard to decide on an architecture style The value of the reference architecture Summary Hybrid Conceptual Architecture Client Server – front/back connection Pipe & Filter – back end processes Repository – storage management/access Design Attributes Reliable & Secure - data integrity, strict SQL compliance, user authentication Performance - slower and more complicated Thank You! Questions?