Foundations of Database Systems
Class Introduction
G. Green

Agenda
• Introductions
• Course Overview
• Syllabus
• Case
• Database Development Overview

Objectives
Foundations of Database Systems
• Understand data-related activities of SDLC
• Implement data modeling, database design, and database implementation techniques
  › CASE (Visio)
  › Database (SQL Server)

Course Contents
• Lectures, Examples, In-Class Exercises
• Individual Assignments (3)
• Team Project* (3 parts)
• Quizzes (3)
• Exams (2)
*Can request teammates; see syllabus for Team Preferences deadline

Research
• Service Learning & Kolb's Learning Cycle
• Motivators for Choosing MIS Major
  › International and US
• Periodic Assessments
  › Some NOT graded; others are

Learning
Prepare:
  › Prepare (read & reread book, notes) for each class
  › Do book, in-class, and extra problems on your own
  › Come see me during office hours for help
Participate:
  › Attend, listen, be attentive (no internet distractions), be engaged
  › Ask and answer questions, & add to discussion
  › Do each assignment completely & in a timely and professional manner
Take PLENTY of notes in class:
  › Do NOT just rely on PowerPoint
Explore:
  › Go beyond classroom material to research topics

Class Resources
Syllabus/Schedule, Grades, Attendance: http://canvas.baylor.edu
  › Schedule contains links to all lecture slides, study guides, assignments, and project write-ups
Other Resources: http://blogs.baylor.edu/gina_green/mis-4340-resources/
  › Copies of in-class handouts, team resources, database tables, …

Syllabus…

Introduction to Databases
Chapter 1

Topics
• Chapter 1
  › The Database Environment
  › Database Development Process
• Chapter 9 (pages 409-410)
  › Big Data
• Chapter 10 (pages 444-445, 446-447)
  › Master Data Management
  › Data Federation
• Chapter 11 (pages 464-472, 486, 499-506)
  › Database Personnel
  › Metadata Management (e.g., Data Dictionaries)
  › Backup Facilities
  › Overview of Tuning the Database for Performance

Evolution of Database Technologies
[Timeline figure, 1960's through 2000+. Technologies shown: Traditional Files, Hierarchical, Network, Relational, Object, Object-Relational, MDDB, XML, Federated, NoSQL, …]

Figure 1-3 Old file processing systems: Example
[Figure; callout: Duplicate Data]

Traditional File Processing Environment
Disadvantages:
  › Program-data dependence = "structural" & "data"
  › Limited data sharing = "islands of automation"
  › Duplication of data = "redundancy"
  › Lengthy development times
  › Excessive program maintenance

The Database Environment
[Figure]

Advantages of Databases
• Program-data independence
• Improved data sharing
• Minimal data redundancy
• Improved data accessibility/responsiveness
• Improved data consistency
• Faster application development
• Enforcement of standards
• Improved data quality
• Reduced program maintenance

Database Development Process
Chapter 1

SDLC for this class
Systems Development Life Cycle    DB Activities in SDLC
Planning                          Enterprise Modeling*
Analysis                          DB Scope, Requirements (Conceptual Data Model)
Design                            DB Design (Logical DB Design)
                                  DB Design (Physical DB Design)
Implementation                    DB Implementation (Load, Test, Eval, Op)
                                  DB Maintenance*

Enterprise Data Modeling
• Determine organizational data requirements
• Build enterprise data model
  › outcome is a very high-level Entity-Relationship Diagram
  › see:
    http://da.ks.gov/kito/ITPlans/data_maps06.ppt
    http://www.tdan.com/view-articles/5205

[Figure] Source: http://www.tdan.com/view-articles/5205

Conceptual Data Modeling
• Determine user data requirements
• Determine business rules
• Build conceptual data model
  › outcome is an Entity-Relationship Diagram (conceptual schema)

Logical Database Design
• Select database model
  › e.g., the Relational Model
• Transform conceptual (ERD) into logical (relational) data model
• Normalize and link data structures
  › outcome is normalized, linked relational tables

Physical Database Design
• Select database product (e.g., SQL Server)
• Select storage device(s)
• Design fields, records, files (physical schema)
  › outcomes are detailed, physical definitions for:
    fields (data dictionary)
    records (space requirements for physical structures)*
    files (access methods)
*Will not do in this class

Database Implementation
• Create database file/table structures
  › Create views (external schema)
  › Establish access rights
• Load test data
• Write/test programs that process data
• Install database (with production data) into production operations
  › outcomes are secured database tables loaded with data

Database Maintenance
• Maintain database structures
• Performance, tuning
  › Storage/space management
  › I/O Contention
  › CPU Usage
  › Application Tuning
• Data availability
  › DBMS upgrades, "fixes"
  › Backup, recovery
  …

Database Maintenance, cont…
• Backup
  › Full
  › Incremental
  › Differential
• Business Continuity
• Data Replication ("fallback")

Data and Database Administration
Chapter 11

Traditional Administration Definitions
Data Administration: A high-level function that is responsible for the overall management of data resources in an organization, including maintaining corporate-wide definitions and standards.
Database Administration: A technical function that is responsible for physical database design and for dealing with technical issues such as security enforcement, database performance, and backup and recovery.

Data People Involved in SDLC
Data Administrators
  Data(base) Analysts/Designers                                requirements elicitation, design
  Business (Intelligence) Analyst                              BI requirements, design
  Data Architects                                              strategy, governance
  Data Stewards                                                quality, metadata, MDM
  Business Analytics Engineer                                  data analytics, statistics, mining
  Data Mining Engineer; Big Data Engineer; Data Scientist …    "big data" specialists
Database Administrators
  (System) DBAs                                                implementation/maintenance
  Application DBAs
  Procedural DBAs                                              stored code
  e-DBAs                                                       web-enabled DBMSs
  Data Warehouse Administrators                                ETL, DW implementation

Growing Skillset
• Relational database design, implementation
• Database programming
• ETL (extract, transform, load)
• Data warehousing design (star schema) and implementation (MDDB)
• Data analysis, reporting, and mining techniques
• Statistical modeling with tools such as R, SAS, or SPSS
• Data visualization tools
• Cloud database implementations
• Technologies for structured and unstructured data
  › Hadoop (an Apache project that provides an open-source implementation of frameworks for reliable, scalable, distributed computing and data storage)
  › NoSQL
  › "NewSQL"
***See Big Data University for (mostly) free self-study training

Data Quality and Integration
Chapter 10

Metadata Management
• System Catalog
  › Part of DBMS
  › "Active" dictionary
• Data Dictionary
  › Typically "passive"
  › Extension of catalog metadata
• Information Repository (e.g., IRDS)
  › Standards for data dictionaries
  › Integrates dictionaries

Master Data Management
• "Ensuring the currency, meaning, and quality of reference data within and across various subject areas" (pg 444)
• Identify
  › Common Data Subjects
  › Common Data Elements
  › Sources of "the truth"
• Cleanse
• Update applications to reference Master Data repository
• Ensures consistency of key data (not ALL data) throughout organization

Data and Database Administration
Chapter 11

Cloud Computing
• Business Model
  › Computing resources on demand
  › Need-based architectures
  › Internet-based delivery
  › Pay as you go
• History (VERY high-level and approximate)
  [Timeline figure, 50's through 2000's. Items shown: Time-sharing, Utility Computing, Virtual Machines, Personal Computers, WWW, Grid Computing, Cloud Computing]

Cloud Computing Services
• Impacts to Data(base) Administration
  › See textbook page 469

Summary
• Evolution of Data Management
• Database Concepts
  › Components of a DBMS Environment
  › Database Advantages
  › Disadvantages of file processing
• People Involved in Data Management
  › Traditional job divisions and responsibilities
  › Newer job titles
• Database Development:
  › Overall SDLC
  › Database Activities in the SDLC
• Special Topics
  › Metadata Management
  › MDM
  › Cloud Computing Impacts
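
Worked example (illustrative only)
The database activities in the SDLC listed in this deck (normalized, linked tables from logical design; table structures, views, and test data from implementation) can be sketched in a few lines. This is a minimal sketch and not part of the course material. It assumes Python's built-in sqlite3 module in place of the SQL Server product named in the objectives, and the customer/cust_order tables, their columns, and the order_summary view are hypothetical names chosen for illustration. Access rights are left out because SQLite has no user accounts to grant them to.

```python
import sqlite3


def build_demo_db():
    """Create a small normalized schema, a view, and some test data."""
    conn = sqlite3.connect(":memory:")
    # SQLite does not enforce foreign keys unless asked to.
    conn.execute("PRAGMA foreign_keys = ON")

    # Implementation: create table structures from the logical design.
    # Customer facts live in one table; orders link to them by key rather
    # than repeating the customer name (minimal redundancy).
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE cust_order (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
            order_date  TEXT NOT NULL,
            total       REAL NOT NULL
        );
        -- A view plays the role of an external schema for one user group.
        CREATE VIEW order_summary AS
            SELECT c.name,
                   COUNT(o.order_id) AS order_count,
                   SUM(o.total)      AS total_spent
            FROM customer c
            JOIN cust_order o ON o.customer_id = c.customer_id
            GROUP BY c.customer_id, c.name;
    """)

    # Implementation: load (hypothetical) test data.
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO cust_order VALUES (?, ?, ?, ?)",
        [
            (10, 1, "2024-01-05", 100.0),
            (11, 1, "2024-02-01", 50.0),
            (12, 2, "2024-02-03", 75.0),
        ],
    )
    conn.commit()
    return conn


conn = build_demo_db()
rows = conn.execute(
    "SELECT name, order_count, total_spent FROM order_summary ORDER BY name"
).fetchall()
print(rows)  # [('Acme', 2, 150.0), ('Globex', 1, 75.0)]
```

Querying the view instead of the base tables mirrors the external-schema idea: a reporting user sees one summary row per customer without needing to know how the underlying tables are linked.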