Datahåndtering og analyse (Data Management and Analysis)
INF230

Srivatsav Saravanan
I am currently pursuing a Master’s degree in Data Science. My goal is to transition into a data-driven role, where I can leverage my technical expertise to extract valuable insights and support data-driven decision-making. I hold a Bachelor’s degree in Computer Science and Engineering, specializing in Database Management Systems and Python Programming.

Sheikh Hasan Elahi
I am currently doing my Master’s in Data Science at NMBU. I completed my Bachelor’s in Computer Science with a specialization in data science, where I developed a strong foundation in programming, data analysis, and machine learning. My main interests are in predictive modeling, applied statistics, and exploring how AI can be used in practical settings. As your teaching assistant, I look forward to working with you, answering your questions, and supporting you throughout the course.

Habib Ullah
• Associate Professor, from 2020, NMBU, Norway
• PostDoc Researcher, 2020 - 2020, The Arctic University of Norway
• Assistant Professor, 2016 - 2020, University of Ha’il, Saudi Arabia
• PhD in ICT, 2011 - 2015, University of Trento, Italy
• MSc, 2007 - 2009, Hanyang University, South Korea

INF230 - Course
• Class Lectures
  – Main dependency is on slides
  – Material for individual classes will be provided.

INF230 - Course
• Class (Theory)
  https://www.w3schools.com/mysql/default.asp
• Lab Exercises (Practical)
  – MySQL Workbench
  – Assignment submission is compulsory

Final Exam Grading Policy
• 3 hours written exam. A-F.
• Compulsory Lab Assignments (average score of at least 60%)

Students’ expectations
• Complete Course Plan - Schedule
• Complete Lab Plan - Schedule
• Consistency between Course and Lab Plans
• Interactive and Supportive Learning Environment
• Regular, Clear, Fast Communication
• Assessment Fairness
• Opportunities for Collaboration
• Final Exam - Samples

Students’ expectations
• Average 60% on each assignment or across all assignments?
• Final exam on SQL software?
• How many questions in final exam?
• Do we have to write in final exam?
• What does compulsory assignment mean?
• Are ChatGPT or other tools allowed?
• What are my expectations?

My expectations
Please perform course evaluation at the end of the course.

Outline
• What is a database?
• Integrity, Implementation, Durability
• Database-Management System (DBMS)
• Data Abstraction/Data Models
• Data Manipulation Language (DML)/Data Definition Language (DDL)
• Relational Databases

Databases
Database
Organized collection of inter-related data that models some aspect of the real world.
Databases are the core component of most computer applications.
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Databases
Database
Organized Collection: A database is not just a random pile of data; it is carefully organized and structured. Data is arranged in tables or other data structures to make it easily accessible and manageable.
Inter-Related Data: In a database, different pieces of data are linked or related to each other in a meaningful way. This helps in efficient retrieval and analysis of information.
Models Some Aspect of the Real World: The purpose of a database is to represent and emulate some part of the real world. For example, a database for a library models the real-world library with information about books, authors, borrowers, due dates, etc.

Databases
Database: Example
Consider an online bookstore. The bookstore's database models some aspects of the real-world bookstore operations.
Organized Collection: The database contains structured tables for books, authors, customers, orders, and more. Each table stores specific information about these aspects.
Inter-Related Data: Data in the database is linked together. For instance, a book record may be related to an author record through a unique author ID. An order record may be related to a customer record through a customer ID.
Models Some Aspect of the Real World: The database represents the bookstore's operations in the real world.
It tracks books, their availability, customer orders,

Databases
A simple database

Databases
A complicated database

Databases
The Internet revolution of the late 1990s sharply increased direct user access to databases.
When you access an online bookstore and browse a book or music collection, you are accessing data stored in a database.
When you enter an order online, your order is stored in a database.
When you access a bank Web site and retrieve your bank balance and transaction information, the information is retrieved from the bank’s database system.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Example
Database example
Create a database that models a digital music store to keep track of artists and albums.
Things we need to store:
→ Information about Artists
→ What Albums those Artists released
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Example
Database example
Store our database as comma-separated value (CSV) files that we manage in our own code.
→ Use a separate file per entity.
→ The application has to parse the files each time it wants to read/update records.
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Example
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Example – Data Integrity
How do we ensure that the artist is the same for each album entry?
What if somebody overwrites the album year with an invalid string?
How do we store that there are multiple artists on an album?
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Example – Implementation
How do you find a particular record?
What if we now want to create a new application that uses the same database?
What if two threads try to write to the same file at the same time?
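To make the flat-file questions above concrete, here is a minimal Python sketch of the CSV approach. It is an illustration only and is not taken from the cited slides: the file contents, the column names (`name`, `artist`, `year`) and both helper functions are hypothetical. It shows that finding a record means re-parsing and scanning the whole file, and that nothing stops an invalid year string from being stored.

```python
import csv
import io

# Hypothetical contents of an albums.csv file; the third row carries an
# invalid year that a plain CSV file happily accepts.
ALBUMS_CSV = """name,artist,year
First Album,Artist A,1993
Second Album,Artist B,2001
Third Album,Artist A,not-a-year
"""


def albums_by_artist(csv_text, artist):
    """Parse the whole file and scan every row to find one artist's albums."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["artist"] == artist]


def invalid_years(csv_text):
    """Return the names of albums whose year is not a plain integer."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in reader if not row["year"].isdigit()]


matches = albums_by_artist(ALBUMS_CSV, "Artist A")
bad = invalid_years(ALBUMS_CSV)
print([row["name"] for row in matches])
print(bad)
```

Every integrity check here lives in application code, so each new application that reads the same file would have to repeat it.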
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

Example – Durability
What if the machine crashes while our program is updating a record?
What if we want to replicate the database on multiple machines for high availability?
Source: Database Systems, CMU 15-445/645, https://15445.courses.cs.cmu.edu/fall2019/

The Answer is: Database Management System (DBMS).

Database-Management System
❑ A DBMS is software that allows applications to store and analyze information in a database.
❑ A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases in a way that is both convenient and efficient.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database-Management System
https://www.smartsheet.com/database-management

Database-Management System
https://www.proprofs.com/quiz-school/story.php?title=mis-chapter-3

Database-Management System
Thus, although user interfaces hide details of access to a database, and most people are not even aware they are dealing with a database, accessing databases forms an essential part of almost everyone’s life today.

DBMS          Vendor
MySQL         Freeware
SQL Server    Microsoft
Oracle Lite   Oracle
IMS DB        IBM

Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database-Management System
Data redundancy.
The same information may be duplicated in several places (files). For example, if a student has a double major (say, music and mathematics), the address and telephone number of that student may appear in a file that consists of student records of students in the Music department and in a file that consists of student records of students in the Mathematics department. This redundancy leads to higher storage and access cost.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database-Management System
Data inconsistency.
In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed student address may be reflected in the Music department records but not elsewhere in the system.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database-Management System
Integrity problems.
The data values stored in the database must satisfy certain types of consistency constraints. Suppose the university maintains an account for each department, and records the balance amount in each account. Suppose also that the university requires that the account balance of a department may never fall below zero. Developers enforce these constraints in the system by adding appropriate code in the various application programs. However, when new constraints are added, it is difficult to change the programs to enforce them. The problem is compounded when constraints involve several data items from different files.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database-Management System
Data Abstraction
Physical level. The lowest level of abstraction describes how the data are actually stored. The physical level describes complex low-level data structures in detail.
Logical level. The next-higher level of abstraction describes what data are stored in the database, and what relationships exist among those data. The logical level thus describes the entire database in terms of a small number of relatively simple structures.
View level. The highest level of abstraction describes only part of the entire database. Even though the logical level uses simpler structures, complexity remains because of the variety of information stored in a large database. Many users of the database system do not need all this information; instead, they need to access only a part of the database. The view level of abstraction exists to simplify their interaction with the system.
The system may provide many views for the same database.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Data Abstraction
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Data Abstraction
• Application Developers
• Database Designers
• Database Administrators (DBAs)
https://binaryterms.com/view-of-data.html

Instance and Schema
Databases change over time as information is inserted and deleted. The collection of information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema. Schemas are changed infrequently, if at all.
The concept of database schemas and instances can be understood by analogy to a program written in a programming language. A database schema corresponds to the variable declarations (along with associated type definitions) in a program. Each variable has a particular value at a given instant. The values of the variables in a program at a point in time correspond to an instance of a database schema.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Instances
https://www.researchgate.net/figure/An-example-database-instance_fig1_220282752

Schemas
https://afteracademy.com/blog/what-is-a-schema

Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. A data model provides a way to describe the design of a database at the physical, logical, and view levels.
Data models in DBMS:
• Hierarchical Model.
• Network Model.
• Entity-Relationship Model.
• Relational Model.
• Object-Oriented Data Model.
• Object-Relational Data Model.
• Flat Data Model.
• Semi-Structured Data Model.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Data Models
Hierarchical Model. Object-Oriented Data Model.
https://www.scaler.com/topics/dbms/data-models-in-dbms/
https://afteracademy.com/blog/what-is-data-model-in-dbms-and-what-are-its-types/
https://www.geeksforgeeks.org/basic-object-oriented-data-model/

Data Models
Relational Model.
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations.
The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type.
The relational data model is the most widely used data model, and a vast majority of current database systems are based on the relational model.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Data Models – Relational Model

Data Models
Entity-Relationship Model.
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object” in the real world that is distinguishable from other objects. The entity-relationship model is widely used in database design.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Data Models
Entity-Relationship Model
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database Language
A database system provides a data-definition language (DDL) to specify the database schema and a data-manipulation language (DML) to express database queries and updates.
In practice, the data-definition and data-manipulation languages are not two separate languages; instead they simply form parts of a single database language, such as the widely used SQL language.
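As a rough illustration of DDL and DML being parts of one language, the sketch below issues both kinds of SQL statement through Python's standard `sqlite3` module. It is not taken from the cited textbook. SQLite is used here only because it runs in memory without a server, so it stands in for the MySQL setup used in the labs, and the `artist` table with its columns and rows is hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database, used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# DDL: define the schema of a hypothetical artist table.
conn.execute(
    "CREATE TABLE artist (name VARCHAR(50), year INTEGER, country VARCHAR(30))"
)

# DML: insertion of new information.
conn.execute("INSERT INTO artist VALUES (?, ?, ?)", ("Artist A", 1992, "USA"))
conn.execute("INSERT INTO artist VALUES (?, ?, ?)", ("Artist B", 1988, "Norway"))

# DML: modification of stored information.
conn.execute("UPDATE artist SET country = ? WHERE name = ?", ("Canada", "Artist A"))

# DML: deletion of information.
conn.execute("DELETE FROM artist WHERE name = ?", ("Artist B",))

# DML: retrieval (a query).
rows = conn.execute("SELECT name, year, country FROM artist").fetchall()
print(rows)

conn.close()
```

All five statements go through the same `execute` call, which mirrors the point that the schema definition and the data operations belong to a single database language.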
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are:
• Retrieval of information stored in the database
• Insertion of new information into the database
• Deletion of information from the database
• Modification of information stored in the database
A query is a statement requesting the retrieval of information.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Database Languages
https://www.wikitechy.com/tutorials/sql/sql-ddl

Relational Databases
A relational database is based on the relational model and uses a collection of tables to represent both data and the relationships among those data. Most commercial relational database systems employ the SQL language.
Each table has multiple columns and each column has a unique name. We present a sample relational database below comprising two tables: one shows details of university instructors and the other shows details of the various university departments.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Relational Databases
The relational model is an example of a record-based model. Record-based models are so named because the database is structured in fixed-format records of several types.
Each table contains records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Relational Databases
The SQL language allows one to define tables.
For instance, the following SQL statement defines the department table:

    create table department
        (dept_name char(20),
         building char(15),
         budget numeric(12,2));

Execution of the above statement creates the department table with three columns: dept_name, building, and budget, each of which has a specific data type associated with it.
Source: Database Systems Concepts, Avi Silberschatz, Henry F. Korth, S. Sudarshan

Thanks
Questions/Answers
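To try the department table definition without a MySQL server, one option is a short Python sketch using the standard `sqlite3` module. This is an assumption-laden stand-in rather than the course setup: SQLite accepts the same statement but treats the declared types more loosely than MySQL does, and the inserted row is hypothetical sample data.

```python
import sqlite3

# In-memory SQLite database, used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# The department table definition, with dept_name written as one identifier.
conn.execute(
    "create table department ("
    " dept_name char(20),"
    " building char(15),"
    " budget numeric(12,2))"
)

# Ask SQLite which columns the new table has (the name is field 1 of each row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(department)")]

# Insert one hypothetical row and read it back.
conn.execute("insert into department values (?, ?, ?)", ("Physics", "Watson", 70000))
stored = conn.execute("select dept_name, building, budget from department").fetchall()

print(columns)
print(stored)

conn.close()
```

In MySQL Workbench the `create table` statement can be run directly; `PRAGMA table_info` is SQLite-specific and is used here only to list the three columns.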