
Intro to Web Dev: Database Concepts & Architecture

“Intro to Web Dev” Course
DB Part I
Instructors
AbdelAziz Allam
abd.ibrahim.allam@gmail.com
Solutions Development Manager
15 years of experience in the software industry.
Mahmoud Shahin
Mahmoudshahin.it@gmail.com
Principal Solutions Architect
15 years of experience in the software industry.
Module 04 – DB Part I

Content:
• Reminders.
• The concept.
• CAP.
• DB 360 view.
• DB internals.

Key outcomes:
• Difference between DB types.
• Internal architecture.
Reminders
Reminder 01 - Scaling
• Horizontal scaling (scaling out): adding more instances.
• Vertical scaling (scaling up): adding more HW resources, or moving the app to a larger, more powerful machine.
Reminder 02 – Sharding/ Partitioning
Sharding involves splitting and distributing data/events across multiple servers. Shards are stored on multiple machines. This allows larger datasets to be split into smaller chunks and stored across multiple data nodes.
Reminder 03 - Replication
Sample for a simple use case
Replication
• Replication increases read performance, either through load balancing or through geo-located query routing.
• Replication increases availability.
• Replication introduces complexity on write-focused workloads, as each write must be copied to every (or some) replicated node(s).
Reminder 03 - Replication
Sample for another use case – RF= 3
Replication
• Replication factor (RF) describes how many copies of your data exist.
• With a replication factor of 3, for example, when you write, two copies will always be stored, assuming enough nodes are up.
• When a node is down, writes for that node are stashed away and written when it comes back up. [#homework: research this before we cover it in the course!]
Consistency
• Nodes will have the same copies of a replicated data item.
• The same response is given to all identical requests.
• Accuracy, completeness, correctness, and reliability of data.
• All reads receive the most recent write.
• A guarantee that every node in a distributed cluster returns the same (most recent) value.
Distributed System Note
A distributed system is a system with multiple
components located on different machines that
communicate and coordinate actions in order to
appear as a single logical system to the end-user.
CAP theorem
No one gets everything!
• In distributed systems, data is stored across multiple nodes or servers to ensure fault tolerance, scalability, and reliability.
• The theorem states that in a distributed system, you can achieve at most two out of the three key properties:
Consistency, Availability, and Partition Tolerance.
CAP Overview I
It gives insight into the trade-offs that system architects must make when designing distributed systems.
References:
• https://www.ibm.com/topics/cap-theorem
• https://www.geeksforgeeks.org/the-cap-theorem-in-dbms/
• https://medium.com/@gurpreet.singh_89/understanding-the-cap-theorem-consistency-availability-and-partition-tolerance-e7faa5103638
CAP Overview II
Consistency
• Nodes will have the same copies of a replicated data item visible for various transactions.
• A guarantee that every node in a distributed cluster returns the same, most recent, successful write, no matter which node the client connects to.
• Whenever data is written to one node, it must be instantly forwarded or replicated to all the other nodes in the system before the write is deemed ‘successful.’
Availability
• Any client making a request for data gets a response, even if one or more nodes are down.
• All working nodes in the distributed system return a valid response for any request.
Partition tolerance
• The system can continue operating even if the network connecting the nodes has a fault that results in two or more partitions.
• Partition tolerance means that the cluster must continue to work despite any number of communication breakdowns between nodes in the system.
References:
• https://www.ibm.com/topics/cap-theorem
CAP Overview III
CA
• A system that prioritizes Consistency and Availability (CA) aims to provide strong consistency and high availability.
• However, in this scenario, the system might need to sacrifice partition tolerance. When a network partition occurs, the system might become unavailable or operate in a limited capacity to ensure data consistency.
CP
• A system that emphasizes Consistency and Partition Tolerance (CP) focuses on maintaining strong consistency even in the presence of network partitions.
• This approach might lead to reduced availability during partitioned scenarios, as some nodes might not be reachable (e.g., banking operations).
AP
• A system that values Availability and Partition Tolerance (AP) aims to remain operational despite network partitions, prioritizing high availability.
• In this case, data consistency might be compromised, as different nodes could have varying data states during partitioned periods (e.g., a new post or comment on social media).
References:
• https://medium.com/@gurpreet.singh_89/understanding-the-cap-theorem-consistency-availability-and-partition-tolerance-e7faa5103638
CAP Overview IV
Let's build a DB!
DB Overview I
The primary job of
any database
management system is
reliably storing data and
making it available for
users.
We use databases as a
primary source of data,
helping us to share it
between the different
parts of our
applications.
DB Overview II
Create, update, delete, and
retrieve records.
Database management
systems are applications built
on top of storage engines,
offering a query language,
indexing, transactions, and
many other useful features.
Every database system has strengths and weaknesses. You can invest some time before you decide on a specific database to build confidence in its ability to meet your application’s needs.
Your choice of database system may have long-term consequences.
If there’s a chance that a database is not a good fit because of
performance problems, consistency issues, or operational
challenges, it is better to find out about it earlier in the development
cycle, since it can be nontrivial to migrate to a different system.
Let's build a new DBMS 01
Data Model & Storage
Decide on the data model your
DBMS will support—relational,
document-oriented, column
store, key-value, graph, etc.
Determine how data will be stored: on disk, in memory, or a combination. Design the storage format and consider the trade-offs between different storage mechanisms (row-based, column-based, etc.).
Let's build a new DBMS 02
Storage Engine
Select the storage engine you plan to build on top of.
Understand its APIs, data structures, and how it interacts with storage (disk/memory).
The DBMS can use the engine's features (replication, isolation, ACID, etc.).
Integrate your higher-level database engine logic with the underlying storage engine, ensuring seamless interaction and data retrieval/storage.
Let's build a new DBMS 03
Data structure & Query processing
Define how data will be
represented, how queries will be
processed, and what
functionalities will be provided
(e.g., indexing, transactions).
Develop the logic and
algorithms for managing data
structures (e.g., B-trees, hash
maps) and processing queries.
This involves parsing queries,
optimizing them, and executing
them against the storage
engine.
Manage parsers, optimizers,
and query executors to
efficiently process and retrieve
data based on user queries.
Let's build a new DBMS 04
Concurrency Control and Transactions
Implement mechanisms for
concurrency control.
Ensure that multiple users
accessing the database
concurrently maintain data
consistency and integrity.
Manage concurrent access to
data and ensure
transactional consistency
(ACID properties) when
multiple users interact with
the database simultaneously.
Ensure that multiple users
accessing the same data
simultaneously do not
interfere with each other's
transactions.
If the underlying storage
engine supports transactions,
ensure proper handling and
support in your database
engine layer.
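As a quick illustration of the kind of mechanism this step covers, here is a minimal sketch of pessimistic locking in SQL (the accounts table and values are hypothetical):

START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;  -- lock the row; concurrent writers wait
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;  -- the row lock is released here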
Let's build a new DBMS 05
Buffer Pool / Caching
Allocate a pool of memory (shared/buffer pool).
Pages that are read/fetched from disk are placed in the buffer pool.
It's a dedicated area in
memory where the
database engine
temporarily stores
frequently accessed
data pages from disk.
A buffer pool is a critical
component responsible
for managing and
caching data pages in
memory.
This storage in memory
enables quicker access
to frequently accessed
data, reducing the need
to repeatedly read from
slower disk storage.
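As a concrete (hedged) example, on a MySQL/InnoDB setup such as the course container you can inspect the buffer pool and see how often reads are served from memory rather than disk:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';       -- buffer pool size in bytes
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';  -- read_requests (from memory) vs. reads (from disk)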
Let's build a new DBMS 06
Error Handling and Logging
Implement error-handling mechanisms and logging functionality to ensure errors are handled properly and to maintain a log for debugging and recovery purposes.
Let's build a new DBMS 07
Security and Access Control
Ensure robust security
measures, including
authentication,
authorization, and
encryption, to protect
sensitive data stored in the
database.
Let's build a new DBMS 08
Scalability
Design the system to scale
horizontally or vertically as
data volume increases.
Consider optimizations for
performance, such as
caching, parallel processing,
and query optimization.
A first step can be picking the appropriate storage engine(s).
This checklist is like an interface for those who want to implement a DBMS (not literally an interface, just a simplification).
We definitely won’t re-invent the wheel; we just want to get you thinking about it!
DB 360 view
Major Categories
• Some sources group
DBMSs into three major
categories:
• Online transaction
processing (OLTP)
databases.
• Online analytical
processing (OLAP)
databases
• Hybrid transactional and analytical processing (HTAP) databases.
OLTP
• These handle a large number of user-facing requests and transactions.
Queries are often predefined and short-lived.
• High-speed data processing and rapid transaction execution in real-time.
• Relational databases are OLTP databases. They organize data in tables consisting of rows and columns. Both MongoDB and Cassandra are also OLTP.
• e-commerce, online banking, bookings, inventory management, and more.
OLAP
• These handle complex aggregations.
• OLAP databases are often used for analytics and data warehousing, and
are capable of handling complex, long-running ad hoc queries.
• Hadoop and MapReduce are good examples.
HTAP
• These databases combine properties of both OLTP and OLAP stores (HTAP “breaks the wall” between OLTP and OLAP).
• Hybrid transaction/analytics processing combines transactions, such as updating a database, with analytics, such as finding likely sales prospects.
OLTP & OLAP
• OLTP is said to be more of an online transactional system or data storage system, where the user performs lots of online transactions against the data store. It also tends to have more ad hoc reads/writes happening in real time.
• In OLTP, the number of writes is low, e.g. hotel information: there can be 1 write per second, but reads can reach hundreds or thousands. So the ratio here can be around 1:1000.
• OLAP is more of an offline data store. It is accessed a number of times in an offline fashion. For example, bulk log files are read and then written back to data files. Some of the common areas where OLAP is used are log jobs, data-mining jobs, etc.
• In OLAP, several writes happen simultaneously: we dump data in one shot, i.e., all log files are put into the data store and then processing starts. The data/access pattern is exactly the opposite of an OLTP-style application. Here, Hadoop or MapReduce is useful.
References:
https://www.edureka.co/blog/oltp-vs-olap/#:~:text=Some%20of%20the%20common%20areas,for%20analytics%20and%20bulk%20writes.
DB engines classifications 01 Sample
Relational:
• Postgres.
• Oracle.
• MySQL.
• Microsoft SQL Server.
• MariaDB.
• Amazon Aurora.
Non-relational:
• MongoDB.
• Cassandra.
• Couchbase.
• Azure Cosmos DB.
• Amazon DynamoDB.
• Elasticsearch.
• Neo4j.
Caching:
• Redis.
• Memcached.
Timeseries:
• Prometheus.
• InfluxDB.
DB engines classifications 02 Sample
Relational databases:
• Oracle.
• MySQL.
• Microsoft SQL Server.
• MariaDB.
• Amazon Aurora.
Key-value stores:
• Redis.
• Cassandra.
Document-oriented stores:
• MongoDB.
• Couchbase.
Graph databases:
• Neo4j.
Latency numbers
References:
• https://blog.bytebytego.com/p/ep22-latency-numbers-you-should-know
Memory VS Disk
• In-memory database management systems: store data primarily in memory and use the disk for recovery and logging.
• Disk-based DBMSs: hold most of the data on disk and use memory for caching disk contents or as temporary storage.
Both types of systems use the disk to a certain extent, but main-memory databases store their contents almost exclusively in RAM. Also, accessing memory is faster than accessing disk.
Row VS Column stores/DBs
• Tables can be partitioned either:
• Vertically (storing values belonging to the same column together); (a) shows the values partitioned column-wise.
• Horizontally (storing values belonging to the same row together); (b) shows the values partitioned row-wise.
Row stores:
• Store data in records or rows. Their layout is quite close to the tabular data representation.
• Every row has the same set of fields.
Column stores:
• Instead of storing data in rows, values for the same column are stored contiguously on disk.
CRUD
• C: Create.
• R: Read.
• U: Update.
• D: Delete.
CRUD & HTTP Methods
• C: POST.
• R: GET.
• U: PUT / PATCH.
• D: DELETE.
DDL & DML
Data Definition Language (DDL)
• Create and modify the structure of database objects in the
database.
• Create, modify, and delete database structures but not
data.
Data Manipulation Language (DML)
• Insert, Update, and Delete data.
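A minimal sketch of the difference, using a hypothetical students table:

-- DDL: creates and modifies structure, not data.
CREATE TABLE students (
  id   INT PRIMARY KEY,
  name VARCHAR(100)
);
ALTER TABLE students ADD COLUMN email VARCHAR(255);

-- DML: manipulates the data itself.
INSERT INTO students (id, name) VALUES (1, 'Sara');
UPDATE students SET name = 'Sarah' WHERE id = 1;
DELETE FROM students WHERE id = 1;

-- DDL again: removes the structure (and any remaining data with it).
DROP TABLE students;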
SQL & NoSQL Reminder
SQL
Data is organized into tables, which are made
up of rows and columns.
Each row represents a record, while each
column represents a data attribute.
Handling structured data and complex
relationships between tables using foreign keys.
MySQL, PostgreSQL, and Oracle.
NoSQL
Offer a schema-less approach, allowing for easy adaptation to changing data requirements.
They can handle unstructured or semi-structured data, making them ideal for big data applications and real-time data processing.
MongoDB, Cassandra, and Redis.
References:
•
https://medium.com/@venkatramankannantech/a-comprehensive-guide-to-database-internals-37c8d9ed2407
DB Internals Part I:
• DBMS components.
• Storage engines.
• ACID.
• Transactions.
DBMS main parts
Databases are modular systems and consist of
multiple parts:
• A transport layer accepting requests.
• A query processor determining the most efficient way to run queries.
• An execution engine carrying out the operations.
• A storage engine storing, retrieving, and managing data in memory and on disk.
References:
•
Database Internals book – Oreilly
DBMS main parts (summarized)
• Transport layer.
• Query processor.
• Execution engine.
• Storage engine.
References:
•
Database Internals book – Oreilly
Overview
Database management systems use a client/server
model, where database system instances (nodes) take the
role of servers, and application instances take the role of
clients.
Client requests arrive through the transport subsystem.
Requests come in the form of queries, most often
expressed in some query language. The transport
subsystem is also responsible for communication with
other nodes in the database cluster.
Upon receipt, the transport subsystem hands the query
over to a query processor, which parses, interprets, and
validates it.
The parsed query is passed to the query optimizer, which
first eliminates impossible and redundant parts of the
query, and then attempts to find the most efficient way to
execute it based on internal statistics.
Overview
The query is usually presented in the form of an execution
plan (or query plan): a sequence of operations that have
to be carried out for its results to be considered complete.
Since the same query can be satisfied using different
execution plans that can vary in efficiency, the optimizer
picks the best available plan.
The execution plan is carried out by the execution engine.
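For example, you can ask MySQL (the DB we use hands-on) to show the plan it picked; the table and column names here are hypothetical:

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- MySQL 8.0+ can also render the plan as a tree of operations:
EXPLAIN FORMAT=TREE SELECT * FROM orders WHERE customer_id = 42;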
DBMS sample workflow
Upon receipt, the transport subsystem hands the query over
to a query processor, which parses, interprets, and validates
it.
The parsed query is passed to the query optimizer, which first eliminates
impossible and redundant parts of the query, and then attempts to find the most
efficient way to execute it based on internal statistics (index cardinality,
approximate intersection size, etc.) and data placement (which nodes in the
cluster hold the data and the costs associated with its transfer).
The query is usually presented in the
form of an execution plan (or query
plan): a sequence of operations that
have to be carried out for its results to
be considered complete.
Since the same query can be satisfied
using different execution plans that
can vary in efficiency, the optimizer
picks the best available plan.
The optimizer handles operations
required for query resolution, usually
presented as a dependency tree, and
optimizations, such as index ordering,
cardinality estimation, and choosing
access methods.
The execution plan is carried out by
the execution engine, which
aggregates the results of local and
remote operations.
DB Storage Engine
The storage engine has several
components with dedicated responsibilities:
Transaction Manager: This manager schedules transactions and
ensures they cannot leave the database in a logically inconsistent state.
Access Methods (Storage structures): These manage access to, and organization of, data on disk. Access methods include heap files and storage structures such as B-Trees.
Buffer Manager: This manager
caches data pages in memory.
Lock Manager: This manager locks database objects for the running transactions, ensuring that concurrent operations do not violate physical data integrity.
Recovery Manager: This manager maintains the operation log and restores the system state in case of a failure.
Storage Engines
• Storage engines are sometimes called embedded databases: software libraries that DBMSs use to do low-level storing of data on disk.
• The storage engine is a crucial component responsible for managing how data is stored, retrieved, and manipulated within a database management system (DBMS).
• It essentially handles the low-level details of how data is stored on disk or in memory.
References:
•
Database Internals book – Oreilly
• Storage engines such
as BerkeleyDB, LevelDB, RocksDB, LMDB,
libmdbx, Sophia, HaloDB, InnoDB, MyISAM,
Aria and many others were developed
independently from the database management
systems.
• Using pluggable storage engines has
enabled database developers to bootstrap
database systems using existing storage
engines and concentrate on the other
subsystems.
DBMS & Storage Engines – high-level view
• MongoDB – WiredTiger: provides concurrency control, compression, and support for both read- and write-heavy workloads.
• Cassandra – RocksDB: optimized for fast storage; a key-value store designed for fast retrieval that can handle write-heavy workloads and scenarios requiring high-speed ingestion of data. (Note: Cassandra's default engine is its own LSM-tree implementation; RocksDB appears in the “Rocksandra” variant.)
• MySQL / MariaDB – InnoDB: ACID compliance, transaction support, row-level locking. A reliable choice for general-purpose use; it offers support for transactions, crash recovery, and foreign keys, making it suitable for OLTP (Online Transaction Processing) workloads.
• Oracle – BerkeleyDB.
• Couchbase – Couchbase: a distributed NoSQL document database designed for high-performance read and write operations.
DBMS & MultipleStorage Engines Example
Several database management systems (DBMS) offer support for multiple
storage engines, allowing users to choose different underlying mechanisms
for storing and managing data based on their specific needs.
Couchbase
• It allows users to choose among multiple storage engines: ForestDB, Couchstore, and Magma. These engines have different characteristics related to performance, scalability, and features, allowing users to optimize for specific requirements.
MariaDB & MySQL
• They support multiple storage engines. MySQL historically offered various engines such as InnoDB (transactional), MyISAM (non-transactional), Memory, etc.
• MariaDB continued this support and introduced additional engines like Aria and TokuDB.
References:
•
https://mariadb.com/docs/server/storage-engines/
•
https://mariadb.com/kb/en/tokudb/
•
https://docs.couchbase.com/cloud/clusters/data-service/storage-engines.html
•
https://docs.couchbase.com/server/current/learn/buckets-memory-and-storage/storageengines.html#:~:text=Couchbase%20supports%20two%20different%20backend,best%20suited%20to%20your%20requirements.
DBMS & Storage Engines – high-level view
• Postgres – storage engine: #HomeWork, do research.
• Microsoft SQL Server – storage engine: #HomeWork, do research.
MyISAM VS InnoDB
Do research about the difference between them.
• Don't forget to highlight why MySQL's default changed from MyISAM to InnoDB.
Try both practically in MySQL.
• Use Docker to spin up MySQL quickly:
docker run --name training-mysql -p 13306:3306 -e MYSQL_ROOT_PASSWORD=mysqlpwd mysql
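A starting point for the hands-on part (real MySQL statements; the table names are just examples):

SHOW ENGINES;                                  -- lists available engines and the default
CREATE TABLE t_myisam (id INT) ENGINE=MyISAM;  -- pick an engine per table
CREATE TABLE t_innodb (id INT) ENGINE=InnoDB;
ALTER TABLE t_myisam ENGINE=InnoDB;            -- switch an existing table's engine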
MySQL assignment hint 01
MySQL assignment hint 02
MySQL assignment hint 03
MySQL assignment hint 04
DBs internals – Storage Engine –
Data structure
• Storage Engine:
• Manages how data is stored, accessed, and manipulated internally.
• It is responsible for:
• Organizing data on disk.
• Handling queries.
• Ensuring data integrity.
• Each storage engine has its own way of storing and retrieving data, utilizing different algorithms & data structures.
• Some use B-trees.
• Others might use hash tables or other specialized data structures.
DBs internals – Storage Engine indexing
• Storage engines manage indexes.
• Indexes are important for fast data retrieval.
• Indexes organize data in a way that allows the database to locate information efficiently.
• An index can be organized using B-Trees, hash indexes, etc.
DBs internals – Storage –
concurrency control
• Storage engines implement mechanisms to handle multiple users accessing the DB simultaneously.
• They ensure data integrity by managing locks, transactions, and isolation levels.
• They provide support for transactions with ACID properties (Atomicity, Consistency, Isolation, Durability).
Do research about optimistic VS pessimistic concurrency control & include it in your presentation (a small sketch follows below).
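A minimal sketch of the two approaches, assuming a hypothetical items table with a version column:

-- Pessimistic: lock the row up front; concurrent writers wait.
START TRANSACTION;
SELECT qty FROM items WHERE id = 7 FOR UPDATE;
UPDATE items SET qty = qty - 1 WHERE id = 7;
COMMIT;

-- Optimistic: no lock; detect conflicts at write time via the version column.
UPDATE items
SET qty = qty - 1, version = version + 1
WHERE id = 7 AND version = 3;  -- 0 rows affected => someone else updated first; retry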
DBs internals – Storage – Caching &
buffer
• Storage engines often use caching mechanisms to improve performance.
• They store frequently accessed data in memory buffers, reducing the need to fetch data from disk repeatedly.
DBs internals – Storage –
Optimization
• They optimize query execution by deciding how to retrieve and manipulate data efficiently.
• This includes query parsing, optimization, and execution plans.
DBs internals – Storage – File
management
• They handle:
• How data is stored in files on disk.
• Allocating space.
• Managing reads and writes efficiently.
Read this URL and see how they
think
https://www.couchbase.com/blog/magma-next-gen-document-storage-engine/
Ensure you understand the concept
of “pluggable storage engine”
• What is the benefit of it?
• How to switch between them (if needed)?
Do research about real-world examples and which DB engines they use!
For each real-world product below, research: the used DB engines, why, the benefits, and the drawbacks.
• Discord
• X (AKA Twitter)
• Meta (AKA Facebook)
• Instagram
• Spotify
• Netflix
• TikTok
• Ebay
• Walmart
• Airbnb
• Uber
Instructors’ expectations
While doing the previous slide’s assignment:
• If you find yourself checking why some product switched from DB-X to DB-Y, then you are moving in the right direction.
• https://discord.com/blog/how-discord-stores-billions-of-messages
• https://hackernoon.com/discord-went-from-mongodb-to-cassandra-then-scylladb-why
• https://www.uber.com/en-SA/blog/postgres-to-mysql-migration/
Read the above links if you have not hit them before!
Transactions
A transaction is a sequence of multiple operations performed on a database, all served as a single logical unit of work, taking place wholly or not at all. There is never a case where only half/part of the operations are performed and the results saved.
A transaction is a collection of queries treated as one unit of work: Begin → Commit (or Rollback).
References:
• https://fauna.com/blog/database-transaction
Example of a transaction in action
Consider a banking app where a user wishes to transfer funds from one account to another, where the
operation’s transaction might look like this:
• BEGIN TRANSACTION – e.g., transferring money from BankAccount-A to BankAccount-B.
• Deduct the transfer amount from the source account (BankAccount-A).
• Add the transfer amount to the destination account (BankAccount-B).
• COMMIT – updating the record of the transaction carried out by the customer.
The transaction is rolled back, and the database is restored to its initial state, if any of its operations fail, such as:
• Something going wrong while adding the balance to BankAccount-B (e.g., the DB goes down).
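The same flow expressed as SQL (a sketch; the accounts table and the amount are hypothetical):

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE name = 'BankAccount-A';
UPDATE accounts SET balance = balance + 500 WHERE name = 'BankAccount-B';
COMMIT;      -- both updates become durable together...
-- ROLLBACK; -- ...or neither does, restoring the initial state if anything fails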
ACID
ACID is an acronym for the properties used to indicate that the database engine provides atomicity, consistency, isolation, and durability.
References:
• Learn PostgreSQL
Atomicity
• means that a complex database operation is
processed as a single instruction even when it is
made up of different operations.
Consistency
• means that the data within the database is always kept consistent and that it is not corrupted by partially performed operations.
Isolation
• allows the database to handle concurrency in the
"right way"—that is, without having corrupted data
from interleaved changes.
Durability
• means that the database engine is supposed to
protect the data it contains, even in the case of
software and hardware failures, as much as it can.
Atomicity
• Atomicity in terms of a transaction means all or nothing. When a transaction is committed, the
database either completes the transaction successfully or rolls it back so that the database returns
to its original state.
• For example, in an online ticketing application, a booking may consist of two separate actions that
form a transaction — reserving the seat for the customer and paying for the seat. A transaction
guarantees that when a booking is completed, both these actions, although independent, happen
within the same transaction. If any of the actions fail, the entire transaction is rolled back, and the
booking is freed up for another transaction attempting to take it.
References:
• https://fauna.com/blog/database-transaction
Consistency
• Ensures that transactions only make changes to tables in predefined, predictable ways.
• The data must be consistent before & after the transaction.
References:
• https://fauna.com/blog/database-transaction
Isolation
• With multiple concurrent transactions running at the same time, each transaction should be kept independent, without affecting other transactions executing simultaneously.
• In practice, transactions run in parallel, and some form of database locking is utilized to ensure that the result of one transaction does not impact that of another.
References:
• https://fauna.com/blog/database-transaction
Read phenomena
Dirty read:
• Happens when a transaction reads data written by another concurrent transaction that has not committed yet.
• We don’t know whether that other transaction will be committed or rolled back.
• We might end up using incorrect data if a rollback happens.
Non-repeatable read:
• When a transaction reads the same record twice and sees different values.
• This is because the row has been modified by another transaction that committed after the first read.
Phantom read:
• The same query is re-executed, but a different set of rows is returned.
• This can be due to changes made by other recently committed transactions, such as inserting new rows or deleting existing rows that satisfy the search condition of the current transaction's queries.
Serialization anomaly:
• #Homework
Dirty Read
Dirty read is the state of
reading uncommitted
data.
We are not sure about
the consistency of the
data that is read.
This is because we don’t know the result of the open transaction(s).
After reading the
uncommitted data, the
open transaction can be
completed either with
rollback or committed
successfully.
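A sketch of how you could reproduce a dirty read in MySQL with two sessions (hypothetical accounts table):

-- Session 2 (writer): start a transaction but do NOT commit yet.
START TRANSACTION;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Session 1 (reader): allow reading uncommitted data.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1;  -- sees Session 2's uncommitted change

-- Session 2: ROLLBACK;  -- Session 1 has now read data that never officially existed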
References:
• https://www.sqlshack.com/dirty-reads-and-the-readuncommitted-isolation-level/
Non-Repeatable read
Happens when one
transaction reads the same
data twice while another
transaction updates that
data in between the first
and second read of the first
transaction.
References:
• https://dotnettutorials.net/lesson/non-repeatable-readconcurrency-problem/
Phantom read
When one transaction executes
a query twice and it gets a
different number of rows in the
result set.
This generally happens when a
second transaction inserts some
new rows in between the first
and second query execution of
the first transaction that
matches the WHERE clause of
the query executed by the first
transaction.
References:
• https://dotnettutorials.net/lesson/phantom-read-concurrencyproblem-sql-server/
• Read, understand, and learn about the serialization anomaly (which we skipped) and other read phenomena.
• Support your presentation with examples.
Isolation level
Read uncommitted:
• Transactions in this level can see data written by other uncommitted transactions, thus allowing the phenomena above to happen.
• NO ISOLATION.
Read committed:
• Transactions can only see data that has been committed by other transactions.
• Because of this, dirty reads are no longer possible.
• Each query sees committed values.
Repeatable read:
• A stricter isolation level.
• It ensures that the same SELECT query will always return the same result (it sees the values committed at the beginning of the transaction).
• This holds even if some other concurrent transactions have committed new changes that satisfy the query.
Serializable:
• The highest isolation level.
• Concurrent transactions running at this level are guaranteed to yield the same result as if they were executed sequentially in some order, one after another, without overlapping.
References:
• https://dev.to/techschoolguru/understand-isolation-levels-read-phenomena-in-mysql-postgres-c2e
MySQL isolation level from our
container
Check transaction isolation
Change transaction isolation
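On MySQL 8.0 (as in our container) these look like the following; older versions use @@tx_isolation instead:

SELECT @@transaction_isolation;                          -- check the current level
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;  -- change it for this session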
Try different isolation levels with different CRUD operations! MySQL will be a good option to try.
Open two transactions and experiment, following the URL below:
https://dev.to/techschoolguru/understand-isolation-levels-read-phenomena-in-mysql-postgres-c2e
Let us recap it well! (Yes = may occur; this is the standard ANSI picture, and exact behavior varies by engine.)

Isolation level    | Dirty read | Non-repeatable read | Phantom read
Read uncommitted   | Yes        | Yes                 | Yes
Read committed     | No         | Yes                 | Yes
Repeatable read    | No         | No                  | Yes
Serializable       | No         | No                  | No
References:
• https://dotnettutorials.net/lesson/phantom-read-concurrency-problem-sql-server/
Durability
• Durability means that a successful transaction commit will survive permanently. To accomplish this, an
entry is added to the database transaction log for each successful transaction.
• Changes made by committed transactions should be persisted in durable storage.
• The changes of a successful transaction persist even if a system failure occurs.
• Popular durability techniques:
• WAL (Write Ahead Log).
• Append Only File (AOF).
• Asynchronous snapshot.
References:
• https://fauna.com/blog/database-transaction
ACID Summarized
References:
• https://www.bmc.com/blogs/acid-atomic-consistent-isolated-durable/
ACID Summarized
• ACID transactions ensure the highest possible data reliability and integrity.
• They ensure that your data never falls into an inconsistent state because of an
operation that only partially completes.
• For example, without ACID transactions, if you were writing some data to a
database table, but the power went out unexpectedly, it's possible that only some
of your data would have been saved, while some of it would not.
• Now your database is in an inconsistent state that is very difficult and time-consuming to recover from.
DB Internals Part II:
• Pages.
• B-Tree.
• WAL.
DBs internals – Pages
• Databases often use fixed-size pages to store data. Tables, collections, rows, columns, indexes,
sequences, documents and more eventually end up as bytes in a page.
• Databases read and write in pages. When you read a row from a table, the database finds the page
where the row lives and identifies the file and offset where the page is located on disk.
• The database then asks the OS to read from the file on the particular offset for the length of the page.
• The OS checks its filesystem cache and if the required data isn’t there, the OS issues the read and
pulls the page in memory for the database to consume.
• The smaller the rows, the more rows fit in a single page.
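For instance, in MySQL/InnoDB you can check the page size, which is 16 KB by default:

SHOW VARIABLES LIKE 'innodb_page_size';  -- 16384 bytes unless configured otherwise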
References:
•
https://medium.com/p/38cdb2c79eb5
Pages
• MSSQL Server:
• The disk space allocated to a data file in a database is logically divided into pages numbered contiguously from 0 to n. Disk I/O operations are performed at the page level.
• SQL Server reads or writes whole data pages.
• All data pages are the same size: 8 KB. This is
similar to Oracle as well (page size is 8 KB).
• The index pages contain index references about
where the data is.
• There are system pages that store various
metadata about the organization of the data.
References:
•
•
https://medium.com/p/38cdb2c79eb5
https://learn.microsoft.com/en-us/sql/relational-databases/pages-andextents-architecture-guide?view=sql-server-ver16
Pages
• MSSQL Server:
• Data rows are stored on the page serially.
• Each row offset table contains one entry for each
row on the page.
References:
•
https://learn.microsoft.com/en-us/sql/relational-databases/pages-andextents-architecture-guide?view=sql-server-ver16
Pages with Insert, Update, Delete & Select
• Update:
• When a user updates a row, the database finds the page where the row lives, pulls the page into the buffer pool, updates the row in memory, and persists an entry of the change (WAL) to disk.
• The page can remain in memory so it may receive more writes before it is finally flushed back to disk (minimizing the number of I/Os).
• Insert:
• When a user inserts a row, the database … #Homework
• Delete:
• When a user delete row(s), the database … #Homework
• Select: #Homework
• When a user select row(s), the database …
• With index: …
• Without index: …
Pages – more details 01
• When a user updates a row:
• The database finds the page where the row lives,
• pulls the page into the buffer pool, updates the row in memory, and persists a journal entry of the change (often called WAL) to disk.
• The page can remain in memory so it may receive more writes before it is finally flushed back to disk, minimizing the number of I/Os.
• Deletes and inserts work the same way, but implementations may vary.
• Row-store databases write rows with all their attributes packed one after the other in the page, which suits OLTP workloads better, especially write workloads.
• Column-store databases write the rows in pages column by column, so OLAP workloads that summarize a few fields are more efficient.
• Document-based databases compress documents and store them in pages just like row stores. Graph-based databases persist the connectivity in pages so that a page read is efficient for traversing graphs; this can also be tuned for depth vs. breadth vs. search.
Pages – more details 02
• Whether you are storing rows, columns, documents or graphs, the goal is to pack your items in
the page such that a page read is effective. The page should give you as much useful information
as possible to help with client-side workload. If you find yourself reading many pages to do tiny
little work, consider rethinking your data modeling.
• Each database has a different implementation of what the page looks like and how it is physically stored on disk, but in the end the concept is the same.
• Small pages are faster to read and write. However, the overhead of the page-header metadata relative to useful data can be higher.
• Larger sizes can minimize metadata overhead and page splits, but at the cost of more expensive cold reads and writes.
B-tree
• One of the most popular storage structures is a B-Tree.
• Many open source database systems are B-Tree based, and over the years they’ve proven to cover the
majority of use cases.
• B-trees are balanced trees, ensuring that the distance from the root node to any leaf node is roughly the
same. This balance helps in maintaining consistent search, insert, and delete performance regardless of the
size of the tree.
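In SQL databases, secondary indexes are typically B-Tree based. A sketch with hypothetical table/column names:

CREATE INDEX idx_users_email ON users (email);        -- a B-Tree index in InnoDB
EXPLAIN SELECT * FROM users WHERE email = 'a@b.com';  -- the plan should use idx_users_email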
WAL I
Changes to data are written to a log before the corresponding data is updated in the main storage. The idea is that before any modifications are made to the actual database or file system, a record of these changes is written to a log file.
• Once the log entry is successfully written to the disk, the corresponding changes can be applied to the main storage.
• It is an append-only mechanism and ensures data integrity.
• By writing the changes to the log first, the system ensures that even if a crash occurs or power is lost, the modifications are not lost. During recovery, the system can use the log to bring the data back to a consistent state by replaying the logged changes.
• Write-ahead logging is often used in transactional systems to maintain atomicity. Atomicity ensures that a transaction is treated as a single unit. If any part of the transaction fails, the entire transaction is rolled back, and the database remains in a consistent state.
References:
•
https://medium.com/@venkatramankannantech/a-comprehensive-guide-to-database-internals-37c8d9ed2407
WAL II
1. Start of Transaction: when a transaction begins, the system creates a new log entry to mark the start of the transaction.
2. Modifications: as the transaction progresses and data is modified or updated, the changes are recorded in the log file.
3. Commit: when the transaction is ready to be committed, a special log entry called the “commit record” is written to the log, indicating that all changes associated with the transaction are now considered durable.
4. Apply Changes: after the commit record is successfully written to the log, the changes are applied to the main database or file system. This ensures that data remains consistent.
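As a concrete (hedged) example, InnoDB's redo log plays the WAL role in MySQL, and you can inspect its settings:

SHOW VARIABLES LIKE 'innodb_log%';                     -- redo-log (WAL) settings
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';  -- 1 = flush the log on every commit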
References:
•
https://medium.com/@venkatramankannantech/a-comprehensive-guide-to-database-internals-37c8d9ed2407
Summary
We have covered:
1. DBMS 360 view (SQL, NoSQL, Row, Columnar, document, key-value, graph)
2. Internals including Storage engines, B-Tree, WAL, Pages, ACID.
3. CAP theorem understanding.
Session Conclusion
1. Reading:
• Storage engines.
• B-Tree.
• LSM.
• WAL.
• Pages in DB engines.
2. Hands on:
• Follow what is in the Homework slides.
3. Resources:
• Add useful resources to our Knowledge base.
4. Questions:
• Add your valid questions to the GitHub issues.
Thank You
Remember, Do your best!!
No Excuse ..