Uploaded by Rohan Goswami

OptionA Databases 1.pptx

Option A - Databases
A.1
Outline the differences between data and information
• Data are the facts or details from which information is derived.
• Individual pieces of data are rarely useful alone.
• For data to become information, data needs to be put into context.
Meaning
• Data: raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.
• Information: when data is processed, organized, structured or presented in a given context so as to make it useful, it is called information.
Example
• Data: each student's test score is one piece of data.
• Information: the average score of a class or of the entire school is information that can be derived from the given data.
• Data are unorganized/unstructured/unprocessed facts;
• Whereas information is organized/structured/processed data;
• Data lacks meaning on its own, e.g. individual data elements such as a single test score;
• Whereas information is interpreted data and has meaning;
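A minimal Python sketch of the test-score example: the list of individual scores is the data, and the computed class average is the information. The score values are made up for illustration.

```python
# Data: individual test scores (illustrative values, not taken from the slides)
scores = [72, 85, 90, 64, 89]


def class_average(values):
    """Process raw scores (data) into a class average (information)."""
    if not values:
        raise ValueError("no scores to average")
    return sum(values) / len(values)


print(f"Class average: {class_average(scores):.1f}")  # Class average: 80.0
```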
Outline the differences between an information system and a database
• An information system (IS) is an organized system for the collection, organization, storage and communication of information.
• More broadly, the field of information systems studies the complementary networks of hardware and software that people and organizations use to collect, filter, process, create and distribute data.
• Some examples of such systems are:
  • data warehouses
  • enterprise resource planning
  • enterprise systems
  • expert systems
  • search engines
  • geographic information system
  • global information system
  • office automation
• A computer(-based) information system is essentially an IS using computer technology to carry out some or all of its planned
tasks. The basic components of computer-based information systems are:
• Hardware: the devices, such as the monitor, processor, printer and keyboard, that work together to accept, process and display data and information.
• Software: the programs that allow the hardware to process the data.
• Databases: collections of associated files or tables containing related data.
• Networks: connecting systems that allow different computers to share resources.
• Procedures: the instructions for combining the components above to process information and produce the desired output.
• Databases are a component within an information system.
• A database is an organized collection of data.
• A relational database, more restrictively, is a collection of schemas, tables,
queries, reports, views, and other elements.
• Database designers typically organize the data to model aspects of reality in a way that supports processes requiring information, for example modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
• A database-management system (DBMS) is a software application that is used to build/design a database and that interacts with end-users, other applications, and the database itself to capture and analyze data.
• A general-purpose DBMS allows the definition, creation, querying, update, and
administration of databases.
• A database is where related data about a particular activity is stored;
• Whereas an information system includes the database, together with other (hardware and software) components;
Discuss the need for databases
• Consider a scenario where you have data related to customers, employees, banking, etc., and you have to save it in various formats such as text, images, numbers, dates, amounts, documents, audio or video.
• You can store the data in text files or an Excel spreadsheet.
• You can also use a folder structure to organize your files.
• So why do you need a database?
• Many small businesses have used text files or Excel spreadsheets for a long time.
• This setup may work well for some small businesses, because simply having data to store does not require a database.
• So merely having data is not the problem here.
• There are several problems if a database is not used:
• Size of Data: a small amount of data stored in a spreadsheet is manageable, but it may grow into a large amount of data, at which point a spreadsheet solution will not work. If the number of records runs into millions, the data has to be split across multiple spreadsheets, which creates a problem of speed: it takes a long time to find a record across multiple spreadsheet files.
• Ease of Updating Data: multiple people cannot edit the same file at the same time. Other people must wait until the file is available to update it, which wastes time.
• Accuracy: when a user enters data into files, incorrect data may be entered because no validation is present. You can enter wrong spellings, wrong dates and wrong amounts, so data accuracy is hard to maintain.
• Security: text files and spreadsheets offer little control over who can see what. Anyone who can open the file can read any data in it, so storing data in spreadsheets/files will not work for banking, healthcare applications or payroll departments, where privacy must be maintained.
• Redundancy: data can easily be duplicated in text files or spreadsheets, whereas a database has mechanisms (such as primary keys and unique constraints) to avoid duplicate entries.
• Incomplete Data: data items can be left out of some records in text files or spreadsheets, whereas in a database you can make the presence of all data items mandatory.
https://www.youtube.com/watch?v=djEZeF4KTaM
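The Redundancy and Incomplete Data points can be illustrated with SQLite through Python's built-in sqlite3 module (an assumed choice; the slides do not name a DBMS). In this sketch a UNIQUE constraint rejects a duplicate entry and NOT NULL rejects a record with a missing item. The table, columns and helper function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- each row gets a unique key
        email       TEXT NOT NULL UNIQUE,  -- must be present and must not repeat
        name        TEXT NOT NULL          -- must be present
    )
    """
)


def try_insert(email, name):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO customer (email, name) VALUES (?, ?)", (email, name))
        return "inserted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"


print(try_insert("asha@example.com", "Asha"))  # first entry is accepted
print(try_insert("asha@example.com", "Asha"))  # duplicate email: redundancy blocked
print(try_insert("ben@example.com", None))     # missing name: incomplete record blocked
```

The checks live in the table definition, so every program that writes to the table gets them; a spreadsheet would rely on each user remembering the rules.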
• A database engine provides standardized support for operations such as queries;
• It has built-in capabilities for ensuring that transactions either complete fully or are rolled back;
• Easy to sort/retrieve/analyse data;
• Reduced redundancy / data need not be entered twice;
• It reduces development time by providing a common platform that developers can reuse in multiple projects;
• Greater control;
• Perform analysis;
• Better information;
Costs associated with database creation
• Cost of hardware / hardware maintenance;
• Cost of software / software maintenance;
• Training cost;
Centralized Database Vs Localized Database
• Centralized Database
• Same consistent view of data across all places at any time;
• Centrally updated;
• Localized Database
• The database at various places may not be in sync.
• The values updated in one place may not be updated in another place, giving inconsistent and erroneous values.
Online Database
• Advantage
• All data are stored and updated centrally;
• All data are accessible to users from one place;
• Accessible by users at any time; etc.
• Disadvantage
• Data could be hacked because it is stored online;
Describe the use of transactions, states and updates to maintain data consistency (and integrity)
• A transaction represents a unit of work performed within a database.
• A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.
• Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This
very simple and small transaction involves several low-level tasks.
• A’s Account
    Open_Account(A)
    Old_Balance = A.balance
    New_Balance = Old_Balance - 500
    A.balance = New_Balance
    Close_Account(A)
• B’s Account
    Open_Account(B)
    Old_Balance = B.balance
    New_Balance = Old_Balance + 500
    B.balance = New_Balance
    Close_Account(B)
• To ensure data consistency when moving money between two accounts it is necessary to complete two operations (debiting one
account and crediting the other). Unless both operations are carried out successfully, the transaction will be rolled back.
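The A-to-B transfer can be sketched with Python's built-in sqlite3 module (an assumed choice). The two UPDATE statements mirror the debit and credit steps; using the connection as a context manager commits both together or rolls both back. The starting balances, the CHECK rule and the helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Debit src and credit dst as one transaction; roll back if either step fails."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            cur = conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            if cur.rowcount != 1:  # credit step failed: destination account not found
                raise LookupError(f"no such account: {dst}")
        return True
    except (sqlite3.IntegrityError, LookupError):
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


print(transfer(conn, "A", "B", 500), balances(conn))  # True {'A': 500, 'B': 700}
print(transfer(conn, "A", "Z", 500), balances(conn))  # False {'A': 500, 'B': 700}
```

In the second call the debit from A succeeds but the credit fails, so the rollback restores A's balance: neither change is kept.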
• Modifications to data are made persistent in the database only if the transaction completes successfully (commits);
• A roll-back operation is performed if a failure occurs before the transaction completes;
• This keeps the database in its original consistent state;
• For example, transactions must conform to the database's rules before the database is updated;
• Transactions must complete fully to make the changes permanent/persistent;
• Allows a roll-back operation in case of failure;
• Ensures the database remains consistent while transactions are performed;
Define the term database transaction
• The minimal step of operation/update to be performed on a database; 
That guarantees consistency/integrity of the database; And recovery upon
failure;
• A transaction is a set of changes that must all be made together.
• It is a program unit whose execution may or may not change the contents of a database.
• A transaction is executed as a single unit.
• If the database was in a consistent state before a transaction, it must also be in a consistent state after the transaction has executed.
• For example, a transfer of money from one bank account to another requires two changes to the database; both must succeed or fail together.
• A database transaction is a logical unit that is independently executed; For
data retrieval or updates;
• A database transaction is a unit of work; That is either executed in full or
not executed at all;
• A database transaction is a way of representing a state change; And has
four properties, known as ACID;
• A database transaction usually means a sequence of steps, treated as a
unit; For the purposes of satisfying a client’s request;
• A database transaction is a process carried out on a database; Which may
change its state, for example: moving money between bank accounts;
Explain concurrency in a data sharing situation
• Concurrency control is a database management system (DBMS) concept that is used to address conflicts with the simultaneous accessing or altering of data that can occur with a multi-user system.
• Concurrency control, when applied to a DBMS, is meant to coordinate
simultaneous transactions while preserving data integrity.
• To illustrate the concept of concurrency control, consider two travelers who go to
electronic kiosks at the same time to purchase a train ticket to the same
destination on the same train.
• There's only one seat left in the coach, but without concurrency control, it's
possible that both travelers will end up purchasing a ticket for that one seat.
• However, with concurrency control, the database wouldn't allow this to happen.
• Both travelers would still be able to access the train seating database, but
concurrency control would preserve data accuracy and allow only one traveler to
purchase the seat.
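A minimal sketch of the kiosk scenario in Python, assuming a toy in-memory seat table protected by a single threading.Lock (a real DBMS would use its own locking or an atomic conditional update). The class name and seat label are hypothetical.

```python
import threading


class SeatDatabase:
    """Toy in-memory seat table guarded by one lock (illustrative only)."""

    def __init__(self, seats):
        self._owner = {seat: None for seat in seats}
        self._lock = threading.Lock()

    def book(self, seat, traveler):
        # The check ("is it free?") and the update ("mark it taken") happen while
        # holding the lock, so another booking cannot slip in between them.
        with self._lock:
            if self._owner[seat] is None:
                self._owner[seat] = traveler
                return True
            return False

    def owner(self, seat):
        with self._lock:
            return self._owner[seat]


db = SeatDatabase(["14C"])  # only one seat left
results = {}


def kiosk(traveler):
    results[traveler] = db.book("14C", traveler)


threads = [threading.Thread(target=kiosk, args=(name,)) for name in ("Traveler 1", "Traveler 2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, "seat 14C ->", db.owner("14C"))
```

Either traveler may win, depending on thread timing, but exactly one booking succeeds; without the lock, both could see the seat as free before either marks it taken.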
• Data sharing/replication:
• all/part of one/several source databases are shared/replicated according to the needs of
different user groups;
• The information needed by the group is made ‘closer’ to the user;
• Hence, less focus on transmission/traffic (expensive or slow or not available or not
convenient),
• but requires more storage space (but storage is cheap);
• Multiuser/concurrent access in reading/searching/querying of database;
• But records in database are partitioned, to ensure privacy in access / each user can access only
the part of the database relative to his data / Database has a row-locking mechanism, so that
access is restricted to individuals only in permitted areas / there is no conflict/competition in
accessing the same area of database;
• (This/The partition/the row locking/the partial access) ensures isolation / separation of data (as needed for transaction processing);
• Therefore, multiple transactions (from different users) may safely occur simultaneously on
non-overlapping portions of the database;
• And this simplifies the management of system updates/recovery of (separated) transactions;
Explain the importance of the ACID properties of a database transaction
Atomicity
• By this, we mean that either the entire transaction takes place at once or doesn’t happen at all.
• There is no midway i.e. transactions do not occur partially.
• Each transaction is considered as one unit and either runs to completion or is not executed at all.
• It involves the following two operations:
  • Abort: if a transaction aborts, changes made to the database are not visible.
  • Commit: if a transaction commits, the changes made are visible.
• Atomicity is also known as the ‘All or nothing rule’.
• Consider the following transaction T, consisting of T1 and T2: a transfer of 100 from account X to account Y.
• If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y.
• This results in an inconsistent database state.
• Therefore, the transaction must be executed in entirety in order to ensure correctness of database
state.
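The all-or-nothing rule for the X-to-Y transfer can be sketched as a toy Python transaction that stages its writes and applies them only on commit. This staging scheme is an illustrative assumption (real DBMSs typically rely on logging), and the class and function names are hypothetical; the balances X = 500, Y = 200 match the ones used under Consistency.

```python
class Transaction:
    """Toy transaction: writes are staged and applied together on commit."""

    def __init__(self, store):
        self.store = store
        self.pending = {}

    def read(self, key):
        # A transaction sees its own staged writes first
        return self.pending.get(key, self.store[key])

    def write(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.store.update(self.pending)  # all staged changes become visible at once
        self.pending.clear()

    def abort(self):
        self.pending.clear()  # nothing staged ever reaches the store


def transfer(store, amount, crash_after_t1=False):
    t = Transaction(store)
    try:
        t.write("X", t.read("X") - amount)  # T1: write(X)
        if crash_after_t1:
            raise RuntimeError("simulated failure after write(X)")
        t.write("Y", t.read("Y") + amount)  # T2: write(Y)
        t.commit()
        return True
    except RuntimeError:
        t.abort()
        return False


accounts = {"X": 500, "Y": 200}
print(transfer(accounts, 100, crash_after_t1=True), accounts)  # False {'X': 500, 'Y': 200}
print(transfer(accounts, 100), accounts)                       # True {'X': 400, 'Y': 300}
```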
Consistency
• This means that integrity constraints must be maintained so that the
database is consistent before and after the transaction.
• It refers to correctness of a database.
• Referring to the example above, the total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
• Therefore, the database is consistent.
• Inconsistency occurs if T1 completes but T2 fails: T is then incomplete, and the total becomes 400 + 200 = 600.
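The consistency rule for the transfer (the total across accounts must not change) can be written as a small Python check over the slide's before and after values. The function names are hypothetical.

```python
def total(accounts):
    return sum(accounts.values())


def is_consistent(before, after):
    """Integrity rule for a transfer: money is moved, never created or lost."""
    return total(before) == total(after)


before = {"X": 500, "Y": 200}
after_complete = {"X": 400, "Y": 300}  # T1 and T2 both applied
after_partial = {"X": 400, "Y": 200}   # T1 applied, T2 failed

print(is_consistent(before, after_complete))  # True
print(is_consistent(before, after_partial))   # False
```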
Isolation
• This property ensures that multiple transactions can occur concurrently without leading to an inconsistent database state.
• Transactions occur independently, without interference.
• Changes made in a particular transaction are not visible to any other transaction until that transaction has been committed.
• This property ensures that executing transactions concurrently results in a state equivalent to the state that would be achieved if they were executed serially in some order.
Let X = 500, Y = 500.
• Consider two transactions T and T''. T updates X from 500 to 50,000 and Y from 500 to 450; T'' reads X and Y and computes their sum.
• Suppose T has been executed up to Read(Y) and then T'' starts.
• As a result, interleaving of operations takes place, due to which T'' reads the updated (correct) value of X but the old (incorrect) value of Y, and the sum computed by
T'': (X + Y = 50,000 + 500 = 50,500)
is thus not consistent with the sum at the end of transaction
T: (X + Y = 50,000 + 450 = 50,450).
• This results in database inconsistency, because the sum seen by T'' is off by 50 units. Hence, transactions must take place in isolation, and changes should become visible to other transactions only after they have been committed.
• It says/defines/specifies how and when the changes of a process are
visible to concurrent operations;
• For example, using locks on data to prevent concurrent writing that
would lead to inconsistency as in the case of two people booking the
same room;
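A deterministic Python sketch of the interleaving described under Isolation. The slide does not show T's operations, so this assumes T multiplies X by 100 and subtracts 50 from Y, which reproduces the slide's figures (X: 500 to 50,000; Y: 500 to 450); T'' simply sums X and Y. The function names are hypothetical.

```python
def interleaved(initial):
    """T'' runs between T's two writes (no isolation)."""
    db = dict(initial)
    db["X"] = db["X"] * 100         # T: write(X), 500 -> 50,000
    seen_by_t2 = db["X"] + db["Y"]  # T'': reads the new X but the old Y
    db["Y"] = db["Y"] - 50          # T: write(Y), 500 -> 450
    return seen_by_t2, db["X"] + db["Y"]


def serial(initial):
    """T runs to completion before T'' starts (the isolated outcome)."""
    db = dict(initial)
    db["X"] = db["X"] * 100
    db["Y"] = db["Y"] - 50
    seen_by_t2 = db["X"] + db["Y"]
    return seen_by_t2, db["X"] + db["Y"]


start = {"X": 500, "Y": 500}
print(interleaved(start))  # (50500, 50450): T'' disagrees with the final state
print(serial(start))       # (50450, 50450): T'' agrees with the final state
```

Isolation mechanisms such as locking make the concurrent outcome match some serial order, as in the second call.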
Durability
• This property ensures that once the transaction has completed execution, the updates and modifications to the database are written to disk and persist even if a system failure occurs.
• These updates become permanent and are stored in non-volatile memory.
• The effects of the transaction, thus, are never lost.
• The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database, such that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and makes updates that are durably stored.
• Durability is important because transaction data changes must be
available; Even in the event of database failure;
• Durability means that if the system says the transaction has been
committed; The client does not need to worry about it because
transactions that have been committed will survive permanently;
• Durability in databases is an important property because it ensures
transactions are saved permanently; And do not accidentally
disappear or get erased;
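A small Python sketch of durability using an SQLite database file (an assumed choice): a committed change survives closing and reopening the connection, while an uncommitted one does not. Closing the connection only stands in for a system failure; it does not simulate a real crash. The file name is hypothetical.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bank.db")  # hypothetical on-disk database file

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500)")
conn.commit()  # committed: the change is stored in the file on disk

conn.execute("UPDATE account SET balance = 0 WHERE name = 'A'")
conn.close()   # closed without commit: SQLite discards this uncommitted update

conn = sqlite3.connect(path)  # stands in for restarting after a failure
balance = conn.execute("SELECT balance FROM account WHERE name = 'A'").fetchone()[0]
print(balance)  # 500
conn.close()
```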
Describe the two functions databases require to be performed on them
• Query and Update
• A database query can be either a select query or an action query.
• A select query is a data retrieval query, while an action query asks for
additional operations on the data, such as insertion, updating or
deletion.
• Updating; Adding/deleting/modifying entities;
• Inserting; Adding new records/entities to the table;
• Deleting; Removing entities which are not in use anymore;
• Modifying; Changing information in the table to more recent information;
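A minimal sketch of both functions using Python's built-in sqlite3 module (an assumed choice; the slides do not name a DBMS). INSERT, UPDATE and DELETE are the action (update) queries; SELECT is the select (retrieval) query. The student table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, mark INTEGER)")

# Update (action queries): change the contents of the table
conn.execute("INSERT INTO student VALUES (1, 'Asha', 78)")      # inserting
conn.execute("INSERT INTO student VALUES (2, 'Ben', 64)")
conn.execute("INSERT INTO student VALUES (3, 'Chen', 91)")
conn.execute("UPDATE student SET mark = 70 WHERE roll_no = 2")  # modifying
conn.execute("DELETE FROM student WHERE roll_no = 3")           # deleting
conn.commit()

# Query (select query): retrieve data without changing it
rows = conn.execute("SELECT roll_no, name, mark FROM student ORDER BY roll_no").fetchall()
print(rows)  # [(1, 'Asha', 78), (2, 'Ben', 70)]
```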
Explain the role of data validation and data verification
• Validation is making sure that the value entered is of the correct type;
• For example, the user entered a currency value to deposit and not a date;
• Data validation is checking to see if the data entered is sensible;
• So that it can be processed correctly / maintains the integrity of the data;
• It makes sure that the data entered is in the appropriate range and/or type to avoid
obtaining incorrect results;
• It is performed by the computer which detects if data entered is not in the range/of
the type which is defined (by the person who set up the database);
• Verification is getting confirmation that the value entered is the one
intended;
• For example, making the user enter the deposit amount twice and ensuring
that the entries match;
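The deposit example can be sketched in Python: validation checks that the entry is a sensible positive amount rather than, say, a date, and verification asks for the amount twice and compares the entries. The function names and the "positive, finite amount" rule are assumptions for illustration.

```python
import math


def validate_deposit(text):
    """Validation: is the entry a sensible currency amount? Returns it, or None."""
    try:
        amount = float(text)
    except ValueError:
        return None  # wrong type, e.g. a date such as "12/05/2024"
    if not math.isfinite(amount) or amount <= 0:
        return None  # not a sensible deposit value
    return round(amount, 2)


def accept_deposit(first_entry, second_entry):
    """Verification: the amount is typed twice and both entries must match."""
    amount = validate_deposit(first_entry)
    if amount is None:
        return "rejected: invalid amount"
    if validate_deposit(second_entry) != amount:
        return "rejected: entries do not match"
    return f"accepted: {amount:.2f}"


print(accept_deposit("250", "250"))                # accepted: 250.00
print(accept_deposit("12/05/2024", "12/05/2024"))  # rejected: invalid amount
print(accept_deposit("250", "205"))                # rejected: entries do not match
```

Validation alone would accept "205" as a sensible amount; only the second entry (verification) reveals that it is not the one intended.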
Validation types
• Type Check (e.g. RollNo must be an integer, so 1.5 is rejected)
• Range Check (e.g. Mark between 0 and 100)
• Format Check (e.g. date as dd/mm/yyyy)
• Presence Check (a required field is not left empty)
• Character Check (e.g. no special symbols in Name)
• Length Check (e.g. max 30 characters in Name)
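The six validation types can be sketched as small Python functions applied to a hypothetical student record. The character rule for names (letters, spaces, hyphens and apostrophes only) is an assumption; the other limits follow the examples in the list.

```python
import re
from datetime import datetime


def type_check(roll_no):
    return isinstance(roll_no, int) and not isinstance(roll_no, bool)  # 1.5 fails


def range_check(mark):
    return isinstance(mark, (int, float)) and 0 <= mark <= 100


def format_check(date_text):
    try:
        datetime.strptime(date_text, "%d/%m/%Y")  # dd/mm/yyyy
        return True
    except (TypeError, ValueError):
        return False


def presence_check(value):
    return value is not None and str(value).strip() != ""


def character_check(name):
    # Assumed rule: letters, spaces, hyphens and apostrophes only
    return re.fullmatch(r"[A-Za-z][A-Za-z '\-]*", name) is not None


def length_check(name, max_len=30):
    return len(name) <= max_len


def validate_student(record):
    """Run each check on a student record and list the checks that fail."""
    errors = []
    name = record.get("name")
    if not presence_check(name):
        errors.append("presence")
    else:
        if not character_check(name):
            errors.append("character")
        if not length_check(name):
            errors.append("length")
    if not type_check(record.get("roll_no")):
        errors.append("type")
    if not range_check(record.get("mark")):
        errors.append("range")
    if not format_check(record.get("dob")):
        errors.append("format")
    return errors


print(validate_student({"name": "Asha Rao", "roll_no": 12, "mark": 87, "dob": "05/09/2008"}))  # []
print(validate_student({"name": "R0han!", "roll_no": 1.5, "mark": 140, "dob": "2008-09-05"}))
# ['character', 'type', 'range', 'format']
```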