Option A - Databases

A.1 Outline the differences between data and information
• Data are the facts or details from which information is derived.
• Individual pieces of data are rarely useful alone.
• For data to become information, it needs to be put into context.
• Meaning
  • Data: raw, unorganized facts that need to be processed. Data can be something simple and seemingly random and useless until it is organized.
  • Information: data that has been processed, organized, structured or presented in a given context so as to make it useful.
• Example
  • Data: each student's test score is one piece of data.
  • Information: the average score of a class or of the entire school, which can be derived from the given data.
• Data are unorganized/unstructured/unprocessed items;
• Whereas information is organized/structured/processed data;
• Data lacks meaning on its own (e.g. individual data elements);
• Whereas information is interpreted data and has meaning.

Outline the differences between an information system and a database
• An information system (IS) is an organized system for the collection, organization, storage and communication of information.
• More specifically, the term also covers the study of the complementary networks that people and organizations use to collect, filter, process, create and distribute data.
• Some examples of such systems are:
  • data warehouses
  • enterprise resource planning
  • enterprise systems
  • expert systems
  • search engines
  • geographic information system
  • global information system
  • office automation
• A computer(-based) information system is essentially an IS that uses computer technology to carry out some or all of its planned tasks. The basic components of computer-based information systems are:
• Hardware: the devices, such as the monitor, processor, printer and keyboard, that work together to accept, process and display data and information.
• Software: the programs that allow the hardware to process the data.
• Databases: collections of associated files or tables containing related data.
• Networks: connecting systems that allow different computers to share resources.
• Procedures: the instructions for combining the components above to process information and produce the desired output.
• Databases are a component within an information system.
• A database is an organized collection of data.
• A relational database, more restrictively, is a collection of schemas, tables, queries, reports, views, and other elements.
• Database designers typically organize the data to model aspects of reality in a way that supports processes requiring information, for example modelling the availability of rooms in hotels in a way that supports finding a hotel with vacancies.
• A database-management system (DBMS) is a software application used to build and design a database; it interacts with end users, other applications, and the database itself to capture and analyze data.
• A general-purpose DBMS allows the definition, creation, querying, update, and administration of databases.
• A database is where the related data of a particular activity is stored;
• Whereas the information system includes the database, together with other (hardware and software) components.

Discuss the need for databases
• Consider a scenario where you have data relating to customers, employees, banking and so on, and you have to save it in a variety of formats such as text, images, numbers, dates, amounts, documents, audio or video.
• You can store the data in a text file or an Excel spreadsheet.
• You can also use a folder structure to organize your files.
• So why do you need a database?
• Many small businesses have relied on text files or Excel spreadsheets for a long time.
• This setup may work well for some small businesses, because merely having data to store does not require a database.
• So simply having data is not the problem here.
• There are some problems if a database is not used:
• Size of data: a small amount of data stored in a spreadsheet is manageable, but it may grow into a large amount of data, at which point the spreadsheet solution stops working. If the number of records grows into the millions, storing the data across multiple spreadsheets creates a speed problem: it takes a long time to find one record among many spreadsheet files.
• Ease of updating data: multiple people generally cannot edit the same file at the same time. Other people must wait until the file is available to update, which wastes time.
• Accuracy: when a user enters data into a file, incorrect data may be entered because no validation is present. You can enter a wrong spelling, a wrong date or a wrong amount, so data accuracy is hard to maintain.
• Security: text files and spreadsheets offer little fine-grained access control. Anyone who can open the file can read all the data in it. Storing data in a spreadsheet or file is therefore unsuitable for banking, healthcare or payroll applications, where privacy must be maintained.
• Redundancy: data can easily be duplicated in text files or spreadsheets. A database has mechanisms (such as unique keys) to prevent duplicate entries.
• Incomplete data: data items can be left out of some records in text files or spreadsheets, whereas in a database you can mandate the presence of all data items.
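The accuracy, redundancy and incomplete-data points above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns and the try_insert helper are hypothetical names chosen for illustration, and SQLite is only one of many DBMSs that could be used here.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id      INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,               -- no missing or duplicate emails
        balance REAL NOT NULL CHECK (balance >= 0)  -- reject negative amounts
    )
    """
)


def try_insert(email, balance):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (email, balance) VALUES (?, ?)",
                (email, balance),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = try_insert("a@example.com", 100.0)     # accepted
duplicate_ok = try_insert("a@example.com", 50.0)  # violates UNIQUE (redundancy)
missing_ok = try_insert(None, 50.0)               # violates NOT NULL (incomplete data)
negative_ok = try_insert("b@example.com", -5.0)   # violates CHECK (accuracy)
valid_ok = try_insert("c@example.com", 20.0)      # accepted
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

A spreadsheet would accept all five rows. Here the DBMS itself refuses the three that break a declared rule, so only the two valid rows are stored.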
https://www.youtube.com/watch?v=djEZeF4KTaM
• A database engine provides standardized support for operations such as queries;
• It has built-in capabilities for ensuring that transactions succeed;
• It is easy to sort/retrieve/analyse data;
• No redundant data / no data entered twice;
• Using a database engine greatly reduces development time by providing a common platform that developers can reuse in multiple projects;
• Greater control;
• Ability to perform analysis;
• Better information.

Costs associated with database creation
• Cost of hardware / hardware maintenance;
• Cost of software / software maintenance;
• Training cost.

Centralized Database Vs Localized Database
• Centralized Database
  • Same consistent view of data across all places at any time;
  • Centrally updated.
• Localized Database
  • The databases at various places may not be in sync.
  • Values updated in one place may not be updated in another place, giving inconsistent and erroneous values.

Online Database
• Advantage
  • All data are held centrally and kept updated;
  • All data are accessible from one place for the users;
  • Accessible at any time by users, etc.
• Disadvantage
  • Data could be hacked because it is stored online.

Describe the use of transactions, states and updates to maintain data consistency (and integrity)
• A transaction represents a unit of work performed within a database.
• A transaction can be defined as a group of tasks. A single task is the minimum processing unit, which cannot be divided further.
• Let's take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This very simple and small transaction involves several low-level tasks.
• A's Account
  • Open_Account(A)
  • Old_Balance = A.balance
  • New_Balance = Old_Balance - 500
  • A.balance = New_Balance
  • Close_Account(A)
• B's Account
  • Open_Account(B)
  • Old_Balance = B.balance
  • New_Balance = Old_Balance + 500
  • B.balance = New_Balance
  • Close_Account(B)
• To ensure data consistency when moving money between two accounts, it is necessary to complete two operations (debiting one account and crediting the other). Unless both operations are carried out successfully, the transaction will be rolled back.
• Modifications to data are made persistent in the database only if the transaction terminates successfully;
• A roll-back operation is performed if a failure occurs before the transaction terminates;
• And this keeps the database in its original consistent state;
• For example: transactions must conform to rules before the database is updated;
• Transactions must complete fully to make the changes permanent/persistent;
• Roll-back is allowed in case of failure;
• To ensure the database remains consistent while performing transactions.

Define the term database transaction
• The minimal unit of operation/update to be performed on a database; that guarantees consistency/integrity of the database; and recovery upon failure.
• A transaction is a set of changes that must all be made together.
• It is a program unit whose execution may or may not change the contents of a database.
• A transaction is executed as a single unit.
• If the database was in a consistent state before a transaction, then it must also be in a consistent state after the transaction has executed.
• For example, a transfer of money from one bank account to another requires two changes to the database; both must succeed or fail together.
• A database transaction is a logical unit that is independently executed; for data retrieval or updates;
• A database transaction is a unit of work; that is either executed in full or not executed at all;
• A database transaction is a way of representing a state change; and has four properties, known as ACID;
• A database transaction usually means a sequence of steps, treated as a unit; for the purposes of satisfying a client's request;
• A database transaction is a process carried out on a database; which may change its state, for example moving money between bank accounts.

Explain concurrency in a data sharing situation
• Concurrency control is a database management system (DBMS) concept used to address the conflicts that can occur when data is accessed or altered simultaneously in a multi-user system.
• Concurrency control, when applied to a DBMS, is meant to coordinate simultaneous transactions while preserving data integrity.
• To illustrate the concept of concurrency control, consider two travelers who go to electronic kiosks at the same time to purchase a train ticket to the same destination on the same train.
• There is only one seat left in the coach, but without concurrency control it is possible that both travelers will end up purchasing a ticket for that one seat.
• With concurrency control, however, the database would not allow this to happen.
• Both travelers would still be able to access the train seating database, but concurrency control would preserve data accuracy and allow only one traveler to purchase the seat.
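The train-seat scenario can be sketched with an atomic conditional update, which is one common way (not the only one) for an application to let the DBMS decide who gets the last seat. This is a minimal sketch using Python's sqlite3 module. The seat table and the buy_seat helper are hypothetical, and the two purchases are issued one after the other rather than from truly parallel kiosks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a NULL passenger means the seat is still free.
conn.execute("CREATE TABLE seat (id INTEGER PRIMARY KEY, passenger TEXT)")
with conn:
    conn.execute("INSERT INTO seat (id, passenger) VALUES (1, NULL)")  # one seat left


def buy_seat(seat_id, passenger):
    """Claim the seat only if it is still free; return True on success."""
    with conn:  # one transaction: commit on success, roll back on error
        cur = conn.execute(
            # The check (passenger IS NULL) and the write happen in one statement,
            # so no other buyer can slip in between them.
            "UPDATE seat SET passenger = ? WHERE id = ? AND passenger IS NULL",
            (passenger, seat_id),
        )
        return cur.rowcount == 1  # 0 rows changed: the seat was already taken


first = buy_seat(1, "traveler_1")
second = buy_seat(1, "traveler_2")
owner = conn.execute("SELECT passenger FROM seat WHERE id = 1").fetchone()[0]
```

Both travelers can query the table, but only the first conditional update changes a row; the second finds the seat already taken and is refused.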
• Data sharing/replication:
  • All/part of one/several source databases are shared/replicated according to the needs of different user groups;
  • The information needed by the group is made "closer" to the user;
  • Hence, less reliance on transmission/traffic (which may be expensive, slow, unavailable or inconvenient);
  • But more storage space is required (although storage is cheap).
• Multiuser/concurrent access when reading/searching/querying the database;
• But records in the database are partitioned to ensure privacy of access / each user can access only the part of the database relating to their own data / the database has a row-locking mechanism, so that access is restricted to individuals only in permitted areas / there is no conflict/competition in accessing the same area of the database;
• The partition / the row locking / the partial access ensures isolation / separation of data (as presented to transaction processing);
• Therefore, multiple transactions (from different users) may safely occur simultaneously on non-overlapping portions of the database;
• And this simplifies the management of system updates / recovery of (separated) transactions.

Explain the importance of the ACID properties of a database transaction
Atomicity
• By this, we mean that either the entire transaction takes place at once or it does not happen at all.
• There is no midway, i.e. transactions do not occur partially.
• Each transaction is considered as one unit and either runs to completion or is not executed at all.
• It involves the following two operations:
  • Abort: if a transaction aborts, the changes made to the database are not visible.
  • Commit: if a transaction commits, the changes made are visible.
• Atomicity is also known as the "all or nothing rule".
• Consider a transaction T consisting of two steps, T1 and T2, that transfers 100 from account X to account Y (T1 debits X; T2 credits Y).
• If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y.
• This results in an inconsistent database state.
• Therefore, the transaction must be executed in its entirety in order to ensure the correctness of the database state.

Consistency
• This means that integrity constraints must be maintained so that the database is consistent before and after the transaction.
• It refers to the correctness of a database.
• Referring to the example above, the total amount before and after the transaction must be maintained. Total before T occurs = 500 + 200 = 700. Total after T occurs = 400 + 300 = 700.
• Therefore, the database is consistent.
• Inconsistency occurs if T1 completes but T2 fails. As a result, T is incomplete.

Isolation
• This property ensures that multiple transactions can occur concurrently without leading to inconsistency of the database state.
• Transactions occur independently without interference.
• Changes occurring in a particular transaction will not be visible to any other transaction until that change has been written to memory or committed.
• This property ensures that executing transactions concurrently results in a state that is equivalent to the state achieved if they were executed serially in some order.
• Let X = 500, Y = 500. Consider two transactions T and T''. The figures below imply that T changes X from 500 to 50,000 and then Y from 500 to 450, while T'' reads X and Y and computes their sum.
• Suppose T has been executed up to Read(Y), and then T'' starts.
• As a result, the operations are interleaved: T'' reads the correct value of X but an incorrect value of Y, and the sum computed by T'' (X + Y = 50,000 + 500 = 50,500) is not consistent with the sum at the end of transaction T (X + Y = 50,000 + 450 = 50,450).
• This results in database inconsistency, due to a discrepancy of 50 units.
Hence, transactions must take place in isolation, and changes should be visible only after they have been written to main memory.
• Isolation says/defines/specifies how and when the changes made by one process become visible to concurrent operations;
• For example, using locks on data to prevent concurrent writing that would lead to inconsistency, as in the case of two people booking the same room.

Durability
• This property ensures that once the transaction has completed execution, the updates and modifications to the database are written to disk, and they persist even if a system failure occurs.
• These updates now become permanent and are stored in non-volatile memory.
• The effects of the transaction, thus, are never lost.
• The ACID properties, in totality, provide a mechanism to ensure the correctness and consistency of a database, such that each transaction is a group of operations that acts as a single unit, produces consistent results, acts in isolation from other operations, and makes updates that are durably stored.
• Durability is important because transaction data changes must be available; even in the event of database failure;
• Durability means that if the system says the transaction has been committed; the client does not need to worry about it, because transactions that have been committed will survive permanently;
• Durability in databases is an important property because it ensures transactions are saved permanently; and do not accidentally disappear or get erased.

Describe the two functions databases require to be performed on them
• Query and Update
• A database query can be either a select query or an action query.
• A select query is a data retrieval query, while an action query asks for additional operations on the data, such as insertion, updating or deletion.
• Updating: adding/deleting/modifying entities;
• Inserting: adding new records/entities to the table;
• Deleting: removing entities that are no longer in use;
• Modifying: replacing information in the table with more recent information.

Explain the role of data validation and data verification
• Validation is making sure that the value entered is of the correct type;
• For example, the user entered a currency value to deposit and not a date;
• Data validation is checking to see if the data entered is sensible;
• So that it can be processed correctly / maintains the integrity of the data;
• It makes sure that the data entered is in the appropriate range and/or of the appropriate type, to avoid obtaining incorrect results;
• It is performed by the computer, which detects whether the data entered is outside the range / not of the type defined (by the person who set up the database);
• Verification is getting confirmation that the value entered is the one intended;
• For example, making the user enter the deposit amount twice and ensuring that the entries match.

Validation types
• Type check (e.g. rejecting a float where an integer is expected, such as RollNo = 1.5)
• Range check (e.g. Mark between 0 and 100)
• Format check (e.g. dd/mm/yyyy)
• Presence check (data empty or present)
• Character check (e.g. no special symbols in Name)
• Length check (e.g. max 30 characters in Name)
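The validation types above, and the double-entry form of verification, could be sketched in Python roughly as follows. The function names are hypothetical, and the exact rules (for example, allowing only letters and spaces in a name) are assumptions made for illustration rather than fixed definitions.

```python
import re


def type_check(roll_no):
    """Type check: the roll number must be a whole number (1.5 is rejected)."""
    return isinstance(roll_no, int) and not isinstance(roll_no, bool)


def range_check(mark):
    """Range check: the mark must lie between 0 and 100 inclusive."""
    return 0 <= mark <= 100


def format_check(date_text):
    """Format check: dd/mm/yyyy pattern only (calendar validity is not checked)."""
    return re.fullmatch(r"\d{2}/\d{2}/\d{4}", date_text) is not None


def presence_check(text):
    """Presence check: the field must not be missing or blank."""
    return text is not None and text.strip() != ""


def character_check(name):
    """Character check: assumed rule of letters and spaces only."""
    return re.fullmatch(r"[A-Za-z ]+", name) is not None


def length_check(name, max_length=30):
    """Length check: at most max_length characters."""
    return len(name) <= max_length


def verify_double_entry(first_entry, second_entry):
    """Verification: the value typed twice must match exactly."""
    return first_entry == second_entry
```

Note the division of labour: the validation functions only judge whether a value is sensible, while verify_double_entry checks that the value is the one the user intended. A mistyped but plausible deposit amount passes validation and is caught only by verification.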