DBMS Notes GIET2013EC4SA

Database: A database is a collection of stored operational data used by various applications and/or users of some particular enterprise, or by a set of outside authorized applications and authorized users.

Database Management System: A Database Management System (DBMS) is a software system that manages the execution of user applications that access and modify database data, so that data security, data integrity, and data reliability are guaranteed for each application, and each application can be written as if it were the only application active in the database.

What Is Data?
Different viewpoints:
• A sequence of characters stored in computer memory or storage
• An interpreted sequence of characters stored in computer memory or storage
• An interpreted set of objects
A database supports concurrent access to the data.

File Systems:
• A file is an uninterpreted, unstructured collection of information.
• File operations: delete, catalog, create, rename, open, close, read, write, find, ...
• Access methods: algorithms that implement operations on a file, together with the internal file organization.
• Examples: File of Customers, File of Students; access method: an implementation of a set of operations on a file of students or customers.

File Management System Problems:
• Data redundancy
• Data access: each new request requires a new program
• Data is not isolated from the access implementation
• Concurrent program execution on the same file
• Difficulties with security enforcement
• Integrity issues

Database Applications:
• Airline reservation systems. Data items are: single passenger reservations; information about flights and airports; information about ticket prices and ticket restrictions.
• Banking systems. Data items are accounts, customers, loans, mortgages, balances, etc. Failures are not tolerable, and concurrent access must be provided.
• Corporate records. Data items are: sales, accounts, bill-of-materials records, employees and their dependents.

ADVANTAGES OF A DBMS:

Data independence: Application programs should be as independent as possible from details of data representation and storage. The DBMS can provide an abstract view of the data to insulate application code from such details.

Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and retrieve data efficiently. This feature is especially important if the data is stored on external storage devices.

Data integrity and security: If data is always accessed through the DBMS, the DBMS can enforce integrity constraints on the data. For example, before inserting salary information for an employee, the DBMS can check that the department budget is not exceeded. Also, the DBMS can enforce access controls that govern what data is visible to different classes of users.

Data administration: When several users share the data, centralizing the administration of data can offer significant improvements. Experienced professionals who understand the nature of the data being managed, and how different groups of users use it, can be responsible for organizing the data representation to minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in such a manner that users can think of the data as being accessed by only one user at a time. Further, the DBMS protects users from the effects of system failures.
Reduced application development time: Clearly, the DBMS supports many important functions that are common to many applications accessing data stored in the DBMS. This, in conjunction with the high-level interface to the data, facilitates quick development of applications. Such applications are also likely to be more robust than applications developed from scratch, because many important tasks are handled by the DBMS instead of being implemented by the application.

Data Levels and their Roles:

• Physical – corresponds to the first view of data: how data is stored, how it is accessed, how data is modified, whether data is ordered, how data is allocated to computer memory and/or peripheral devices, and how data items are actually represented (ASCII, EBCDIC, ...). The physical schema specifies additional storage details. Essentially, the physical schema summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes. We must decide what file organizations to use to store the relations, and create auxiliary data structures called indexes to speed up data retrieval operations.

• Conceptual – corresponds to the second view of data: what we want the data to express and what relationships between data we must express, what "story" the data tells, and whether all data necessary for the "story" is present. The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS. In a relational DBMS, the conceptual schema describes all relations that are stored in the database. In our sample university database, these relations contain information about entities, such as students and faculty, and about relationships, such as students' enrollment in courses. All student entities can be described using records in a Students relation. In fact, each collection of entities and each collection of relationships can be described as a relation, leading to the following conceptual schema:

Students(sid: string, name: string, login: string, age: integer, gpa: real)
Faculty(fid: string, fname: string, sal: real)
Courses(cid: string, cname: string, credits: integer)
Rooms(rno: integer, address: string, capacity: integer)
Enrolled(sid: string, cid: string, grade: string)
Teaches(fid: string, cid: string)
Meets_In(cid: string, rno: integer, time: string)

The choice of relations, and the choice of fields for each relation, is not always obvious, and the process of arriving at a good conceptual schema is called conceptual database design.

• View – corresponds to the third view of data: what part of the data is seen by a specific application. External schemas, which usually are also in terms of the data model of the DBMS, allow data access to be customized (and authorized) at the level of individual users or groups of users. The external schema design is guided by end-user requirements. For example, we might want to allow students to find out the names of faculty members teaching courses, as well as course enrollments. This can be done by defining the following view:

Course_info(cid: string, fname: string, enrollment: integer)
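The conceptual and external levels can be made concrete in SQL. The following is a minimal sketch, assuming illustrative column types; the relation and field names come from the schema above, and the view computes enrollment by counting Enrolled rows (in this simple formulation, courses with no enrollments are omitted):

CREATE TABLE Students (
    sid   CHAR(20) PRIMARY KEY,
    name  CHAR(30),
    login CHAR(20),
    age   INTEGER,
    gpa   REAL
);

CREATE TABLE Faculty (
    fid   CHAR(20) PRIMARY KEY,
    fname CHAR(30),
    sal   REAL
);

CREATE TABLE Courses (
    cid     CHAR(20) PRIMARY KEY,
    cname   CHAR(30),
    credits INTEGER
);

CREATE TABLE Teaches (
    fid CHAR(20) REFERENCES Faculty,
    cid CHAR(20) REFERENCES Courses,
    PRIMARY KEY (fid, cid)
);

CREATE TABLE Enrolled (
    sid   CHAR(20) REFERENCES Students,
    cid   CHAR(20) REFERENCES Courses,
    grade CHAR(2),
    PRIMARY KEY (sid, cid)
);

-- The external level: Course_info defined as a view over the stored relations.
CREATE VIEW Course_info (cid, fname, enrollment) AS
    SELECT C.cid, F.fname, COUNT(E.sid)
    FROM   Courses C, Teaches T, Faculty F, Enrolled E
    WHERE  C.cid = T.cid AND T.fid = F.fid AND E.cid = C.cid
    GROUP BY C.cid, F.fname;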
STRUCTURE OF A DBMS:

When a user issues a query, the parsed query is presented to a query optimizer, which uses information about how the data is stored to produce an efficient execution plan for evaluating the query. An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational operators.

The code that implements relational operators sits on top of the file and access methods layer. This layer includes a variety of software for supporting the concept of a file, which, in a DBMS, is a collection of pages or a collection of records. This layer typically supports a heap file, or file of unordered pages, as well as indexes. In addition to keeping track of the pages in a file, this layer organizes the information within a page.

The files and access methods layer code sits on top of the buffer manager, which brings pages in from disk to main memory as needed in response to read requests.

The lowest layer of the DBMS software deals with management of space on disk, where the data is stored. Higher layers allocate, deallocate, read, and write pages through (routines provided by) this layer, called the disk space manager.

The DBMS supports concurrency and crash recovery by carefully scheduling user requests and maintaining a log of all changes to the database. DBMS components associated with concurrency control and recovery include the transaction manager, which ensures that transactions request and release locks according to a suitable locking protocol and schedules the execution of transactions; the lock manager, which keeps track of requests for locks and grants locks on database objects when they become available; and the recovery manager, which is responsible for maintaining a log and restoring the system to a consistent state after a crash. The disk space manager, buffer manager, and file and access method layers must interact with these components.

Data Models: A collection of tools for describing:
• Data
• Data relationships
• Data semantics
• Data constraints
Common data models include:
• Relational model
• Entity-Relationship data model (mainly for database design)
• Object-based data models (object-oriented and object-relational)
• Semi-structured data model (XML)
• Other, older models: the network model and the hierarchical model

Database Access from Application Programs:

To access the database, DML statements need to be executed from the host language. There are two ways to do this:
• By providing an application program interface (a set of procedures) that can be used to send DML and DDL statements to the database and retrieve the results. The Open Database Connectivity (ODBC) standard defined by Microsoft for use with the C language is a commonly used application program interface standard. The Java Database Connectivity (JDBC) standard provides corresponding features to the Java language.
• By extending the host language syntax to embed DML calls within the host language program. Usually, a special character prefaces DML calls, and a preprocessor, called the DML precompiler, converts the DML statements to normal procedure calls in the host language.

Database Users and Administrators:

Naive users are unsophisticated users who interact with the system by invoking one of the application programs that have been written previously.

Application programmers are computer professionals who write application programs.

Sophisticated users interact with the system without writing programs. Instead, they form their requests in a database query language. They submit each such query to a query processor, whose function is to break down DML statements into instructions that the storage manager understands. Analysts who submit queries to explore data in the database fall into this category.
Specialized users are sophisticated users who write specialized database applications that do not fit into the traditional data-processing framework.

Database Administrator: A person who has such central control over the system is called a database administrator (DBA). The functions of a DBA include:
• Schema definition. The DBA creates the original database schema by executing a set of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization, or alters the physical organization to improve performance.
• Granting of authorization for data access.
• Routine maintenance.

Data Model: A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints.

Entity: An entity is a "thing" or "object" in the real world that is distinguishable from all other objects. For example, each person in an enterprise is an entity.

Entity set: An entity set is a set of entities of the same type that share the same properties, or attributes. The set of all persons who are customers at a given bank, for example, can be defined as the entity set customer. Similarly, the entity set loan might represent the set of all loans awarded by a particular bank.

An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each member of an entity set. The designation of an attribute for an entity set expresses that the database stores similar information concerning each entity in the entity set; however, each entity may have its own value for each attribute.

Simple and composite attributes: Attributes that are not divided into subparts are called simple attributes. Attributes that can be divided into subparts are called composite attributes. For example, an attribute name could be structured as a composite attribute consisting of first-name, middle-initial, and last-name.

Single-valued and multivalued attributes: The loan-number attribute for a specific loan entity refers to only one loan number. Such attributes are said to be single-valued. There may be instances where an attribute has a set of values for a specific entity. Consider an employee entity set with the attribute phone-number. An employee may have zero, one, or several phone numbers, and different employees may have different numbers of phones. This type of attribute is said to be multivalued.

Derived attribute: The value for this type of attribute can be derived from the values of other related attributes or entities. For instance, let us say that the customer entity set has an attribute loans-held, which represents how many loans a customer has from the bank. We can derive the value for this attribute by counting the number of loan entities associated with that customer. A relational sketch of these attribute types is given below.

Relationship Sets: A relationship is an association among several entities. A relationship set is a set of relationships of the same type.
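As a sketch of how these attribute kinds surface in a relational design (all table and column names here are illustrative assumptions; for brevity, the multivalued phone-number attribute is attached to customer rather than employee, and each loan is held by a single customer): a composite attribute is flattened into simple columns, a multivalued attribute becomes a separate table, and a derived attribute is usually computed by a view rather than stored.

-- Composite attribute name flattened into simple columns.
CREATE TABLE customer (
    customer_id    CHAR(10) PRIMARY KEY,
    first_name     VARCHAR(30),
    middle_initial CHAR(1),
    last_name      VARCHAR(30)
);

-- Multivalued attribute phone-number stored as a separate table:
-- zero, one, or several rows per customer.
CREATE TABLE customer_phone (
    customer_id  CHAR(10) REFERENCES customer,
    phone_number VARCHAR(15),
    PRIMARY KEY (customer_id, phone_number)
);

CREATE TABLE loan (
    loan_number CHAR(10) PRIMARY KEY,
    customer_id CHAR(10) REFERENCES customer,
    amount      REAL
);

-- Derived attribute loans-held computed on demand, not stored.
CREATE VIEW customer_loans_held (customer_id, loans_held) AS
    SELECT customer_id, COUNT(loan_number)
    FROM loan
    GROUP BY customer_id;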
Mapping Cardinalities: Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship sets, although they can contribute to the description of relationship sets that involve more than two entity sets. For a binary relationship set between entity sets A and B, the mapping cardinality must be one of the following:

• One to one. An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.
• One to many. An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.
• Many to one. An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A.
• Many to many. An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.

Keys: A key allows us to identify a set of attributes that suffice to distinguish entities from each other. Keys also help uniquely identify relationships, and thus distinguish relationships from each other.

Super key: A super key is a set of one or more attributes that, taken collectively, allow us to identify uniquely an entity in the entity set. For example, the customer-id attribute of the entity set customer is sufficient to distinguish one customer entity from another. Thus, customer-id is a super key. Similarly, the combination of customer-name and customer-id is a super key for the entity set customer. The customer-name attribute of customer is not a super key, because several people might have the same name.

Candidate key: If K is a super key, then so is any superset of K. We are often interested in super keys for which no proper subset is a super key; such minimal super keys are called candidate keys. It is possible that several distinct sets of attributes could serve as a candidate key. Suppose that a combination of customer-name and customer-street is sufficient to distinguish among members of the customer entity set. Then, both {customer-id} and {customer-name, customer-street} are candidate keys. Although the attributes customer-id and customer-name together can distinguish customer entities, their combination does not form a candidate key, since the attribute customer-id alone is a candidate key.

Primary key: The candidate key that is chosen by the database designer as the principal means of identifying entities within an entity set is called the primary key. A key (primary, candidate, and super) is a property of the entity set, rather than of the individual entities. Any two individual entities in the set are prohibited from having the same value on the key attributes at the same time. The designation of a key represents a constraint in the real-world enterprise being modeled. The sketch below shows these kinds of keys in SQL.
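A short SQL illustration of the distinction, with assumed column types: customer-id is declared as the primary key, and {customer-name, customer-street} is declared UNIQUE as a second candidate key. Any superset of either, such as {customer-id, customer-name}, is then automatically a super key without being declared.

CREATE TABLE customer (
    customer_id     CHAR(10),
    customer_name   VARCHAR(30),
    customer_street VARCHAR(40),
    customer_city   VARCHAR(30),
    -- The candidate key chosen as the principal identifier: the primary key.
    PRIMARY KEY (customer_id),
    -- Another candidate key: no two customers share both name and street.
    UNIQUE (customer_name, customer_street)
);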
Weak Entity Sets: An entity set may not have sufficient attributes to form a primary key. Such an entity set is termed a weak entity set. An entity set that has a primary key is termed a strong entity set. For a weak entity set to be meaningful, it must be associated with another entity set, called the identifying or owner entity set. Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to be existence dependent on the identifying entity set. The identifying entity set is said to own the weak entity set that it identifies. The relationship associating the weak entity set with the identifying entity set is called the identifying relationship. The identifying relationship is many to one from the weak entity set to the identifying entity set, and the participation of the weak entity set in the relationship is total.

In our example, the identifying entity set for payment is loan, and a relationship loan-payment that associates payment entities with their corresponding loan entities is the identifying relationship. Although a weak entity set does not have a primary key, we nevertheless need a means of distinguishing among all those entities in the weak entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.

In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined diamond indicates the corresponding identifying relationship. In the figure, the weak entity set payment depends on the strong entity set loan via the relationship set loan-payment. The figure also illustrates the use of double lines to indicate total participation: the participation of the (weak) entity set payment in the relationship loan-payment is total, meaning that every payment must be related via loan-payment to some loan. Finally, the arrow from loan-payment to loan indicates that each payment is for a single loan. The discriminator of a weak entity set also is underlined, but with a dashed, rather than a solid, line. A relational sketch of this pattern follows.
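In a relational design, a weak entity set is commonly mapped to a table whose primary key combines the owner's primary key with the discriminator, plus a foreign key to the owner. A minimal sketch under assumed column names, with payment-number as the discriminator:

CREATE TABLE loan (
    loan_number CHAR(10) PRIMARY KEY,
    amount      REAL
);

-- Weak entity set payment: identified by its owning loan plus the
-- discriminator payment_number; existence dependent on loan.
CREATE TABLE payment (
    loan_number    CHAR(10) REFERENCES loan ON DELETE CASCADE,
    payment_number INTEGER,
    payment_date   DATE,
    payment_amount REAL,
    PRIMARY KEY (loan_number, payment_number)
);

Deleting a loan cascades to its payments, mirroring the total participation and existence dependence of the weak entity set.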
Specialization: An entity set may include subgroupings of entities that are distinct in some way from other entities in the set. For instance, a subset of entities within an entity set may have attributes that are not shared by all the entities in the entity set. The E-R model provides a means for representing these distinctive entity groupings. Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following:
• customer
• employee
Each of these person types is described by a set of attributes that includes all the attributes of entity set person plus possibly additional attributes. For example, customer entities may be described further by the attribute customer-id, whereas employee entities may be described further by the attributes employee-id and salary. The process of designating subgroupings within an entity set is called specialization. The specialization of person allows us to distinguish among persons according to whether they are employees or customers.

Generalization: The design process may also proceed in a bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the basis of common features. The database designer may have first identified a customer entity set with the attributes name, street, city, and customer-id, and an employee entity set with the attributes name, street, city, employee-id, and salary. There are similarities between the customer entity set and the employee entity set in the sense that they have several attributes in common. This commonality can be expressed by generalization, which is a containment relationship that exists between a higher-level entity set and one or more lower-level entity sets. In our example, person is the higher-level entity set and customer and employee are lower-level entity sets. Higher- and lower-level entity sets also may be designated by the terms superclass and subclass, respectively. The person entity set is the superclass of the customer and employee subclasses. For all practical purposes, generalization is a simple inversion of specialization. We will apply both processes, in combination, in the course of designing the E-R schema for an enterprise.

In terms of the E-R diagram itself, we do not distinguish between specialization and generalization. New levels of entity representation will be distinguished (specialization) or synthesized (generalization) as the design schema comes to express fully the database application and the user requirements of the database. Differences in the two approaches may be characterized by their starting point and overall goal. Generalization proceeds from the recognition that a number of entity sets share some common features (namely, they are described by the same attributes and participate in the same relationship sets).

Aggregation: Aggregation is an abstraction in which relationship sets (along with their associated entity sets) are treated as higher-level entity sets, and can participate in relationships.

Symbols used in the E-R notation: (symbol chart omitted)

ER Model for a College DB:

Assumptions:
• A college contains many departments
• Each department can offer any number of courses
• Many instructors can work in a department
• An instructor can work only in one department
• For each department there is a Head
• An instructor can be head of only one department
• Each instructor can take any number of courses
• A course can be taken by only one instructor
• A student can enroll for any number of courses
• Each course can have any number of students

Steps in ER Modeling:
1. Identify the Entities
2. Find relationships
3. Identify the key attributes for every Entity
4. Identify other relevant attributes
5. Draw the complete E-R diagram with all attributes, including the Primary Key

Step 1: Identify the Entities:
• DEPARTMENT
• STUDENT
• COURSE
• INSTRUCTOR

Step 2: Find the relationships:
• One course is enrolled by multiple students and one student enrolls for multiple courses, hence the cardinality between Course and Student is Many to Many.
• The department offers many courses and each course belongs to only one department, hence the cardinality between Department and Course is One to Many.
• One department has multiple instructors and one instructor belongs to one and only one department, hence the cardinality between Department and Instructor is One to Many.
• For each department there is a "Head of department" and one instructor is "Head of department", hence the cardinality is One to One.
• One course is taught by only one instructor, but an instructor teaches many courses, hence the cardinality between Course and Instructor is Many to One.

Step 3: Identify the key attributes:
• Dept Name is the key attribute for the "Department" Entity, as it identifies the Department uniquely.
• Course# (Course Id) is the key attribute for the "Course" Entity.
• Student# (Student Number) is the key attribute for the "Student" Entity.
• Instructor Name is the key attribute for the "Instructor" Entity.

Step 4: Identify other relevant attributes:
• For the Department entity, the relevant attribute is location.
• For the Course entity, the relevant attributes are duration and prerequisite.

A SQL sketch of this design follows.
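One plausible relational rendering of the college model (a sketch only; the types, and any column not identified as a key attribute above, are assumptions): the Many-to-One sides become foreign keys, the Many-to-Many Student-Course relationship becomes a separate table, and the One-to-One head-of-department relationship becomes a unique foreign key on department.

CREATE TABLE department (
    dept_name VARCHAR(30) PRIMARY KEY,
    location  VARCHAR(30)
);

CREATE TABLE instructor (
    instructor_name VARCHAR(30) PRIMARY KEY,
    dept_name       VARCHAR(30) NOT NULL REFERENCES department  -- works in one department
);

-- One to One: each department has one head, who heads only that department.
ALTER TABLE department
    ADD COLUMN head_name VARCHAR(30) UNIQUE REFERENCES instructor;

CREATE TABLE course (
    course_id       CHAR(8) PRIMARY KEY,
    duration        INTEGER,
    prerequisite    CHAR(8) REFERENCES course,
    dept_name       VARCHAR(30) NOT NULL REFERENCES department,  -- One to Many
    instructor_name VARCHAR(30) REFERENCES instructor            -- Many to One
);

CREATE TABLE student (
    student_no CHAR(10) PRIMARY KEY
);

-- Many to Many between Student and Course.
CREATE TABLE enrolls (
    student_no CHAR(10) REFERENCES student,
    course_id  CHAR(8)  REFERENCES course,
    PRIMARY KEY (student_no, course_id)
);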
ER Model for a Banking Business:

Assumptions:
• There are multiple banks and each bank has many branches.
• Each branch has multiple customers.
• Customers have various types of accounts.
• Some customers have also taken different types of loans from these bank branches.
• One customer can have multiple accounts and loans.

Step 1: Identify the Entities:
• BANK
• BRANCH
• LOAN
• ACCOUNT
• CUSTOMER

Step 2: Find the relationships:
• One Bank has many Branches and each Branch belongs to only one Bank, hence the cardinality between Bank and Branch is One to Many.
• One Branch offers many Loans and each Loan is associated with one Branch, hence the cardinality between Branch and Loan is One to Many.
• One Branch maintains multiple Accounts and each Account is associated with one and only one Branch, hence the cardinality between Branch and Account is One to Many.
• One Loan can be availed by multiple Customers, and each Customer can avail multiple Loans, hence the cardinality between Loan and Customer is Many to Many.
• One Customer can hold multiple Accounts, and each Account can be held by multiple Customers, hence the cardinality between Customer and Account is Many to Many.

Step 3: Identify the key attributes:
• Bank Code is the key attribute for the "Bank" Entity, as it identifies the bank uniquely.
• Branch# (Branch Number) is the key attribute for the "Branch" Entity.
• Customer# (Customer Number) is the key attribute for the "Customer" Entity.
• Loan# (Loan Number) is the key attribute for the "Loan" Entity.
• Account No (Account Number) is the key attribute for the "Account" Entity.

Step 4: Identify other relevant attributes:
• For the "Bank" Entity, the relevant attributes other than "Bank Code" would be "Name" and "Address".
• For the "Branch" Entity, the relevant attributes other than "Branch#" would be "Name" and "Address".
• For the "Loan" Entity, the relevant attribute other than "Loan#" would be "Loan Type".
• For the "Account" Entity, the relevant attribute other than "Account No" would be "Account Type".
• For the "Customer" Entity, the relevant attributes other than "Customer#" would be "Name", "Telephone#" and "Address".

E-R diagram with all attributes including Primary Key: (diagram omitted)

Normalization

While designing a database out of an entity-relationship model, the main problem existing in that "raw" database is redundancy. Redundancy is storing the same data item in more than one place. Redundancy creates several problems, like the following:
1. Extra storage space: storing the same data in many places takes a large amount of disk space.
2. Entering the same data more than once during data insertion.
3. Deleting data from more than one place during deletion.
4. Modifying data in more than one place.
5. Anomalies may occur in the database if insertion, deletion, modification, etc. are not done properly. This creates inconsistency and unreliability in the database.

To solve this problem, the "raw" database needs to be normalized. This is a step-by-step process of removing different kinds of redundancy and anomaly at each step. At each step a specific rule is followed to remove a specific kind of impurity, in order to give the database a slim and clean look.

Un-Normalized Form (UNF)

If a table contains non-atomic values in each row, it is said to be in UNF. An atomic value is something that cannot be further decomposed. A non-atomic value, as the name suggests, can be further decomposed and simplified. Consider the following table:

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
                  Feb    1200
                  Mar    850
E02     BB        Jan    2200   B02      UTI
                  Feb    2500
E03     CC        Jan    1700   B01      SBI
                  Feb    1800
                  Mar    1850
                  Apr    1725

In the sample table above, there are multiple occurrences of rows under each key Emp-Id. Although considered to be the primary key, Emp-Id cannot give us unique identification for any single row. Further, each primary key points to a variable-length record (3 rows for E01, 2 for E02 and 4 for E03).
First Normal Form (1NF)

A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a unique combination of values. The above table in UNF can be processed to create the following table in 1NF:

Emp-Id  Emp-Name  Month  Sales  Bank-Id  Bank-Name
E01     AA        Jan    1000   B01      SBI
E01     AA        Feb    1200   B01      SBI
E01     AA        Mar    850    B01      SBI
E02     BB        Jan    2200   B02      UTI
E02     BB        Feb    2500   B02      UTI
E03     CC        Jan    1700   B01      SBI
E03     CC        Feb    1800   B01      SBI
E03     CC        Mar    1850   B01      SBI
E03     CC        Apr    1725   B01      SBI

As you can see, each row now contains a unique combination of values. Unlike in UNF, this relation contains only atomic values, i.e. the rows cannot be further decomposed, so the relation is now in 1NF.

Second Normal Form (2NF)

A relation is said to be in 2NF if it is already in 1NF and each and every attribute fully depends on the primary key of the relation. Speaking inversely, if a table has some attributes which do not depend on the primary key of that table, then it is not in 2NF. Let us explain. Emp-Id is the primary key of the above relation. Emp-Name, Month, Sales and Bank-Name all depend upon Emp-Id. But the attribute Bank-Name depends on Bank-Id, which is not the primary key of the table. So the table is in 1NF, but not in 2NF. If this portion can be moved into another related relation, it would come to 2NF:

Emp-Id  Emp-Name  Month  Sales  Bank-Id
E01     AA        Jan    1000   B01
E01     AA        Feb    1200   B01
E01     AA        Mar    850    B01
E02     BB        Jan    2200   B02
E02     BB        Feb    2500   B02
E03     CC        Jan    1700   B01
E03     CC        Feb    1800   B01
E03     CC        Mar    1850   B01
E03     CC        Apr    1725   B01

Bank-Id  Bank-Name
B01      SBI
B02      UTI

After moving the portion into another relation, we store a lesser amount of data in two relations, without any loss of information. There is also a significant reduction in redundancy.

Third Normal Form (3NF)

A relation is said to be in 3NF if it is already in 2NF and there exists no transitive dependency in that relation. Speaking inversely, if a table contains a transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.

What is a transitive dependency? Within a relation, if we see
A → B [B depends on A]
and B → C [C depends on B],
then we may derive A → C [C depends on A].

Such derived dependencies hold well in most situations. For example, if we have
Roll → Marks
and Marks → Grade,
then we may safely derive Roll → Grade.
This third dependency was not originally specified, but we have derived it.

The derived dependency is called a transitive dependency when such a dependency becomes improbable. For example, suppose we have been given
Roll → City
and City → STD-Code.
If we try to derive Roll → STD-Code, it becomes a transitive dependency, because obviously the STD-Code of a city cannot depend on the roll number issued by a school or college. In such a case the relation should be broken into two, each containing one of these two dependencies:
Roll → City
and City → STD-Code.

Boyce-Codd Normal Form (BCNF)

A relation is said to be in BCNF if it is already in 3NF and the left hand side of every dependency is a candidate key. A relation which is in 3NF is almost always in BCNF. A 3NF relation may fail to be in BCNF when the following conditions are found true:
1. The candidate keys are composite.
2. There is more than one candidate key in the relation.
3. There are some common attributes in the relation.
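Before turning to the BCNF example below, here is a compact SQL sketch of the 2NF and 3NF decompositions just described. The column types are assumptions, and (Emp-Id, Month) is taken as the composite key of the sales relation:

-- 2NF: Bank-Name depends on Bank-Id rather than on the whole key
-- (Emp-Id, Month), so it moves into its own relation.
CREATE TABLE bank (
    bank_id   CHAR(3) PRIMARY KEY,
    bank_name VARCHAR(30)
);

CREATE TABLE emp_sales (
    emp_id   CHAR(3),
    emp_name VARCHAR(30),
    month    CHAR(3),
    sales    INTEGER,
    bank_id  CHAR(3) REFERENCES bank,
    PRIMARY KEY (emp_id, month)
);

-- 3NF: the transitive dependency Roll -> City -> STD-Code is broken
-- into two relations, each holding one direct dependency.
CREATE TABLE city_std (
    city     VARCHAR(30) PRIMARY KEY,  -- City -> STD-Code
    std_code CHAR(6)
);

CREATE TABLE student_city (
    roll INTEGER PRIMARY KEY,          -- Roll -> City
    city VARCHAR(30) REFERENCES city_std
);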
Professor Code  Department   Head of Dept.  Percent Time
P1              Physics      Ghosh          50
P1              Mathematics  Krishnan       50
P2              Chemistry    Rao            25
P2              Physics      Ghosh          75
P3              Mathematics  Krishnan       100

Consider, as an example, the above relation. It is assumed that:
1. A professor can work in more than one department.
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The dependency diagram for the above relation is given as the following: (diagram omitted)

The given relation is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of Dept., and deleting Head of Dept. from the given relation. The normalized relations are shown in the following:

Professor Code  Department   Percent Time
P1              Physics      50
P1              Mathematics  50
P2              Chemistry    25
P2              Physics      75
P3              Mathematics  100

Department   Head of Dept.
Physics      Ghosh
Mathematics  Krishnan
Chemistry    Rao

See the dependency diagrams for these new relations. (diagrams omitted)

Fourth Normal Form (4NF)

When attributes in a relation have multi-valued dependency, further normalization to 4NF and 5NF is required. Let us first find out what multi-valued dependency is. A multi-valued dependency is a typical kind of dependency in which each and every attribute within a relation depends upon the other, yet none of them is a unique primary key. We will illustrate this with an example. Consider a vendor supplying many items to many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies to many projects.
4. An item may be supplied by many vendors.

A multi-valued dependency exists here because all the attributes depend upon the other, and yet none of them is a primary key having a unique value.

Vendor Code  Item Code  Project No.
V1           I1         P1
V1           I2         P1
V1           I1         P3
V1           I2         P3
V2           I2         P1
V2           I3         P1
V3           I1         P2
V3           I1         P3

The given relation has a number of problems. For example:
1. If vendor V1 has to supply to project P2, but the item is not yet decided, then a row with a blank for item code has to be introduced.
2. The information about item I1 is stored twice for vendor V3.

Observe that the relation given is in 3NF and also in BCNF. It still has the problems mentioned above. The problems are reduced by expressing this relation as two relations in Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent multi-valued dependency, or one independent multi-valued dependency with a functional dependency. The table can be expressed as the two 4NF relations given in the following. The fact that vendors are capable of supplying certain items and that they are assigned to supply for some projects is independently specified in the 4NF relations.

Vendor-Supply
Vendor Code  Item Code
V1           I1
V1           I2
V2           I2
V2           I3
V3           I1

Vendor-Project
Vendor Code  Project No.
V1           P1
V1           P3
V2           P1
V3           P2

Fifth Normal Form (5NF)

These relations still have a problem. While defining the 4NF we mentioned that all the attributes depend upon each other. While creating the two tables in the 4NF, although we have preserved the dependencies between Vendor Code and Item Code in the first table and between Vendor Code and Project No. in the second table, we have lost the relationship between Item Code and Project No.
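In SQL terms, the 4NF decomposition might look as follows (column types are assumptions); the closing comment marks exactly the information that has been lost:

-- 4NF: each independent multi-valued fact gets its own relation.
CREATE TABLE vendor_supply (
    vendor_code CHAR(2),
    item_code   CHAR(2),
    PRIMARY KEY (vendor_code, item_code)
);

CREATE TABLE vendor_project (
    vendor_code CHAR(2),
    project_no  CHAR(2),
    PRIMARY KEY (vendor_code, project_no)
);

-- Note: neither table records which item goes to which project;
-- the 5NF step below restores that pairing with a third relation.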
If there were a primary key, then this loss of dependency would not have occurred. In order to revive this relationship, we must add a new table like the following. Please note that during the entire process of normalization, this is the only step where a new table is created by joining two attributes, rather than splitting them into separate tables.

Project No.  Item Code
P1           I1
P1           I2
P2           I1
P3           I1
P3           I3

Let us finally summarize the normalization steps we have discussed so far:
• All relations → 1NF: eliminate variable-length records; remove multi-attribute lines in the table.
• 1NF → 2NF: remove the dependency of non-key attributes on part of a multi-attribute key.
• 2NF → 3NF: remove the dependency of non-key attributes on other non-key attributes.
• 3NF → BCNF: remove the dependency of an attribute of a multi-attribute key on an attribute of another (overlapping) multi-attribute key.
• BCNF → 4NF: remove more than one independent multi-valued dependency from the relation by splitting the relation.
• 4NF → 5NF: add one relation relating attributes with a multi-valued dependency.

Primitive data type

A basic type is a data type provided by a programming language as a basic building block. Most languages allow more complicated composite types to be recursively constructed starting from basic types. A built-in type is a data type for which the programming language provides built-in support.

In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of composite data types. Opinions vary as to whether a built-in type that is not basic should be considered "primitive".

Depending on the language and its implementation, primitive data types may or may not have a one-to-one correspondence with objects in the computer's memory. However, one usually expects operations on basic primitive data types to be the fastest language constructs there are. Integer addition, for example, can be performed as a single machine instruction, and some processors offer specific instructions to process sequences of characters with a single instruction. In particular, the C standard mentions that "a 'plain' int object has the natural size suggested by the architecture of the execution environment". This means that int is likely to be 32 bits long on a 32-bit architecture. Basic primitive types are almost always value types.

Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be modified by programs. Exceptions include Smalltalk, which permits all data types to be extended within a program, adding to the operations that can be performed on them or even redefining the built-in operations.

Composite types are derived from more than one primitive type. This can be done in a number of ways. The ways they are combined are called data structures. Composing a primitive type into a compound type generally results in a new type, e.g. array-of-integer is a different type from integer.

An array stores a number of elements of the same type in a specific order. They are accessed using an integer to specify which element is required (although the elements may be of almost any type). Arrays may be fixed-length or expandable.

Record (also called tuple or struct): Records are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
Union: A union type definition specifies which of a number of permitted primitive types may be stored in its instances, e.g. "float or long integer". Contrast this with a record, which could be defined to contain a float and an integer; in a union, there is only one value at a time. A tagged union (also called a variant, variant record, discriminated union, or disjoint union) contains an additional field indicating its current type, for enhanced type safety.

A set is an abstract data structure that can store certain values, without any particular order, and with no repeated values. Values themselves are not retrieved from sets; rather, one tests a value for membership to obtain a boolean "in" or "not in".

An object contains a number of data fields, like a record, and also a number of program code fragments for accessing or modifying them. Data structures not containing code, like those above, are called plain old data structures.

Many others are possible, but they tend to be further variations and compounds of the above.

Logical and Physical Database Requirements

The requirements for a logical and physical database vary by size and design parameters. A logical database must be able to access and identify all files within the storage system to operate correctly, whereas a physical database manages a much smaller field of information. Sometimes, a physical database stores only a single file with one value or word in it.

Logical Database Definition

A logical database is the collected information stored on multiple physical disk files and hard drives within a computer. This database provides a structure to house all the accumulated information within the device and determines the relationships between different types of files and programs. A logical database determines these relationships through a series of highly structured tables designed to categorize information into groups for easier accessibility. Without this categorization, accessing different files within a computer would take additional time as the system searched each file for the appropriate match.

Logical Database Requirements

A logical database can stretch over multiple physical hard disks and information files. The data storage unit is still a single database for information retrieval purposes. To have a logical database, all given hard disks and information files must be accessible from a single source. An example would be a personal computer able to access its information files stored on multiple hard drives from a single user interface. According to Microsoft, when a logical database is successful, the user sees a coherent list of information from a central location that draws from the many file sources tied into the storage system.

Physical Database Definition

A physical database is both the actual device housing the information files and the search paths used to access information between each source. According to Microsoft, the term "database" refers only to the logical database controlling information files for the entire system. A physical database is technically a smaller unit of storage, referred to as either a company, field, record or table, depending on how much information the physical storage device contains. A field is the smallest unit of storage, housing only a single file. A company is the largest -- next to a database -- housing separate, large groups of data.
Physical Storage Requirements

The requirements for a physical database vary by the parameters of the storage device in question. For example, a flash drive designed to hold up to 2 gigabytes of information needs a personal computer or another USB-connected device to allow access to the information stored on the equipment. A physical database also needs a power source to access information. A computer hard drive cannot function without electricity, and a flash drive cannot operate without a device with an adequate power source.

Two types of data modeling are as follows:
• Logical modeling
• Physical modeling

If you are going to be working with databases, then it is important to understand the difference between logical and physical modeling, and how they relate to one another. Logical and physical modeling are described in more detail in the following subsections.

Logical Modeling

Logical modeling deals with gathering business requirements and converting those requirements into a model. The logical model revolves around the needs of the business, not the database, although the needs of the business are used to establish the needs of the database. Logical modeling involves gathering information about business processes, business entities (categories of data), and organizational units. After this information is gathered, diagrams and reports are produced, including entity relationship diagrams, business process diagrams, and eventually process flow diagrams. The diagrams produced should show the processes and data that exist, as well as the relationships between business processes and data. Logical modeling should accurately render a visual representation of the activities and data relevant to a particular business.

The diagrams and documentation generated during logical modeling are used to determine whether the requirements of the business have been completely gathered. Management, developers, and end users alike review these diagrams and documentation to determine if more work is required before physical modeling commences. Typical deliverables of logical modeling include:

• Entity relationship diagrams. An entity relationship diagram is also referred to as an analysis ERD. The point of the initial ERD is to provide the development team with a picture of the different categories of data for the business, as well as how these categories of data are related to one another.
• Business process diagrams. The process model illustrates all the parent and child processes that are performed by individuals within a company. The process model gives the development team an idea of how data moves within the organization. Because process models illustrate the activities of individuals in the company, the process model can be used to determine how a database application interface is designed.
• User feedback documentation.

Physical Modeling

Physical modeling involves the actual design of a database according to the requirements that were established during logical modeling. Logical modeling mainly involves gathering the requirements of the business, with the latter part of logical modeling directed toward the goals and requirements of the database. Physical modeling deals with the conversion of the logical, or business, model into a relational database model. When physical modeling occurs, objects are being defined at the schema level. A schema is a group of related objects in a database. A database design effort is normally associated with one schema.
During physical modeling, objects such as tables and columns are created based on entities and attributes that were defined during logical modeling. Constraints are also defined, including primary keys, foreign keys, other unique keys, and check constraints. Views can be created from database tables to summarize data or to simply provide the user with another perspective of certain data. Other objects such as indexes and snapshots can also be defined during physical modeling. Physical modeling is when all the pieces come together to complete the process of defining a database for a business.

Physical modeling is database-software specific, meaning that the objects defined during physical modeling can vary depending on the relational database software being used. For example, most relational database systems have variations in the way data types are represented and the way data is stored, although basic data types are conceptually the same among different implementations. Additionally, some database systems have objects that are not available in other database systems. Typical deliverables of physical modeling include the following:

• Server model diagrams. The server model diagram shows tables, columns, and relationships within a database.
• User feedback documentation.
• Database design documentation.

Conclusion

Understanding the difference between logical and physical modeling will help you build better organized and more effective database systems.

Data independence

Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes in the definition and organization of data. Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues, since there is no difference in the operations carried out against the data. Data independence and operation independence together give the feature of data abstraction. There are two levels of data independence.

First level: The logical structure of the data is known as the schema definition. In general, if a user application operates on a subset of the attributes of a relation, it should not be affected later when new attributes are added to the same relation. Logical data independence indicates that the conceptual schema can be changed without affecting the existing external schemas.

Second level: The physical structure of the data is referred to as the "physical data description". Physical data independence deals with hiding the details of the storage structure from user applications. The application should not be involved with these issues since, conceptually, there is no difference in the operations carried out against the data.

There are two types of data independence:
1. Logical data independence: The ability to change the logical (conceptual) schema without changing the external schema (user view) is called logical data independence. For example, the addition or removal of new entities, attributes, or relationships to the conceptual schema should be possible without having to change existing external schemas or having to rewrite existing application programs.
2. Physical data independence: The ability to change the physical schema without changing the logical schema is called physical data independence.
For example, a change to the internal schema, such as using a different file organization or storage structure, different storage devices, or a different indexing strategy, should be possible without having to change the conceptual or external schemas.

3. View-level data independence: the view level is always independent and unaffected, because no other level exists above the view level.

Data Independence Types

Data independence has two types:
1. Physical independence
2. Logical independence

Data independence can be explained as follows: each higher level of the data architecture is immune to changes of the next lower level of the architecture.

Physical independence: The logical schema stays unchanged even though the storage space or type of some data is changed for reasons of optimization or reorganization. The external schema does not change; changes to the internal schema may be required because the physical schema has been reorganized. Physical data independence is present in most database and file environments in which details such as the hardware storage encoding, the exact location of data on disk, and the merging of records are hidden from the user.

Logical independence: The external schema may stay unchanged for most changes of the logical schema. This is especially desirable, as the application software does not need to be modified or newly translated.

Data abstraction

In computer science, abstraction is the process by which data and programs are defined with a representation similar in form to its meaning (semantics), while hiding away the implementation details. Abstraction tries to reduce and factor out details so that the programmer can focus on a few concepts at a time. A system can have several abstraction layers whereby different meanings and amounts of detail are exposed to the programmer. For example, low-level abstraction layers expose details of the computer hardware where the program is run, while high-level layers deal with the business logic of the program.

The following English definition of abstraction helps to understand how this term applies to computer science, IT and objects: abstraction - a concept or idea not associated with any specific instance. Abstraction captures only those details about an object that are relevant to the current perspective.

The concept originated by analogy with abstraction in mathematics. The mathematical technique of abstraction begins with mathematical definitions, making it a more technical approach than the general concept of abstraction in philosophy. For example, in both computing and in mathematics, numbers are concepts in programming languages, as founded in mathematics. Implementation details depend on the hardware and software, but this is not a restriction because the computing concept of number is still based on the mathematical concept.

In computer programming, abstraction can apply to control or to data: control abstraction is the abstraction of actions, while data abstraction is that of data structures. Control abstraction involves the use of subprograms and related concepts such as control flows. Data abstraction allows handling data bits in meaningful ways. For example, it is the basic motivation behind the datatype. One can regard the notion of an object (from object-oriented programming) as an attempt to combine abstractions of data and code. The same abstract definition can be used as a common interface for a family of objects with different implementations and behaviors but which share the same meaning.
The inheritance mechanism in object-oriented programming can be used to define an abstract class as the common interface.

The recommendation that programmers use abstractions whenever suitable in order to avoid duplication (usually of code) is known as the abstraction principle. The requirement that a programming language provide suitable abstractions is also called the abstraction principle.

Data abstraction enforces a clear separation between the abstract properties of a data type and the concrete details of its implementation. The abstract properties are those that are visible to client code that makes use of the data type (the interface to the data type), while the concrete implementation is kept entirely private, and indeed can change, for example to incorporate efficiency improvements over time. The idea is that such changes are not supposed to have any impact on client code, since they involve no difference in the abstract behaviour. For example, one could define an abstract data type called lookup table, which uniquely associates keys with values, and in which values may be retrieved by specifying their corresponding keys. Such a lookup table may be implemented in various ways: as a hash table, a binary search tree, or even a simple linear list of (key:value) pairs. As far as client code is concerned, the abstract properties of the type are the same in each case. Of course, this all relies on getting the details of the interface right in the first place, since any changes there can have major impacts on client code. As one way to look at this: the interface forms a contract on agreed behaviour between the data type and client code; anything not spelled out in the contract is subject to change without notice.

Languages that implement data abstraction include Ada and Modula-2. Object-oriented languages are commonly claimed to offer data abstraction; however, their inheritance concept tends to put information in the interface that more properly belongs in the implementation; thus, changes to such information end up impacting client code, leading directly to the fragile binary interface problem.

SQL

SQL is a standard language for accessing databases. Our SQL tutorial will teach you how to use SQL to access and manipulate data in MySQL, SQL Server, Access, Oracle, Sybase, DB2, and other database systems.

What is SQL?
• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard

What Can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views

SQL is a Standard - BUT....

Although SQL is an ANSI (American National Standards Institute) standard, there are different versions of the SQL language. However, to be compliant with the ANSI standard, they all support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.

Note: Most of the SQL database programs also have their own proprietary extensions in addition to the SQL standard!
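Two of the capabilities listed above, creating views and setting permissions, look roughly as follows in standard SQL. The table, view, and user names here are illustrative assumptions, and the exact GRANT forms accepted vary between products, which is one place the proprietary extensions mentioned in the note show up:

-- Create a view that exposes only part of a table.
CREATE VIEW high_value_orders AS
    SELECT order_id, customer_id, amount
    FROM   orders
    WHERE  amount > 1000;

-- Allow a class of users to read the view but not the underlying table.
GRANT SELECT ON high_value_orders TO analyst;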
Using SQL in Your Web Site

To build a web site that shows data from a database, you will need:
• An RDBMS database program (i.e. MS Access, SQL Server, MySQL)
• A server-side scripting language, like PHP or ASP
• SQL, to get the data you want
• HTML / CSS

RDBMS

RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. The data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries and it consists of columns and rows.

Database Tables

A database most often contains one or more tables. Each table is identified by a name (e.g. "Customers" or "Orders"). Tables contain records (rows) with data. In this tutorial we will use the well-known Northwind sample database (included in MS Access and MS SQL Server). Below is a selection from the "Customers" table:

CustomerID  CustomerName                        ContactName         Address                        City         PostalCode  Country
1           Alfreds Futterkiste                 Maria Anders        Obere Str. 57                  Berlin       12209       Germany
2           Ana Trujillo Emparedados y helados  Ana Trujillo        Avda. de la Constitución 2222  México D.F.  05021       Mexico
3           Antonio Moreno Taquería             Antonio Moreno      Mataderos 2312                 México D.F.  05023       Mexico
4           Around the Horn                     Thomas Hardy        120 Hanover Sq.                London       WA1 1DP     UK
5           Berglunds snabbköp                  Christina Berglund  Berguvsvägen 8                 Luleå        S-958 22    Sweden

The table above contains five records (one for each customer) and seven columns (CustomerID, CustomerName, ContactName, Address, City, PostalCode, and Country).

SQL Statements

Most of the actions you need to perform on a database are done with SQL statements. The following SQL statement selects all the records in the "Customers" table:

SELECT * FROM Customers;

In this tutorial we will teach you all about the different SQL statements.

Keep in Mind That...
• SQL is NOT case sensitive: SELECT is the same as select

Semicolon after SQL Statements?

Some database systems require a semicolon at the end of each SQL statement. The semicolon is the standard way to separate SQL statements in database systems that allow more than one SQL statement to be executed in the same call to the server. In this tutorial, we will use a semicolon at the end of each SQL statement.

Some of The Most Important SQL Commands
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database
• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index
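As a quick illustration of the first four commands against the Customers table above (the inserted row and the changed values are invented for the example):

-- INSERT INTO: add a new customer.
INSERT INTO Customers (CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country)
VALUES (6, 'Blauer See Delikatessen', 'Hanna Moos', 'Forsterstr. 57', 'Mannheim', '68306', 'Germany');

-- UPDATE: change the contact for customer 4.
UPDATE Customers
SET ContactName = 'Victoria Ashworth'
WHERE CustomerID = 4;

-- SELECT with WHERE: retrieve only the Mexican customers.
SELECT CustomerName, City FROM Customers
WHERE Country = 'Mexico';

-- DELETE: remove the row inserted above.
DELETE FROM Customers WHERE CustomerID = 6;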
File organization

File organization is the methodology which is applied to structured computer files. Files contain computer records, which can be documents or information stored in a certain way for later retrieval. File organization refers primarily to the logical arrangement of data (which can itself be organized in a system of records with correlation between the fields/columns) in a file system. It should not be confused with the physical storage of the file in some type of storage medium. There are certain basic types of computer file, which can include files stored as blocks of data and files stored as streams of data, where the information streams out of the file while it is being read until the end of the file is encountered.

We will look at two components of file organization here:
1. The way the internal file structure is arranged, and
2. The external file as it is presented to the O/S or program that calls it.
Here we will also examine the concept of file extensions, and the various ways that files can be stored and organized. Files are presented to the application as a stream of bytes followed by an EOF (end of file) condition. A program that uses a file needs to know the structure of the file and needs to interpret its contents.

Internal File Structure - Methods and Design Paradigm
It is a high-level design decision to specify a system of file organization for a computer software program or a computer system designed for a particular purpose. Performance is high on the list of priorities for this design process, depending on how the file is being used. The design of the file organization usually depends mainly on the system environment: for instance, whether the file is going to be used for transaction-oriented processes like OLTP or for data warehousing, and whether the file is shared among various processes, as in a typical distributed system, or used standalone. It must also be asked whether the file is on a network and used by a number of users, whether it may be accessed internally or remotely, and how often it is accessed. All things considered, however, the most important considerations are:
1. Rapid access to a record or a number of records which are related to each other.
2. The adding, modification, or deletion of records.
3. Efficiency of storage and retrieval of records.
4. Redundancy, being the method of ensuring data integrity.
A file should be organized in such a way that the records are always available for processing with no delay. This should be done in line with the activity and volatility of the information.

Types of File Organization
Organizing a file depends on what kind of file it happens to be: a file in its simplest form can be a text file, in other words a file composed of ASCII (American Standard Code for Information Interchange) text. Files can also be created as binary or executable types (containing elements other than plain text). Also, files are keyed with attributes which help determine their use by the host operating system.

Techniques of File Organization
The three techniques of file organization are:
1. Heap (unordered)
2. Sorted
   a. Sequential (SAM)
   b. Line Sequential (LSAM)
   c. Indexed Sequential (ISAM)
3. Hashed or Direct
In addition to these three techniques, there are five methods of organizing files: sequential, line-sequential, indexed-sequential, inverted list, and direct or hashed access organization. Each is examined below, after a short illustration of the heap-versus-sorted distinction.
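As a concrete, hedged illustration of that distinction: in Microsoft SQL Server, a table without a clustered index is stored as a heap, while creating a clustered index keeps the rows physically ordered by the index key. The Orders table and index name below are invented for this sketch.

Example
-- Without a clustered index, SQL Server stores this table as a heap:
-- rows sit in no particular order.
CREATE TABLE Orders (
    OrderID INT,
    OrderDate DATE
);

-- Creating a clustered index physically orders the rows by OrderID,
-- turning the heap into a sorted file.
CREATE CLUSTERED INDEX IX_Orders_OrderID ON Orders (OrderID);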
Sequential Organization
A sequential file contains records organized in the order in which they were entered; the order of the records is fixed. The records are stored in physical, contiguous blocks, and within each block the records are in sequence. Records in these files can only be read or written sequentially. Once stored in the file, a record cannot be made shorter or longer, or deleted; however, a record can be updated if its length does not change. (Shortening, lengthening, or deleting records is done by rewriting the records into a new file.) New records always appear at the end of the file. If the order of the records in a file is not important, sequential organization will suffice, no matter how many records you may have. Sequential output is also useful for report printing and for the sequential reads which some programs prefer to do.

Line-Sequential Organization
Line-sequential files are like sequential files, except that the records can contain only characters as data. Line-sequential files are maintained by the native byte-stream files of the operating system. In the COBOL environment, line-sequential files that are created with WRITE statements with the ADVANCING phrase can be directed to a printer as well as to a disk.

Indexed-Sequential Organization
Key searches are improved by this organization. The simplest structure is a single-level index: an index file whose records are (key, pointer) pairs, where the pointer gives the position in the data file of the record with the given key. A subset of the records, evenly spaced along the data file, is indexed in order to mark intervals of data records. A key search is performed as follows: the search key is compared with the index keys to find the highest index key that precedes the search key; a linear search is then performed from the record that this index key points to, until the search key is matched or until the record pointed to by the next index entry is reached. Despite the double file access (index + data) required by this sort of search, the reduction in access time is significant compared with sequential file searches.

Consider, for the sake of example, a simple linear search on a 1,000-record sequentially organized file: an average of 500 key comparisons is needed (assuming the search keys are uniformly distributed among the data keys). Using an evenly spaced index with 100 entries, each index interval covers ten data records, so the total drops to an average of about 50 comparisons in the index file plus 5 in the data file: roughly a nine-to-one reduction in the operations count.

A hierarchical extension of this scheme is possible, since an index is itself a sequential file and can in turn be indexed by a second-level index, and so on. Exploiting this hierarchical decomposition further decreases access time. There is, however, a point at which the advantage starts to be eroded by the increased cost of storage, which in turn increases the index access time.

Hardware for indexed-sequential organization is usually disk-based rather than tape. Records are physically ordered by primary key, and the index gives the physical location of each record. Records can be accessed sequentially or directly, via the index. The index is stored in a file and read into memory when the file is opened; indexes must also be maintained. As in sequential organization, the data is stored in physically contiguous blocks; the difference, however, is the use of indexes. There are three areas in the disc storage:
•Primary area: contains file records stored by key or ID number.
•Overflow area: contains records that cannot be placed in the primary area.
•Index area: contains the keys of records and their locations on the disc.
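Modern SQL systems expose the same idea through indexes that the DBMS maintains automatically: a lookup consults the index first and then fetches the record, much as in the ISAM search just described. A minimal sketch, reusing the Customers table from the SQL section (the index name is invented):

Example
-- Build an index on a key column; the DBMS keeps it up to date.
CREATE INDEX idx_customers_name ON Customers (CustomerName);

-- A key search can now go through the index instead of scanning the whole file.
SELECT * FROM Customers WHERE CustomerName = 'Around the Horn';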
Inverted List
In file organization, an inverted list is a file that is indexed on many of the attributes of the data itself. The inverted list method has a single index for each key type. The records are not necessarily stored in sequence: they are placed in the data storage area, and the indexes are updated with the record keys and locations. For example, in a company file an index could be maintained for all products and another for product types; it is then faster to search the indexes than every record. These types of file are also known as "inverted indexes". Inverted list files use more media space, and storage devices fill up quickly with this type of organization. The benefits are immediately apparent because searching is fast; however, updating is much slower.

Content-based queries in text retrieval systems use inverted indexes as their preferred mechanism. Data items in these systems are usually stored compressed, which would normally slow the retrieval process, but the compression algorithm is chosen to support this technique. When querying a file, there are circumstances in which the query is designed to be modal, meaning that rules are set which require different information to be held in the index. For example, when phrase querying is undertaken, the particular algorithm requires that offsets to word classifications are held in addition to document numbers.

Direct or Hashed Access
With direct or hashed access, a portion of disk space is reserved and a "hashing" algorithm computes the record address, so additional space is required for this kind of file in the store. Records are placed randomly throughout the file and are accessed by addresses that specify their disc location. This type of file organization requires disk storage rather than tape. It has excellent search and retrieval performance, but care must be taken to maintain the indexes: if the indexes become corrupt, what is left may as well go to the bit bucket, so it is wise to keep regular backups of this kind of file, just as for all valuable stored data.
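Some database systems let you request a hashed organization explicitly. As a PostgreSQL-specific sketch (the index name is invented), an index can be created with the hash access method, so that the record's location is computed from the key value much as described above:

Example
-- PostgreSQL-specific: a hash index computes a bucket from the key value.
CREATE INDEX idx_customers_id_hash ON Customers USING HASH (CustomerID);

-- Equality lookups can use the hash; range queries cannot.
SELECT * FROM Customers WHERE CustomerID = 3;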
External File Structure and File Extensions

Microsoft Windows and MS-DOS File Systems
The external structure of a file depends on whether it is being created on a FAT or NTFS partition. The maximum filename length on an NTFS partition is 255 characters, and 11 characters on FAT (8-character name + "." + 3-character extension). NTFS filenames keep their case, whereas FAT filenames have no concept of case (case is, however, ignored when performing a search under NTFS). There is also the newer VFAT, which permits 255-character filenames.

UNIX and Apple Macintosh File Systems
The concept of directories and files is fundamental to the UNIX operating system. On Microsoft Windows-based operating systems, directories are depicted as folders and moving about is accomplished by clicking on the different icons. In UNIX, the directories are arranged as a hierarchy with the root directory at the top of the tree. The root directory is always depicted as /. Within the / directory there are subdirectories (e.g. etc and sys). Files can be written to any directory depending on the permissions. Files can be readable, writable, and/or executable.

Organizing files using Libraries
With the advent of Microsoft Windows 7, file organization and management improved considerably through a powerful tool called Libraries. A Library is a file organization system that brings together related files and folders stored in different locations on the local computer as well as across the network, so that they can be accessed centrally through a single access point. For instance, images stored in different folders on the local computer and/or across a computer network can be accumulated in an Images Library. Such an aggregation of similar files can be manipulated, sorted, or accessed conveniently, as and when required, through a single access point on the computer desktop. This feature is particularly useful for accessing similar or related content, and for managing projects that use related and common data.