cnapter2 lcl exprect rnclancy? , h o r vn i n r: tor the DatabaseSystem Concepts and Architecture I hold on : io na lfi l e rJn gesto 1rL'ntand rntifi, tl-re f N, ar.rd ' ., 'l 't.l ? i n clude tratures " i t su e of .rnri r-r-rity '':al$**ffir tr he architecture of DBMS packageshas evolyed ffi from the early rnonolithii systems,where the ivhole DBMS software packagewas one tightly integrated system,to the modern DBMS packagesthat are rnodular in design,with a client/serversystemarchitecture. This evolution mirrors the trends in computing, where large centralizedmainframe computersarebeing replacedby hundredsof distributedworkstationsand personal computers connected via contntunications networks to various types of server nrachines-Web servers,databaseservers,file servers,application servers,and so on. In a basic client/server DBMS architecture,the system functionality is distributed benveentwo types of modules.r A client module is typically designedso that it rvill run on a user workstation or personalcomputer.Typically,applicationprograms and user interfacesthat accessthe datirbaserun in the client module. Hence,the client module handles user interaction and provides the user-friendlv interfaces such as forms- or menu-basedGUIs. The other kind of module, called a server module, typically handles data storage,access,search,and other functions. We discussclient/serverarchitecturesin more detail in Section2.5. First, we rnust study more basic concepts that will give us a better understanding of moderr-rdatabase architectures. In this chapter we presentthe terminology and basic conceptsthartrvill be used throughout the book. Section2.1 discusses data models and definesthe conceptsof 1 . As we sh a llse e in Se ctio n2 ,5,there are vari ati onson thi s s mpl e l w o-l rerclent/serverarchtecture, 29 Chapter 2 DatabaseSystem Concepts and Architecture schemasand instirnces,which are fundanrentalto the study of databirses)/stems. Therr,we discussthe three-schemaDBlvlS architecturear.rdclataindependencein or.rwhat a DBMS is supposedto do. Lr Section2.2;this providesa user'sperspectir.e Section2.3 we describethe typesof interfacesand languagesthat are typically prothe databasesystemsoftwareenvironrnent. vided by a DBMS. Section2.4 discusses 2.5 givesan overviewof varioustypesof client/serverarchitectures. Finally, Sectior-r Sect ion2. 7 a cl assi fi cati onof the t,vpesof D B MS packages. Se c ti o n2.6 preser.rts sunrmirrizesthe chapter. in Sections2.4 through 2.6 providesmore detailedconceptsthat rnav The r-naterial to the basicintroductory material. be consideredas sr-rpplementary 2.1 Data Models,Schemas,and Instances characteristicof tl.redirtabirseapproachis tl-ratit providessome One fundan.rer.rtal level of data abstrirction.Data abstraction generallyrefersto the suppressionof detailsof data organizationand storageand the highlighting of the essentialfeaof the turesfor an improved understandingof clata.One of the main characteristics databaseapproachis to support data abstractionso that different usersmay perlevelof detail.A data model-a collectionof concepts ceivedata at their pref-erred that can be used to describethe structure of a database-provides the necessary o.fa databascw e m ean t he dat a n re a n st o achi evethi s abstracti on.lB v strrtctrtrt: and constraintsthat should hold tbr the data.Most data ntodtypes,relatior.rships, elsalsoincludea setof basicoperations fbr specifing retrievirlsand updateson the database. In addition to the basicoperationsprovided by the data model, it is becomingmore comnon to include conceptsin the data model to speciff the dynamic aspect or appiication.This allorvsthe databasedesignerto specifra set behavior of a databirse ofvalid user-definedoperationsthat are allorvedon the databaseobjects,rAn exanrrvl-richcan be appliedto a ple of a user-det'ined operationcould be COMPUTE-GPA, STUDENTobject.On the other hand, genericoperationsto insert,delete,modiSr,or retrieveany kind of object are often included in the Dnsicdnto nrodeloperatiorts. Conceptsto specifubehavior are fundamentalto object-orienteddata models (see Chapters20 and 21) but are alsobeing incorporatedin more traditional data rnodmodels (seeChapter 22) extend the basic relaels. For example,ol'rject-relational tionirl model to incltrdesuch concepts,amons others.In the relationaldata model, th e re i s a provi si on to attachbehavi or to the rel ati onsi n the form of per sist ent storedmodules,poprularlyknown as storedprocedures(seeChapter9). 2 , So r e t' Te) tre rrota t", del 's u>ed to oenote d speti ',c databasedescrol or, o- schena for e' arrol e. th e m a r keti ngdata model .We w l l not use thi s ni erpretatron, rel recl 5aTrenow rereD voaLaD ased e> rgrards o' Il a' e 3 .- l- e r c,.tsi oro{-orccptsl ooesrrbebera'ro o e sr g l d ctr" l e. are r'creds'rgl , be -g co'nb,.ed , )l o a S rngl eactrvrtll.adr'ol a 7 sl ecr' y rrg oel ^a/ro' s a sso ca te d w i th softw aredes qn, 2.1 Data Models,Schemas.and Instances \\'5ternS. : . icncei n i : t r r lo.In -.rlh prrolr tllllle D t , .. I rrn al l y, . l io n 2 .7 ::r.lt ntay .:J. sOllle .. . iort of :t : i. r lfea:-. o t -th e : : r. rVp er!( ) l l ce P t s r cccs sary : hc cl ata .tt.lnrod: a\( )ll th e . : r gr t l o re rspect or ..;iti a set \lt L'\rtnt- :.Ii.'ti to a :trtlih',or '( ' r (l Il ()/ ls . ' Jcls (se e .ri.rnrod. i. ic r el a .r nrociel, ' a r \ lS te nt 2.1.1 Categoriesof Data Models \lany data models havebeen proposed,which rve can categorizeaccordingto the t\.pesof conceptsthey use to describethe databasestructure. HighJevel or conceptual data models provide conceptsthat are closeto the way many usersperceive data,whereaslow-level or physical data models provide conceptsthat describethe detailsof how data is storedin the computer.Conceptsprovided by low-leveldata nrodelsare generally meant for computer specialists,not for typical end users. Betweenthese two extremesis a classof representational (or implementation) data models,awhich provide conceptsthat may be understood by end usersbut that .lre not too far removed from the way data is organized within the computer. Representational data models hide sorre detailsof data storagebut can be implernentedon a computer systemdirectly. (-onceptualdata models use conceptssuch as entities,attributes,and relationships. .\n entity representsa real-worldobject or concept,such as an employeeor a projAn attribute represents r'ctthat is describedin the database. someproperty of intercst that further describesan entity, such as the employee'sname or salary.A re l a ti o n s h i pa mo n g tw o or more euti ti esrepresents un l ssoci ati ouamong tw o or nrore entities,for example,a works-on relationship between an employeeand a project.Chapter 3 presentsthe Entity-Relationshipmodel-a popular high-level .onceptual data model. Chapter 4 describesadditional abstractions used for .rdvancedmodeling,such as generalization, and categories. specialization, Representationalor implementation data models are the models Llsedmost frequently in traditional commercial DBMSs. Theseinclude the widely used relational clatamodel, aswell as the so-calledlegacydata models-the network and hierarchical models-that havebeenwidely usedin the past.Part 2 is devotedto the relational data model, its operationsand languages, and some of the techniquesfor programnring relational databaseapplications.5The SQL standard for relational databasesis describedin Chapters8 and 9. Representational data modelsrepresentdataby using record structuresand henceare sometimescalledrecord-baseddata models. \\'e can regard the object data model group (ODMG) as a new family of higherlevel implementation data models that are closerto conceptualdata models.We describethe generalcharacteristics of object databasesand the ODMG proposed standardin Chapters20 and 21. Object dartamodels are also frequentlyutilized as high-levelconceptualmodels,particularlyin the softwareengineeringdomain. Physicaldata models describehow data is stored as files in the cornputer by representinginformation such as record formats,record orderings,and accesspaths.An accesspath is a structure that makesthe searchfor particular databaserecordsefficient.We discussphysicalstoragetechniquesand accessstrlrcturesin Chapters13 -:.:--De, J. The term implementatrondata model is not a standard termi we have ntroduced t to refer to ihe availa cie d a ta m o d e lsin co m m e r cia databasesystems, ::. l' S 5 , A su m m a r yo f th e n e two r k a nd hi erarchi cadata model s i s i ncl udedi n Append ces E and F. They are a cce ssib lefr o m th e b o o k' sWe b srte, 31 Chapter2 Database SystemConceptsand Architecture and 14. An index is an example of an accesspath that allows direct accessto data using an index term or a key.word.It is similar to the index at the end of this book, exceptthat it may be organizedin a linear,hierarchical,or some other fashion. 2.1.2 Schemas,Instances,and DatabaseState In any data model, it is important to distinguishbetweenthe descriptionof the database and the databaseitself. The description of a databaseis called the database schema, which is specified during databasedesign and is not expectedto change frequently.6Most data models have certain conventions for displaying schemasas diagrams.TA displayedschema is called a schema diagram. Figure 2.1 shows a schemadiagram for the databaseshown in Figure 1.2; the diagram displaysthe structureof each record type but not the actual instancesof records.We call each object in the schema-such as STUDENTor COURSE-a schemaconstruct. A schemadiagram displaysonly sorre aspectsof a scherna,such as the names of record types and data items,and some types of constraints.Other aspectsare not specifiedin the schemadiagram;fbr example,Figure2.1 showsneitherthe datatype of eachdata itern nor the relationshipsamong the vnrious files.Many typesof constraints are not representedin schema diagrams.A constraint such as students majoringin cornputerscience must toke C51310belbrethe end of theirsoplrcmore )tear is quite difficult to represent. The actual data in a databasemay changequite frequently. For example,the databaseshown in Figure 1.2changeseverytime we add a studentor enter a new grade. The data in the databaseat a particularmornent in time is calleda databasestateor snapshot. It is also called the atrrent set of occurrences or instances in the database.In a given databasestate,each schernaconstruct has its own current set of instances;for example,the STUDENTconstruct will contain the set of individr"ral student entities(records)as its instances.Many databasestatescan be constructed to correspond to a particular databaseschema.Every time we insert or delete a record or change the value of a data item in a record, we change one state of the databaseinto anotherstate. The distinction between datarbase schemaand databasestate is very important. When we define a new database,we specif, its databaseschemaonly to the DBMS. At this point, the corresponding databasestate is the entpty sfcte with no data. We get the initial stateof the databasewhen the databaseis first populated or loaded with the initial data.From then on, everytime an updateoperatior.r is appliedto the database,we get another databasestate.At any point in time, the databasehas a currentstnte.EThe DBMS is partly responsiblefor ensuring that every stateof the 6 Srhama nhonnoc d; r o L r ra , ,ta l t., J rr u noodan e< r n a,u r eunyu r ,, i r oLm o n r r n{ rho n o r a h r co r r .r i n -a ",u , a:ngnpl,,\ ) L- r, ln^^ a , q s, N ,\rl ^vvc ^ " d a ta b a sesyste m sin clu d eo perati onsfor al l ow i ngschemachanges,a though the schemachangeprocess i. m o r e r lo lve d ll' a n s n p le daraba.eupdal es. 7 lr r s cr sr o r a ' y r T o a T a o asepdr arce ro L<e sLhe/r?sas t'e pl u'a 'o' s"hpma,e\e th e o r o o e r o u r a lfo r m ,T h e word schemei s someti mesused to refer to a schema. B. T h e cu r r e n tsta ie is a so c a ed the currentsnapsholof the database. tl 'oJgl - .' hcmafdi \ Architecture 2.2 Three-Schema and Datalndeoendence to i 11il i. bo ok, ) 11. r c d Jtaa t a base . ir . rnge r': ll. l5 i l S .:) ( )\ \ 'S i l .. r ',r t he .r ,lcach databaseis a valid state-that is, a state that satisfiesthe structure and constraints specified in the schema. Hence, specifying a correct schema to the DBMS is c'rtremely important and the schema must be designedwith utmost care. The DBMS storesthe descriptionsof the schemaconstructsand constraints-also called the meta-data-in the DBMS catalog so that DBMS software can refer to the schemawhenever it needsto. The schemais sometimescalled the intension, and a databasestateis called an extension of the schema. ,\lthough, as mentioned earlier,the schemais not supposedto changefrequently,it is r.rotuncommon that changesoccasionallyneed to be applied to the schemaas the .rpplicationrequirementschange.For example,we may decide that another data to item needsto be stored for each record in a file, such as adding the Date_of_birth the STUDENTschemain Figure2.1.This is known as schemaevolution.Most modern DBMSs include some operations for schema evolution that can be applied rlhile the databaseis operational. .i:l l cs o f . i:c I t ot . ll.l t|Pe \) [ co l l :; rr/ r't t f -s . ' fi . l 'c{l r- :c t l a t a- \ qracle. state()r Architecture 2.2 Three-Schema and Data Independence Three of the four important characteristicsof the databaseapproach, listed in Section1.3,are (1) insulation of programs and data (program-dataand programoperationindependence),(2) support of multiple userviews,and (3) useof a catalog to store the databasedescription (schema). In this section we specify an architecture for databasesystems,called the three-schema architecture,q that was :c t l i t t a: : -i r 'l O f i ir it iual ' rru ct ed -: clct ea -' oi the Figure 2.1 f or t he S chema diagr am S T U D EN T Name Studentnumber Class Maior datehac p C OU R SE Course name Course number Credit hours Department '( ) r t i l I l t . I )B\ lS. .rt.r.\Ve loaded . l to t he 'c h;:s it : oi tl.re :..e r -. SS :.il P R E R E OU IS ITE number CoursenumberI Prerequisite SECTION Section identifier Coursenumber Semester Year Instructor GR AD ER E POR T Student_number I Section_identifier I Grade S rt (Tschrtzis afterthe committee that proposed architecture, 9 Ths is alsoknownas the ANSI/SPARC andKluo1978) , rn Fi n rrrp 1 9 Chapter2 DatabaseSystemConceptsand Architecture proposed to help achieveand visualizethesecharacteristics. Then we discussthe conceptof data independencefurther. 2.2.1 The Three-Schema Architecture The goal of the three-schema architecture,illustratedin Figure2.2,is to separatethe user applicationsand the physicaldatabase.In this architecture,schemascan be defined at the follorving three levels: The internal level has an internal schema,which describesthe physicalstoragestructureof the database. The internalschemausesa physicaldatarnodel and describesthe completedetailsof data storageand accesspaths for the database. The conceptual level has a conceptual schema, which describesthe structure of the whoie databasefor a comrnunity of users.The conceptualschema hides the detailsof physicalstoragestructuresand concentrateson describi n g e n ti ti e s,di .rtatypes, rel ati onshi ps,user operati ons,and constraint s. Usually,a representationaldata model is used to describethe conceptual schemawhen a databasesystemis implernented.This intplenrentatiotr corrceptualschenrnis often based on a conceptualsclrcmodesignin a high-level data n-rodel. : The external or view level includes a number of external schemas or user views. Eachexternalschemadescribesthe part of the databasethat a particular user group is interestedin and hides the rest of the databasefrom that Figure2,2 The three-schema architecture. End Users * ExternalLevel External/Conceptu al M apping \ Conceptual Level Conceptual/Internal Mapping Internal Level InternalSchema Stored Database 2.2 Three-Schema Architecture and DataIndependence , .ussthe .tr.ltcthe . ..ln be .. r i st or:.1lnr)del . t ir r the la \ t l'L lC - . ch ema .: cr cri b . i r. lints. t . r . pt ual :. rl l 1 1 ) r l - sh -l e vel r )f u sef . ,- r ['(ll- - c ; ^ rtl- \) l t l t h at user group. As in the previous case,each external schemais typically implemented using a representational dertarnodel,possiblybasedon an external schemadesignin a high-ler.eldata model. The three-schemaarchitectureis a convenienttool with which the usercan visualize the schemalevelsin a databasesysterr.Most DBMSs do not separatethe threelevels completelyand explicitly,but support the three-schemaarchitectureto some extent. Some DBMSs may include physical-leveldetails in the conceptual schema.The three-levelANSI architecturehas an ir-nportantplace in clatabasetechnology development becauseit clearlyseparates the users'externallevel,the system'sconceptual level,and the internal storagelevelfor designinga database. It is very much applicable in the designof DBMSs, even today.In most DBMSs that support user views, e x te rn a l s c h e ma s a re s p eci fi ed i n the same data model that descri besthe conceptual-level information (for example,a relationalDBMS like OracleusesSQL tbr this). SomeDBMSs allow differentdata rnodelsto be usedat the conceptualand external levels.An example is Universal Data Base (UDB), a DBMS from IBM, rvhichusesthe relationalmodel to describethe conceptualschema,but may use an object-orientedmodel to describean externalschema. Notice that the three schemasare only descriptionsof data; the stored data that octuallyexistsis at the physicallevel.In a DBMS basedon the three-schemaarchitecture, each user group refers only to its orvn external schema.Hence, the DBMS must transform a requestspecifiedon an external schemainto a requestagainstthe conceptualschema,and then into a requeston the internal schemafor processing over the stored database.If the requestis a databaseretrieval,the data extracted from the stored databasemust be reformatted to match the user'sexternalview. The processesof transforming requestsand resultsbetween levelsare called mappings. Thesemappings may be time-consuming,so some DBMSs-especially those that are meant to support small databases-do not support external views.Even in such systems,however,a certain arnount of mapping is necessaryto transform requests betweenthe conceptualand internal levels. 2.2.2 Data Independence The three-schemaarchitecture can be used to further explain the cor-rceptof data independence, which can be defined as the capacity to change the schema at one levelof a databasesystemwithout having to changethe schemaat the next higher level.We can define two types of data independence: r Logical data independence is the capacityto charrgethe conceptual schema without having to changeexterntrlschemasor application programs. We may changethe conceptualschemato expand the database(by adding a record type or data item), to changeconstraints,or to reducethe database (by removing a record type or data item). In the last case,externalschemas that referonly to the remainingdatashould not be affected.For example,the external schema of Figure 1.5(a) should not be affectedby changing the GR AD E_ R EP ORfil T e (or record type) show n i n Fi gure 1.2 i nto the one 36 Chapter 2 DatabaseSystem Concepts and Architecrure shown in Figure L6(a). onlv the viervdefinition and the mappingsneed be changedin a DBMS that supportslogicaldata independence. After the conceptual schemaundergoesa logical reorganization,applicatiou progriuus thertreferencethe externalschemaconstructsmust work as betore.Chanses to constraintscan be appliedto the conceptualschemawithout affectingthe externalschemasor irpplicationprograms. l.,',Physical data independence is the capacity to change the internal schena w i th o u t h a v i n g to change the conceptualschema.H ence, the external schentasneednot be changedaswell.Changesto the internalschen.ra ntaybe neededbecausesome physicirlfiles were reorganized-[or example,by creating additionalaccessstructures-to improve the performanceof retrievalor update.If the same data as before remains in the database,tve should not have to changethe conceptualscherna.For example,providing an access path to improve retrievalspeedof sectionrecords(Figure 1.2) by semester and yearshould not requirea querv such as listoll sectiotts oJJbred in fnlt 2004 to be changed,although the query would be e-xecuted .rore efficiently by the DBMS by utilizing the new accesspatl'r. Generally,physicaldata independenceexists in most databasesand tlle environnlents in which the exact location of data on disk, hardware details of storage encoding,placement,compression,splitting,mergingof records,and so or-rare hidden from the user.Applicirtionsrenrainunilivareof thesedetails.On the other hanj, logical data independenceis very hard to come by becauseit allorvsstructuraland c o n s tra i n t c h a n g e sw i t hout affecti ng appl i cati on programs-a much stri cter requirement. whenever we havea multiple-levelDBMS, its catalogmust be expandedto include infbrmation on how to ntap reqLlests and dataalrons the variouslevels.Tl-reDBIVIS usesadditionalsotiwareto accomplishthesemappingsby ret'erringto the mapping information in the catalog.Data independenceoccursbecausewhen the schemais changedat some level,the schemaat tl.renext higher levelremainsunchanged;on11, the mappingbetweenthe trvo levelsis changed.Hence,applicationprogramsreferring to the higher-levelschemaneednot be changed. The three-schentaarchitecturecirn make it easierto achievetrue clataindependence,both physicaland logical. Horvever,the two levelsof mappings createan overheadduring cornpilationor executiorrof i.rquery or progrirm,leadingto inefficienciesin the DBMS. Beciruse of this, few DBMSs haveimplementedthe full threeschemaarchitecture. 2.3 DatabaseLanguagesand Interfaces In section 1.4we discussedthe variety of userssupportedbv a DBNIS.The DBlvtS must provide appropriatelanguages and ir.rterfaces fbr eachcategoryof users.In this sectionwe discussthe types of languagesand interfacesprovided by a DBMS and the usercategoriestargetedby eachinterfirce. 2.3 DatabaseLanouaoesand Interfaces cci be ' col l I r.l l11s . rn ges r g t he hcma c rn al i.rvbe . rciltr. r l 0 r .l trot i c ccss r a st er . :004 .r the iro n()rirge : hiclh a nd , rl .rr.rd ricter c lu de ) B\ I S ppillg nt.l ls : o nlv rcfer- -'Pe11.tc a l l nt'fllh r.-e- )I J\ I S n tl.ris : .r n cl 2.3 .1D B MS La n g u a ges Once the designof a databaseis completedand a DBMS is chosento implenrentthe tlltabase,the first stepis to specif' conceptualand internal schemasfor the database .rnclany mappingsbetweenthe two. In marryDBMSs where no strict separationof lc.r'elsis maintained, one language,calied the data definition language (DDL), is Lrsed by the DBA and by databasedesignersto defineboth schemas.The DBMS will h.u.ea DDL compiler whosefunction is to processDDL statementsin order to identif\'descriptionsof the schemaconstructsand to storethe scher.na descriptionin the l)BMS catalog. ln DBMSswherea clearseparationis maintainedbetweenthe conceptualand internal levels,the DDL is usedto specifythe conceptualschemaonly.Another liu-rguage, the storage definition language (SDL), is used to specif' the internal scl-rer.ntr. The nrirppingsbetween the two schemasmay be specifiedin either one of theselanquages.In most relationalDBMSs today,there is no specificlanguagethat perfbrms the role of SDL. Instead, the internal schema is specified by a cor-r-rbinatior.r of parametersand specificationsrelatedto storage-the DBA staff typically controls inclexingand mapping of data to storage.For a true three-schemaarchitecture,rve rvould need a third language,the view definition language (VDL), to specify user viervsand their mappingsto the conceptualschema,but in most DBMSs the DDL is usedto define both conceptualand externalschemas.In relationalDBMSs,SQL is usedin the role of VDL to define user or applicationviews as resultsof predefined (seeChapters8 and 9). qLreries ()nce the databaseschemasare compiled and the databaseis populatedwith data, usersmust have some means to manipulate the database.Typical manipulations includeretrieval,insertion,deletion,and modification of the data.The DBMS pror ides a set of operations or a languagecalled the data manipulation language DML) for thesepurposes. ln current DBMSs,the precedingtypes of languagesare usually not considered distirrct languages;rather, a comprehensiveintegrated languageis used that includes constructsfor conceptualschemadefinition, view definition, and data manipr-rlation. Storagedefinitior-ris typicallykept septrrate, sinceit is usedfor defining phvsical storagestructuresto fine-tune the performanceof the databasesystem,lvl-richis tusr"rally done by the DBA staff.A typical exarnpleof a cornprehensivedatabaselanguageis the SQL relationaldatabaselanguage(seeChapters8 and 9), rvhich representsa combination of DDL, VDL, and DML, as well as statementsfor constraint specification,schemaevoiution, and other features.The SDL was a component in earlyversionsof SQL but hasbeen rernovedfror.nthe languageto keepit at the conceptualand externallevelsonly. 'fhere are two main types of DMLs. A high-level or nonprocedural DML can be used on its own to specif, complex databaseoperationsconcisely.Many DBMSs allow high-levelDML statementseither to be enteredinteractivelyfrom a display in a general-purposeprogramming lanrnonitor or terminal or to be en-rbedded guage.In the latter case,DML statementsmust be identifiedwithin the program so 37 Chapter2 DatabaseSystemConceptsand Architecture that they can be extractedby a precornpilerand processedby the DBMS. A lowlevel or procedural DML rrrrr-sr be embeddedin a general-purposeprogramming language.This type of DML typically retrieves individual records or objects from eachseprarately. Therefore,it needsto rlseprogranrming the databaseand processes languageconstructs,such aslooping, to retrieveand processeachrecordfrom a set of records.Low-level DMLs are erlsocalled record-at-a-time DMLs becauseof this property.DL/1, a DML designedfor the hierarchicalmodel, is a lol-level DML that to u s e sc o mma n d ss u c ha s GE TU N IOU EGE , T N E X T,or GE TN E X TW ITH INP A R E N T, navigatefrom recordto record rvithin a hierarchyof recordsin the database.Highlevel DMLs, such as SQL, can specifl,and retrievemany recordsin a single DML statement;therefbre,they.arecalledset-at-a-time or set-oriented DMLs. A qr-reryin a high-levelDivll often specifiesrll ich datato retrievelather than lrcw lo retrieveit; therefore,sr.rchlanguagesare also called declarative. Whenever DML comn-rands,whether high level or lorv level, are embedded in a ger-reral-purpose progran-uninglanguage,that languageis called the host language and the DN4L is called the data sublanguage.r0On the other hand, a high-level DML usedin a standaloneinteractivemanner is calleda querylanguage.In general, both retrievaland updatecommandsof a high-levelDML may be usedinteractively part of tl-requery language.rl and are hencecoi'rsidered Casual end userstypically use a high-level query languageto specifr their requests, whereasprogrammers use the DML in its embedded form. For naive and parametric users,there usuallv irre user-friendly interfaces for interacting with the database;thesecan alsobe usedby cirsualusersor otherslvho do not want to learn the detailsof a high-levelquery language.We discussthesetypesof interfacesnext. 2.3.2 DBMS Interfaces User-friendly interfacesprovided by a DBMS may include the following: Menu-Based Interfaces for Web Clients or Browsing, Theseinterfacespresent the userwith listsof options (calledmenus) that leadthe userthrough the formulation of a reqlrest.Menus do awayrvith the need to mernorize the specificcommands and syntaxof a query language;rather,the query is con-rposed step-by-stepby picking options from a menu that is displayed by the system.Pull-down menus are a very popular technique in Web-based user interfaces. They are also often used in browsing interfaces, which allow a user to look through the contents of a database in an exploratoryand unstructuredmanner. 1 0 . lr o b le ct o a la o a se s,tre rost a1d odta srbaaguages [yoi cal l y'orrr o1e .rl eQrareritar^guage-fo' e xa m p le ,C+ + with so m e extensi onsto supportdatabasefunct onal i ty,Some rel ati onalsystemsa so pro v d e in te g r a te dla n g u a g e s-forexampl e,Orac e's PLISOL, 1 1 .Acco r d r n gto th e En g lishmeani ngof the w orcjquery,i i shoul dreal l ybe used to descr be retri evas onl y , n o t u p d a te s. 2.3 DatabaseLanouaoesand Interfaces -{ lowrn r ming .ts fiom rn.rming )m a set : oi this \ l L t hat I EN T ,tO : . H ighl c DML l u e r y in :riL'\'eiU l cd in a rnguage i h -leve l !c-n€f?1, .rctively r'QU€StS, .iramettc' data-'arn the present rrmular m an ds :rvpickJS are a u s edi n l.rtabase displaysa fbrm to each user. Forms-Based Interfac€s. A forrns-basedir-rterface L-serscan fill out all of the form entriesto insert new data,or they can fill out only ccrtainentries,in which casethe DBMS will retrievernatchingdata for the remaining entries.Forms are usually designedand programmed for naive usersas intertircesto canned transactions.Many DBMSs have forms specification languages, rvhichare speciallanguagesthat help programmersspecifysucl-rfbrms. SQL*Forms is ir form-basedlanguagethat specifiesqueriesusing a form designedin conjuncti o n rv i th th e re l a ti o n a ld a tabaseschema.Oracl e Forms i s a component of the L)racleproduct suite that providesan extensiveset of featuresto designand build .rpplicationsusing forrns.Some systemshaveutilities that define a form by letting the end userinteractivelvconstructa sampleforrn on the screen. Graphical User Interfaces. A GUI tl,picallydisplaysa schena to the user in diaqrammaticform. The userthen can specifya query by manipulatingthe diagram.It-t nriinycases,GUIs utilize both menus and forrns.Most GUIs use a pointing device, parts of the displayedschemadiagram. \uch elsa mouse,to pick certair.r Natural Language Interfaces. Theseinterfacesacceptrequestsrvritten in English ()r some other languageand attempt to ruulerstotrd them. A natural languageinterwhich is similar to the databaseconceptualschema, l.rceusuallyhas its own schenra, .rsrvellas a dictionary of important words.The natural languageinterfacerefersto the rvordsir-rits schema,as rvell as to the set of standardwords in its dictionary,to generatesa interpret the request.lf the interpretation is successful,the ir.rterface high-levelquery correspondingto the natural languagerequestand submits it to the DBMS for processing;otherrvise,a dialogueis startedwith the userto clarifl' the rcquest.The capabilitiesof natural languageinterfaceshave not advancedrapidly. lirclal',we seesearchengir.res that acceptstringsof ntrturallanguage(like Englishor :panish) words and match them u'ith docurnentsat specificsites(for local search cngines)or Web pageson the Web at large (for engineslike Google or AskJeeves). 'l'hev use predefinedindexeson rvords and use ranking functions to retrieveand prr-sentresultingdocumentsin a decreasingdegreeof match. Such"free form" texttral query interfacesare not common on structured relational or legacvr.r.roclel databases. Speech Input and Output. Limited useof speechas an ir-rputquerv and speechas .l n a n s w e r to a q u e s ti o n or resul t of a request i s becomi ng cornmonpl ace. directory, ,\pplications with limitecl vocabulariessuch as inquiries for telephor-re t'light arrival/departure,and bank account inforrnation are allowing speechfor inpr-rtand output to enableordir-raryfolks to accessthis inforrnation. The speech input is detectedusing a library of predefinedwords and usedto set up the paramt.tersthat are suppliedto the queries.For output, a similar conversionfrom text or into speechtakesplace. nr.rmbers Interfaces for Parametric Users. Parametricusers,such as bank tellers,ofterr havea small set of operationsthat they must perform repeatedly.For example,a Chapter 2 DatabaseSystem Concepts and Architecrure teller is able to use single-function keys to invoke routine and repetitive transactions such as depositsor withdrawals into accounts,or balanceinquiries. S),stemsanalysts and programmers design and implement a specialinterfacefor eachknown classof naive.users.Usually a small set of abbreviatedcommands is included, with the goal of minimizing the number of keystrokesrequired for each request. For example, function keys in a terminal can be p.og.un-r-.d to initiate,nurious commands. This allows the parametric user to proceed with a minirnal number of keystrokes. Interfaces for the DBA. Most databasesystemscontainprivileged commandsthat can.beused only by the DBAs staff.Theseinclude .o,rr-u.,d, foi creating accounrs, settingsystemparameters,granting accountauthorization,changi'g a schema,and reorganizingthe storagestructuresof a database. 2.4 The DatabaseSystemEnvironment A DBMS is a complex softwaresystem.In this sectio.rwe discuss the types of software componentsthat constitutea DBMS and the types of computer systemsoft_ ware with which the DBMS interacts. 2.4.1 DBMS ComponentModules Figure2.3 illustrates,in a simplified form, the typical DBMS components. The fig_ ure is divided into two halves.The top half of the hgure refersto the various usersof the databaseenvironment and their interfaces.Thelower half shows the internals of the DBMS responsiblefor storageof data and processingof transactions. The databaseand the DBMS catalogare usually stored on disk. Access to the disk is controlled primarily by the operating system (os), rvhich schedules disk input/output. A higher-level stored data manager module of the DBMS controls access to DBMS information that is storedon disk,whetherit is part of the database or the catalog. Let us consider the top half of the figure first. It shows interf'acesfor the DBA staff, casualuserswho work with interactive interfacesto formulate queries, application programmers who program using some host languages, atrd paiametric userswho do data entry work by supplying parametersto predefined tia.sactions. The DBA staff works on defining the databaie and tuning it by making changes to its defini_ tion using the DDL and other privilegedcommands. DDr compiler processe.s schemadefinitions, specified i' the DDL, and stores lhe descriptionsof the schemas(meta-data)in the DBMS catalog.The catalog includes information suchasthe namesand sizesof files,namesand dita types of dataitems, storagedetails of each file, mapping infbrmation among schemas, and constraints, in addition to many other types of information rhar areireeded by the DBMS rnod_ ules.DBMS softwaremodulesthen look up the cataloginformatio' as .eeded. Casuai users and persons with occasionalneed fbr infbrmation from the database interact using some form of interface,which we show, as interactive query inter- 2.4 The DatabaseSystem Environment Ir on s rh'sts lss of ' g o al n p le , T h is CasualUsers , IDDLI I t Sta lp m o n lq - ' - ' " " ' ""' " -----T----- ; that unts, . and softsoft- r f'lo- r'rs Of .i l s o f iskis disk rtrols rbase stafl ation . rr'ho I)BA etrni!t ore s I u d es tcnts, r i nt s, n r od a b ase l n ter- I ,/ V f-h-;,^^-l Ouery --T-- I J I V Application Programmers 41 Parametric Users + ) l--:---__l uuery IDDL I I l I Compiler I --T-- Itl C o m p i l e rI --------r------- i I I I I DBA Commands, Oueries,andTransactions V Il^,,1 System l, / I uatarog/ r Data ] l-*, I\- Dictionary I -/ Concurrency Control/ Backup/Recovery Subsystems Oueryand Transaction Execution Figur e2. 3 C nmnnnent nrndrrl ec nf : D B MS and thei r nterac tons . or form-basedinteractionthat tirce.We havenot explicitll-shown any r.t.te'nu-btrsed Thesequeri esare m a y b e L i s e dto g e l l e ra teth e i nteracti vequery ar,rtomati cal l y. parsed,analyzeclfor correctnessof the operationsfor the model, the namesof data elements,and so on by a query compiler that compilesthem into irn internal form. This internal query is subjectedto query optirnizationthat we discussin Chapter 42 Chapter 2 DatabaseSystem Concepts and Architecture 15.Among other things, the query optimizer is concernedwith rearrangementand possiblereorderingof operations,elimination of redundancies,and use of correct algorithms and indexesduring execution.It consults the systemcatalog for statistical and other physical information about the stored data and generatesexecutable code that performs the necessaryoperations for the query and makes calls on the runtime processor. Application programmers write programs in host languagessuch as fava, C, or COBOL that are subn-rittedto a precompiler. The precompiler extractsDML commands from an application program written in a host programming language. Thesecommandsaresentto the DML compiler for compilation into objectcodefor databaseaccess. The rest of the program is sent to the host languagecompiler.The object codesfor the DML commands and the rest of the program are linked, forming a cannedtransactionwhoseexecutablecode includescallsto the runtime database processor.These canned transactions are useful to parametric users who simply supply the parametersto these canned transactions so they can be run repeatedlyas separatetransactions.An exampleis a bank withdrawal transaction where the accountnumber and the amount may be suppliedas parameters. In the lower half of Figure 2.3,the runtime databaseprocessoris shown to execute (l) the privilegedcommands, (2) the executablequery plans,and (3) the canned transactionswith runtime parameters.It works with the systemdictionary and may update it with statistics.It works with the stored data manager,which in turn uses basic operating system servicesfor carrying out low-level input/output operations betweendisk and main memory. It handlesother aspectsof data transfer such as managementof buffers in main memory. Some DBMSs havetheir own buffer management module while others depend on the OS for buffer management.We have shown concurrency control and backup and recoverysystemsseparatelyas a module in this figure. They are integrated into the working of the run-time database processorfor purposesof transactionmanagement. It is now common to have the client program that accesses the DBMS running on a separatecomputer from the computer on which the databaseresides.The former is called the client computer running a DBMS client and the latter is called the databaseserver. In some cases,the client accesses a middle computer, calledthe application server, which in turn accesses the databaseserver.We elaborateon this topic in Se c ti o n2 .5 . Figure 2.3 is not meant to describea specificDBMS; rather, it illustratestypical DBMS modules. The DBMS interacts with the operating system when disk accesses-to the databaseor to the catalog-are needed.If the computer system is sharedby many users,the OS will scheduleDBMS disk accessrequestsand DBMS processingalongwith other processes. On the other hand, if the computer systemis mainly dedicatedto running the databaseserver,the DBMS will control main memory buffering of disk pages.The DBMS also interfaceswith compilers for generalpurpose host programming languages,and rvith application serversand client programs running on separatemachinesthrough the systemnetwork interface. 2.4 The DatabaseSvstem Environmenr nt a nd orrect : a t ist i utable rn the C, or L(rlll- t u ag e. ,defor 'r. The f orm. -l^+^ ud td- j \fho )e run action \ecute a nn ed d may N U SC S ations lch as m an .' have modtabase g on a mer is data'plicarp icin v pical r disk tem is )BMS tem is ntemneralclien t ,c', 2.4.2 DatabaseSystem Utilities ln addition to possessing the softrvaremodules just described,most DBMSs have databaseutilities that help the DBA managethe databasesystem.Common utilities hai'ethe following typesof functions: Loading. A loading utility is usedto load existingdata files-such as text files or sequentialfiles-into the database.Usually,the current (source)format of the data file and the desired(target) databasefile structure are specifiedto the utility, which then automaticallyreformatsthe clataand storesit in the database.With the proliferationof DBMSs,transferringdata frorn one DBMS to another is becoming colnllrolt in many organizations.Some vendors are offering products that generatethe appropriate loading programs, given the existingsourceand target databasestoragedescriptions(internal schernas). Such tools are also called conversion tools. For the hierarchicalDBMS called IMS (l B M ) a n d fo r many netw ork D B MS s i ncl udi ng ID MS (C ornputer Associates), SUPRA (Cincorr), or IMAGE (HP), the vendorsor third party companiesare rnakinga variety of conversiontools available(e.g.,Cincom's SUPRAServerSQL) to transform data into the relationalmodel. Backup. A backup utility createsa backup copy of the database,usuallyby dumping the entire databaseonto tape. The backup copy can be used to restorethe databasein caseof catastrophicfailure.Incrementalbackupsare also often used,where only changessincethe previousbackup are recorded. Incremental backup is r.norecomplex, but savesspace. Databasestorage reorganization. This utility can be used to reorganizea set of databasefiles into a differentfile organizationto inrproveperformance. Performancemonitoring. Such a utility monitors databaseusageand provides statisticsto the DBA. The DtsA usesthe statisticsin making decisions such as whether or not to reorganizefilesor whether to add or drop indexes to improve performarnce. t)ther utilities may be availablefor sorting files,handling data compression,monitoring access by users,interf'acing with the network,and performingother functions. 2.4.3Tools,ApplicationEnvironments, and CommunicationsFacilities Other tools areoften avaiiableto databasedesigners, users,and DBMS. CASEtoolsrr .rreusedin the designphaseof databasesysterns. Another tool that can be quite usetirl in large organizations is an expandeddata dictionary (or data repository) system. In addition to storing cataloginforr.nationabout schemasand constraints,the riatadictionary storesother ir.rformation,such as designdecisions,usagestandards, L Ar th o u g aCASE sta r o s io ' co r p u t e-a r d a ta b a sed e sig n , oed so't"l a e e-g 'l ee rng.nary C AS f too 5 dre L5eo p rq-dry Chapter2 DatabaseSystemConceptsand Architecture applicationprogram descriptions,and userinformation. Sucha systemis alsocaiied an information repository. This information can be accesseddirectly by users or the DBA when needed.A data dictionary utility is similar to the DBMS catalog,but mainly by usersrather it includes a wider variety of information and is accessed than by the DBMS software. Application development environments, such as the PowerBuilder (Sybase)or JBuilder (Borland) system,are becoming quite popular. These systernsprovide an environment for developing databaseapplications and include facilitiesthat help in many facetsof databasesystems,including databasedesign,GUI development, querying and updating, and application program development. The DBMS also needsto interfacewith communications softI^/are,whose function is to allow usersat locations remote frorn the databasesystemsite to accessthe databasethrough computer terminals, workstations, or their local personal computers. These are connected to the databasesite through data communications hardware suchas phone lines,long-haulnetworks,local netlvorks,or satellitecommunication devices.Many commercial databasesystenlshave communication packagesthat work with the DBMS. The integratedDBMS and data communicationssysternis calleda DB/DC system.lnaddition,somedistributedDBMSs are physicallydistributed over multiple machines.In this case,communications networks are neededto connect the machines. These are often local area networks (LANs), but they can also be other types of networks. 2.5 Centralizedand Client/Server Architecturesfor DBMSs 2.5.1 CentralizedDBMSsArchitecture Architectures for DBMSs have fbllowed trends sirrilar to those for general computer system architectures.Earlier architecturesused mainframe computers to provide the main processingfor all systemfunctions, including user application programs and user interfaceprograms,as well as all the DBMS functionality.The such systemsvia computer terminalsthat did reasonwas that most usersaccessed not have processingpower and only provided display capabilities.Therefore,all processingwas performed remotely on the computer system,and only display information and controls were sent from the computer to the display terminals, which were connectedto the central computer via various types of communications networks. As prices of hardware declined, most users replacedtheir terminals with PCs and workstations.At first, databasesystensusedthesecomputerssimilarly to how they had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the DBMS functionality,applicationprogram execution,and user interfaceprocessingwere carried out on one machine.Figure2.4 illustratesthe physical 2.5 Centralized and Client/Server Architectures for DBMSs \r) called u scrsor r log ,but ': rather rJsc) or , r i de an t h p ln in ) P nt ent, i n ct ion ne da ta :lp Uters. r rd\\'are l ic. rt ion cc.sthat , \ t em i S .iistrib:cdedto h cl can 45 i o m p o n e n ts i n a c e n tra l i z edarchi tecture.Gradual l y,D B MS systemsstarted to trploit the availableprocessingpower at the user side,which led to client/server l)BNISarchitectures. 2.5.2 Basic Client/ServerArchitectures Irirst,we discussclient/serverarchitecturein general,then we seehow it is appliedto I )llN4Ss.The client/server architecture was developedto deal with computing envir()nmentsin which a largenumber of PCs,workstations,file servers,printers,datal..rseservers,Web servers,and other equipment are connectedvia a network. The idea is to define specialized servers with specific functionalities. For example,it is possibleto connecta number of PCs or small workstationsas clientsto a file server that maintainsthe files of the client machines.Another machine can be designated .is a printer server by being connected to various printers; thereafter,all print rr.questsby the clients are forwarded to this machine.Web servers or e-mail servers .rlsofall into the specializedservercategory.In this way,the resourcesprovided by .pecializedserverscan be accessed by many client machines.The client machines the user with the appropriate interfacesto utilize theseservers,as well as l.rovide rvith local processingpower to run local applications.This conceptcnn be carried ()\'crto software,with specialized prograrns-such asa DBMS or a CAD (computer.rrdeddesign)package-being stored on specificservermachinesand being made .rccessible to multiple clients.Figure 2.5 illustratesclient/serverarchitectureat the Figure 2.4 A physical centralized architecture. l l comrlers to l ica t ion r t r. T he h.rt did o re, ai l i i s p la y .-. : .- .,1 ^ rllllldlJt t Apdb"ti*l t Programs I t Terminal DisplayControl | fDisp t pBMSI Software OperatingSystem r un i c a- (- s a nd rr r ' t h ey u.\lsin r ln t er'hr sical l/O Devices (Printers, TapeD ri ves,...) Chapter2 DatabaseSystemConceptsand Architecrure logical level; Figure 2.6 is a simplified diagram that shows the physical architecture. Some machineswould be client sitesonly (for example,disklessworkstationsor workstations/PCs rvith disks that have only clieni software installed). other machineswould be dedicatedservers,and otherswould haveboth client and server functionality. The concept of client/serverarchitectureassumesan underlying framework that consistsof rnany PCs and workstations as well as a smaller tru-b., of mainframe machines,connectedvia LANs and other types of computer networks.A client in this framework is typically a user machine that providei user interface capabilities and loca-lprocessing.When a client requiresaccessto additional functionalitysuch as databaseaccess-that doesnot existat that machine,it connectsto a server that provides the needed functionality. A server is a system containing both hardware and software that can provide servicesto the client machines,such as file Figure 2.5 Logicaltwo-tier client/server architecture. Figure 2.6 Physical two-tier client,/server architecture, Diskless Client tilf Site 1 C l i ent with Disk 2.5 Centralized and Client/Server Architectures for DBMSs 'cture. )n s o r O the r server k that trame ent in rilities lin'server hardas file .l.ccss,printing, archiving,or databaseaccess. In the generalcase,some machines installonly client software,othersonly serversoftrvare,and still othersmay include iroth client and serversoftrvare, as illustratedir-rFigure2.6.However,it is more comnron that client and serversoftrvareusually run on separatemachines.TWo main lvpes of basic DBMS architecturesrverecreatedon this underlying client/server il'irnrework:two-tier and three-tier.1r We discussthem next. 2.5.3Two-TierClient/ServerArchitectures for DBMSs I'hc'client/serverarchitectureis increasinglybeing incorporated into commercial l)BN,{Spackages.In relationaldatabasemanagementsystems(RDBMSs),many of rr hich started as centralizedsysten-rs, the s1'sterncomponents that were first moved :o the client side were the user interfaceand application programs.BecauseSQL sceChapters8 and 9) provided a standardlanguagefor RDBMSs,this createda l,rgicaldividing point betweenclient and server.Hence,the query and transaction :irnctionaiityrelatedto SQL processingremainedon the serverside.In such archiiccture,the serveris often called a query server or transaction server becauseit r.ror.idesthesetwo functionalities.In an RDBMS the serveris also often calledar-r SQL server. in such a client/serverarchitecture,the r-rserinterfaceprograms and application prosramscan run on the client side.When DBMS accessis required,the program c:tablishesa connectionto the DBMS (rvhich is on the serverside);once the conncction is created,the client program can communicatewith the DBMS. A standard ..rlled Open Database Connectivity (ODBC) provides an application programming interface (API), which allowsclient-sideprogramsto call the DBMS, as long ls both client and server machines have the necessarysoftware installed. Most l)BMS vendorsprovide ODBC driversfor their systems. A client program can actu.rllvconnectto severalRDBMSs and sendquery and transactionrequestsusing the t)DBC API, which are then processedat the serversites.Any querv resultsare sent b.rckto the client program, rvhich can processor displaythe resultsas needed.A rclatedstandard for the lava programming language,called JDBC, has also been .lt-tlned.This allows Javaclient programs to accessthe DtsMS through a standard i rlterface. fh e s e c o n d a p p ro a c h to c l i ent/serverarchi tecturel vas taken by some obj ectoriented DBMSs,where the softrvaremoduies of the DBMS were divided between clicnt and server in a more integrated way. For example, the server level may includethe part of the DBN{Ssoftwareresponsiblefbr handling datastorageon disk p.rges, local collcurrencycontrol and recovery,buffering and cachingof disk pages, .rndother such functions.N'leanwhile, the client level may handle the userinterface; clatadictionary functions;DBMS interactionswith programming languagecompilt'rs;global query optimization. concrlrrencycontrol, and recoveryacrossmultiple \crvers;structuring of comprlexobjectsfrom the data in the buffers;and other such We d scuss i he tw o most basi c ones 3 T h e r e a r e m a n y o th e r va r ia to n s of cl i ent/serverarchrtectures. ,.T e , 47 48 Chapter2 DatabaseSystemConceptsand Architecture functions. In this approach,the client/serverinteraction is more tightly coupled and is done internally by the DBMS modules-some of which reside on the client and some on the server-rather than by the users.The exact division of functionalitv varies from systemto system.In such a client/serverarchitecture,the server has been called a data seryer becauseit provides data in disk pagesto the client. This data can then be structured into objects for the client programs by the client-side DBMS software itself. The architecturesdescribedhere are called two-tier architectures becausethe software components are distributed over two systems:client and server.The advantagesof this architecture are its simplicity and seamlesscompatibility with existing systems.The emergenceof the Web changedthe roles of clients and server,Ieading to the three-tier architecture. 2.5.4 Three-Tierand n-TierArchitecturesfor Web Applications Many Web applications use an architecturecalledthe three-tier architecture, which adds an intermediate layer between the client and the databaseserver,as illustrated i n F i g u re2 .7(a). This intermediate layer or middle tier is sometimes called the application server and sometimesthe Web server, depending on the application. This server plays an intermediaryrole by storingbusinessrules (proceduresor constraints)that are used to accessdata from the databaseserver.It can also improve databasesecurity by checking a client's credentialsbefore forwarding a request to the databaseserver. Clients contain GUI interfacesand some additional application-specificbusiness rules.The intermediate serveracceptsrequestsfrom the client, processesthe request and sendsdatabasecommands to the databaseserver.and then acts as a conduit for Figure 2.7 I o n ica l ih r e e - tie r ;iw C l i ent client/server architecture, w i tha c o u p l eo f c o mmo n l , usednomenclatures. Application Server or WebServer Database Server 2.6 Classification of DatabaseManagement Systems pl.'d and l i c n t a nd tion irl i ty .'rr.erhas crt. This : cnt -si d e :h c so ftc .rdvan: e r isti ng -. 1c'ading tions re.rr'hich , -:rt rated In Server l'la\-san irc used . r rin ' by .! \!'fvef. passing(partially) processeddata from the databaseserverto the clients,where it nrrrybe processedfurther and filtered to be presentedto usersin GUI format. Thus, tlrc lser interface,applictttiottrules, a'nddata nccess act as the three tiers. Figure l.l(b) showsanother architectureused by databaseand other applicationpackage \r'ndors. The presentationlayer displaysinformation to the user and allows data cntry. The businesslogic layer handlesintermediaterules and constraintsbefore tl.rtais passedup to the user or down to the DBMS. The bottom layer includesall Jrta managementservices.If the bottom layeris split into two layers(a Web server .urcla databaseserver),then this becomesa four-tier architecture.It is custornaryto .livide the layersbetweenthe user and the stored data further into finer componcnts, thereby giving rise to n-tier architectureswhere n may be four or five. I\ picaily,the businesslogic layeris divided into rnultiplelayers.Besidesdistributing i)rogrammingand datathroughout a netrvork,r-tier applicationsafford the advan:.rq eth a t a n y o n e ti e r c a n ru n on an appropri ateprocessor or operati ngsystenrpl attirrrn and can be handled independently.Another layer typically used by vendors of I:R P (e n te rp ri s ere s o u rc ep l anni r-rg), and C R M (customerrel ati onshi pmauagerrrent) packagesis the rniddlewarelnyer which,accounts for the front-end modules .ommunicating with a number of back-enddatabases. \clvancesin encryption and decryption technologymake it saferto transfersensirivedatafrom serverto client in encryptedform, whereit will be decrypted.The latIr.r con be done by the hardwareor by advancedsoftware.This technologygives hieher levelsof data security,but the network securityissuesremain a major concr.rn.Varioustechnologiesfor data compressionalsohelp to transferiargeamounts tri data from serversto clientsover wired altd wirelessnetworks. :.u : i n eS S r lc'tlU€St ::Juitfor 2.6 Classification of Database ManagementSystems Severalcriteria are normally usedto classifyDBMSs.The first is the data model or-r rvhichthe DBMS is based.The main data model usedin many current commerciirl l)BMSs is the relational data model. The object data model has been implementecl in some commercialsystemsbut has not had widespreaduse.Many legacyapplications still run on databasesystemsbasedon the hierarchical and network data models.Examplesof HierarchicalDBMSs include IMS (lBM) and some other systcms like System2K (SAS Inc.) or TDMS, which did not succeedmuch commercially.IMS continues to be a very dominant player among the DBMSs in use at sovernmentaland industrial installations,including hospitalsand banks.The netrvork data model was usedby many vendorsand the resultingproducts like IDMS rCullinet-now Computer Associates), DMS 1100 (Univac-now Unisys),IMAGE , Hewlett-Packard),VAX-DBMS (Digital-now Compaq), and SUPRA (Cincom) still have a following and their user groups have their own activeorganizations.If rveadd IBM's popular VSAM file systemto these,we can easilysaythat (at the time of writing), more than 50% of the worldwide-computerizeddata is in these socalledlegary databasesystems. Chapter2 DatabaseSystemConceptsand Architecture The relational DBMSs arreevolving continuously, and, in particular, have been incorporating n-ranyof the conceptsthat were developedin object databases.This has led to a new classof DBMSs calledobject-relationalDBMSs.We can categorize DBMSs basedon the data model: relational,obiect,object-relational,hierarchical, network, and other. The secondcriterion used to classifyDBMSs is the number of users supportedbv the system.Single-usersystemssupport only one user at a time and are mostly used with PCs.Multiuser systems,which include the majority of DBMSs, support concurrent multiple users. The third criterion is the number of sites over which the databaseis distributed. A DBMS is centralized if the data is stored at a single computer site.A centralized DBMS can support mr-iltipleusers,but the DBlv{Sand the databaseresidetotally at a singlecomputer site.A distributed DBMS (DDBMS) can havethe actualdatabase and DBMS softrvaredistributed over many sites,connectedby a computer network. Homogeneous DDBMSs use the same DBMS softrvareat multiple sites.A recent trend is to develop softwirreto accessseveralautonomous preexistingdatabases stored under heterogeneous DBMSs. This leads to a federated DBMS (or multidatabasesystem),in which the participatingDBMSs irrelooselycoupledand havea degreeof local autonomy.Many DDBMSs usea client-serverarchitecture. The fourth criterion continues to be cost. But it is very difficult to propose any type of classiflcationof DBlv{Ssbasedorl cost. Today we have open source (free) DBMS products like N'IYSQLand PostgreSQlthat are supported by third-party vendors The main RDBMS products are availableas free examinawith additional services. tion 30-da1,copy'r,ersions as u'ell as personalversious,rvhich may cost under $100 and allow a fair amourrtof functionality.The giant systemsare being sold in modular form with componentsto handle distribution, replication,parallel processing, n-robilecapability,and so on, with a lirrgenurnberof parirmetersthat rnustbe defined for the configuration. Furthermore, they are sold in the tbrm of licenses-site licensesallow unlimited use of the databasesystemrvith any number of copiesrunning at the customersite.Another type of licenselimits the number of concurrent usersor the nurnber of user seatsat a location. Standalonesingle user versionsof in the overallconfiguration somesystemslike ACCESSaresold per copy or ir-rcluded of a desktopor laptop.In irddition,datawarehousingand mining features,aswell as clatatypes,aremadeavailableat extracost.It is quite common support for aclclitional to pay in millions for the installation and maintenanceof databasesystemsannualll'. We can also classifra DBI\{S on tbe basisof the types of accesspath options for storing files.One well-known tamily of DBMSs is basedon invertedfile structures. Finall,v,a DBMS can be general purpose or special purpose. When performance is DBMS can be designedand built for a a primary consideration,a special-prurpose sr-rcha systemcanrlot be used for other applicationswithout speciticapplicirtior-r; major changes.Many airline reservationsand telephonedirectory systemsdeveloped in the pirstare specialpurposeDBMSs.Thesefall into the categoryof online transaction processing (OITP) systems,which rnust support a large number of delays. concurrenttransactionswithout imposing excessive 2.6 Classification of DatabaseManaoement Svstems r a v ebe en ases.This :Jtegorize 'rarchical, rtrrtc'dbY . 'rt lv use d ;.r)rt COn- ih ut e d.A .'ntralized I ()I rl l Va t i J at a base nctu'ork. .\ recent J.rtabases \)r multintl havea j rnv type r DBN{S , r'endors cramina: Jcr S100 ::'imcldu1,r;g5sing, 'c dct-tned .. ss-si te 'picsrun) ncu rrent :: . ions o f :lguration .r: rr'ellas aonltnon .innually. r l i o ns fo r tnrct u re s. r nra ncei s .uilt fo r a \ \\'ithout :rs devel,ri online u nrb er o f 51 l.et us briefly elaborateon the n-raincriterion for classiS,ing DBMSs:the datamodel. t'hebasicrelationaldatamodel representsa databaseas a collectionof tables,rvhere cach table can be stored irs a separateflle. The databasein Figure 1.2 resen.rbles a rc-lationalrepresentation.Nlost relational databasesuse the high-level query lan*uagecalledSQL and support a limited form of userviews.We discussthe relational rrrodel,its languagesirnd operatiol.ls,2urdtechniques[or programming relational .rpplicationsin Chtrpters5 through 9. l'he object data model definesa databasein terrnsof objects,their properties,and their operations.Objectswith the same stnlcture and behavior belong to a class, .rr.rdclassesare organizedinto hierarchies (or acyclic graphs). The operationsof cach class are specified in terms of predefined procedures called methods. l{clationalDBMSs hzrvebeenextendingtheir modelsto incorporateobjectdatabase conceptsand other capabilities;thesesystemsare referredto as object-relationalor extended relational systems.We discussobject databasesand oL'rject-relational systcrnsin Chapters20 to 22. Trvoolder, historically important data rnodels,r-rowknown as legacydata moclels,iire the network and hierarchical rnodels.The network model representsdata as record tvpesand alsorepresentsa limited type of l:N relationship,calleda set t)?e. A 1:N, o r o n e -to -m a n y ,re l a ti o n shi prel atesone i nstanceof a record to many record instancesusingsomepointer linking mechanismin thesernodels.Figure2.8 shorvsa rretrvorkschemadi:rgram for the datirbaseof Figure 1.2,where record types are and settvpesare shown as labeleddirectedarrows.The network .hown as rectangles nrodel,alsoknown as the CODASYL DBTG modei,lahas an associated record-at-atime languagethat rnust be embeddedin a host prograrnminglanguage.The netn'ork DML was proposedin tl-re1971 DatabaseTask Group (DBTG) Repe1135 ur't crtensiorr of the COBOL language.It provides commands for locating records COUR S E _OFFE R IN GS IS A S T U D E N TGR A D E S P R E R E OU IS ITE S E C TIONGR A D E S G R A D E _R E P OR T 1 4 , CODASYLDBT G sta n d sfo r Con{erenceon D ata SystemsLanguagesD atabaseTaskGroup,w hi ch s th e co m m itte eth a t sp e c { e d th e net'l ork mode and ts anguage. Figur e2. 8 Theschem aof Frgur e2,1 in net wor k m odelnot at ion. Chapter2 DatabaseSystemConceptsand Architecture d i re c tl y (e .g .,F IND A N Y (record-type> U S IN G < fi el d-l i st> , or FIN D D U P LICATE <record-type> USING <field-list>). It has commands to support traversalswithin set-types(e.g.,GET OWNER,GET {FlRST,NEXT,LAST}MEMBERWITHIN<set-type> WHERElcondition)).It alsohascommandsto storenew data (e.g.,STORE<recordtype>) and to make it part of a set type (e.g.,CONNECT<record-type>TO <settype>). The languagealso handles many additional considerationssuch as the currency of record types and set types,which are defined by the current position of the navigation processwithin the database.It is prominently usedby IDMS, IMAGE, and SUPRA DBMSs today. The hierarchical model representsdata as hierarchical tree structures.Each hierarchy representsa number of related records.There is no standardlanguagefor the hierirrchicalrnodel.A popular hierarchicalDML is DL/l of the IMS system.It dominated the DBMS market for over 20 yearsbetween 1965and 1985and is a very widely used DBMS u'orldwide eventoday and holds a very high percentageof data in governmental,health care, and banking and insurance databases.Its DML calledDL/ 1 was a de facto industry standardfor a long time. DL/ I has commands to locatea record (e.g.,GET {UNIOUE,NEXT}<record-type>WHERE, <condition>). It has navigational facilities to navigatewithin hierarchies(e.g.,GET NEXTWITHINPARENTor GET{FlRST, NEXT}PATH<hierarchical-path-specification) WHERE<condition>). It has appropriatefacilitiesto storeand updaterecords(e.g., INSERT(record-type>, REPLACE<record-type>).Currency issuesduring navigation are also handled with aclditional featuresin the language.We give a brief overviewof the network and hierarchicalmodelsin AppendicesE and F.rs The eXtended Markup Language (XML) model, now consideredthe standard for It combines data interchangeover the Internet,alsouseshierarchicaltree structLlres. with from representation models. Data is databaseconcepts concepts docurnent nested create representedas elements;with the use of tags,data can be to complex the object rnodel, but hierarchical structures. This model conceptually reser.nbles uses different terminology. We discussXML irnd how it is related to databasesin Chapter 27. 2.7 Summary In this chapter we introduced the main conceptsused in databasesystems.We three main categories: defineda data model and we clistinguished xs High-levelor conceptualdata rnodels(basedon entitiesand relationships) s Low-level or physicaldrrtamodels $s Representationalor irnplementation data models (record-based,objectoriented) 15 . T h e fu ll ch a o te r son the netw ork and hi erarchi calmodel s from the second edi ti onof th s book are a va ila b leo ve r th e In te rnetfrom thi s book's C ompani onWebsi teat http:,//w w w .aw ,com/el nrasri , 2.7 Sum m ar v )LICATE i \\'ithin t- tvPe> rccordO (se tr as t he . it io n of \I AG E, .rrchical rc ls no I) L / 1 o f 9 6 5a nd - 'n'h ig h cc data)l / 1 has I / H E RE, . 9. ,G ET cation) ' d s (e .9 ., nin'igaa b r i ef Jard for rmbines Da t a i s --omplex ,del,but basesin We -'r.ns. rships) trbiect- \\'e distinguished the schema,or description of a database,from the databaseitself. I'he schema does not changevery often, whereasthe databasestate changesevery time data is inserted,deleted,or modified. Then we describedthe three-schema l)BMS architecture,which allows three schemalevels: .r' An internal schemadescribesthe physical storagestructure of the database. ',' A conceptual schemais a high-level description of the whole database. ,,r External schemasdescribethe views of different user groups. .\ DBMS that cleanly separatesthe three levels must have mappings between the \chemasto transform requestsand results from one level to the next. Most DBMSs do not separatethe three levelscompletely.We used the three-schemaarchitecture to definethe conceptsoflogical and physicaldata independence. 'l-hen we discussedthe main typesof languagesand interfacesthat DBMSs support. .\ datadefinition language(DDL) is usedto definethe databaseconceptualschema. ln most DBMSs, the DDL also definesuser views and, sometimes,storagestructures; in other DBMSs, separatelanguages(VDL, SDL) may exist for specifying r i*vs and storagestructures.This distinction is fading away in today'srelational implementationswith SQL servingas a catchalllanguageto perform multiple roles, including view definition. The storagedefinition part (SDL) was included in SQL ()\'ermany versions,but has now been relegatedto specialcommandsfor the DBA in relationalDBMSs. The DBMS compilesall schemadefinitions and storestheir descriptionsin the DBMS catalog.A data manipulation language(DML) is usedfor speci$ing databaseretrievalsand updates.DMLs can be high level (set-oriented, nonprocedural)or low level (record-oriented,procedural).A high-levelDML can bc' embeddedin a host programming language,or it can be used as a standalone i.rnguage;in the latter caseit is often called a query language, \\'e discusseddifferent types of interfacesprovided by DBMSs, and the types of l)BMS userswith which eachir.rterface is associated. Then we discussedthe database environment, DBMS :\'st€rrr typical softwaremodules,and DBMS utilities for helping usersand the DBA perform their tasks.We continued with an overviervof the nvo-tierand three-tierarchitecturesfor databaseapplications,progressively moving torvardn-tier, which are now very common in most modern applications,particul.rrly Web databaseapplications. I-inally,we classifiedDBMSs accordingto severalcriteria: data model, number of users,number of sites,types of accesspaths,and generality.We discussedthe avail.rbilityof DBMSs and additionalmodules-frorn no costin the form of open source software,to configurationsthat annuallycostmillions to maintain.\Ve alsopointed out the variety of licensingarrangementsfor DBMS and relatedproducts. The main classificationof DBMSs is basedon the data model. We briefly discussedthe main data models used in current commercial DBMSs and provided an example of the network data model. 53