Week 10

Contents

Objectives
Introduction
Architecture: Mainframe applications
    Mainframe database applications
    Further reading
Architecture: File-sharing applications
    File-sharing database applications
    Further reading
Architecture: Client-server applications
    Open Database Connectivity (ODBC)
    ActiveX Data Objects (ADO)
    Object Linking and Embedding, Database (OLEDB)
    ADO.Net
    Problems with the client-server architecture
    Further reading
Architecture: Web applications
    Web applications
    The 3-tiered architecture
    Client-server web applications
    Advantages over client-server
    Further reading
Architecture: Web services
    A description
    An explanation
    A service-oriented architecture (SOA)
    Further reading
Technology: Relational Databases
    Simple data structures
    Non-procedural programming
    Structural data independence
    Self-description
    Further reading
Technology: Third Generation Languages (3GL)
    Embedded SQL
    SQL DML statements
    Embedded SQL – the basics
    Embedded SQL – cursors
    Further reading
Technology: Fourth Generation Languages (4GL)
    4GLs – the context
    4GLs and relational databases
    The fourth generation environment (4GE)
    4GLs and the PC
    Rapid application development (RAD)
    VB6 vs VB.Net
    Why not VBA.Net?
    C#.Net
    Further reading
Technology: Object-oriented programming (OOP)
    Impedance mismatch
    Object-oriented DBMS (OODBMS)
    Object-relational DBMS (ORDBMS)
    Object-relational mapping (ORM)
    Further reading
Technology: Extensible Markup Language (XML)
    XML significance
    XML support in relational DBMS
    Native XML DBMS (NXD)
    Further reading
Active influences
    The Internet
    Open Source Software
    Other influences?
Summary: The database application landscape
A reminder about Wikipedia articles
Review Objectives
Objectives

On completion of this module you should be able to:

- briefly describe the mainframe computing architecture
- describe the file-sharing architecture used by database applications running on personal computers (PCs) over local area networks (LANs)
- describe the client-server architecture used by database applications running on PC/LANs
- identify advantages of using the client-server architecture over the file-sharing architecture for database applications on a PC/LAN
- briefly explain what is meant by a web application, and describe the 3-tiered web application architecture
- explore the claim that web applications are client-server applications
- describe the phrase Total Cost of Ownership (TCO) and explain why the TCO of client-server applications running on a PC/LAN led to interest in web applications as a replacement architecture
- describe the role of ODBC in client-server and web applications
- explain why a developer might use ADO, JDBC, or ADO.Net instead of ODBC in a client-server or web database application
- compare and contrast ODBC, ADO, JDBC, ADO.Net and OLEDB
- explain what is meant by the phrase ADO object model, and describe how classes in this model provide a general-purpose object-oriented view of a relational database
- explain what is meant by the terms B2C and B2B, and explain how XML and web services are used to support B2B applications
- briefly explain what is meant by the phrases web services architecture and service-oriented architecture (SOA)
- identify the advantages relational DBMS (RDBMS) provided over earlier DBMS, and describe the contribution made by RDBMS to solving the application backlog problem
- explain how relational databases are self-describing, and explore the claim that SQL is a non-procedural language
- describe the first three generations of programming languages, and explain how 3GLs are significantly more abstract than 2GLs
- explain what is meant by the term embedded SQL, and describe where this technology is used
- describe the role of cursors in embedded SQL applications
- explain the role played by RDBMS in support of 4GLs
- explain what is meant by, and describe the components of, the fourth-generation environment for database application developers
- explain why 4GLs emerged, why they became popular, and why the term lost popularity
- explain the phrase rapid application development (RAD) and give one example of a tool that supports this style of development
- explain how personal computers and local area networks introduced an era of user-empowered computing
- briefly describe the first database applications that ran on PC/LANs
- compare and contrast Visual Basic version 6 (VB6), VB.Net, and C#
- explain why Visual Basic for Applications (VBA) is based on VB6
- briefly explain the impact of object-oriented programming (OOP) languages on database application development
- explain the impedance mismatch between OOP and RDBMS
- briefly explain the terms OODBMS, ORDBMS, and ORM, and comment on the current impact of these technologies on database application development
- briefly describe XML and explain the impact it has had on database application development
- offer an opinion on the likelihood of native XML databases challenging the dominant position currently held by RDBMS
- briefly comment on the influence of the Internet on database application development
- briefly comment on the influence of open source software on database application development
- identify and describe other significant contemporary influences on database application development
- identify influences that you feel may drive database application development in the future

Introduction

The goal of this module is to provide a rough sketch of the contemporary database application landscape. The main architectures used by current database applications are explored. Technologies used to build those applications are examined. Finally, influences driving database application development today, and in the future, are considered.

You might be surprised to learn that some applications developed 40 years ago are still in use today. New graduates are often asked to work on older systems. This module will provide you with an awareness of the range of technologies you may encounter in your first job.

The technologies used to develop database applications over the last 40 years reflect the three main computing architectures used over that time – mainframes, PC/LANs, and the Internet. This module introduces the four main database application architectures used on those systems:

- centralised (on mainframe computers)
- file-sharing (on PC/LANs)
- client-server (on PC/LANs)
- web (on the public Internet or a private intranet)

Over the last 40 years, database application development has changed considerably. Contemporary tools provide a much higher level of abstraction from the host computer. This enables the developer to focus more on the application and less on the computer, resulting in higher productivity. This module examines the contributions to productivity improvement made by relational databases, so-called fourth-generation languages (4GLs), and object-oriented programming (OOP).

One motivation for using Wikipedia in this course is the role it plays in support of life-long learning. The IT field is vast and moving fast. Practitioners must choose areas of IT on which to focus their attention. However, it is useful to have an awareness of the broader field. Wikipedia is a great resource to support this goal. This module is devoted to broadening your awareness of issues in the field of database application development.

This module closes by considering some active influences on contemporary database application development. The module provides a rough sketch of database application development over the last 40 years. So, what's next? What influences are currently driving change? What influences will change database application development in the future? Reflection on these questions is encouraged.

Architecture: Mainframe applications

We have set the scope of our survey to technologies currently used by database applications. Once again, you may be surprised to learn that mainframe computers are alive and well in the 21st century. You may have encountered discussion about the latest "supercomputer", but may not have encountered discussion about the latest "mainframe". Our industry seems as prone to fashions as the clothing industry.

A supercomputer is characterised by its computing capacity. A mainframe is characterised more by its I/O capacity and reliability. Supercomputers are applied to large scientific applications, like weather forecasting. Mainframe computers are applied to large government applications, like a census, or to large commercial applications, like enterprise resource planning (ERP).

Mainframe database applications

Early database applications developed for mainframe computers adopted a centralised architecture. The application and database software all ran on the mainframe computer. Users interacted with applications through simple (or "dumb") character-based terminals.
Today, few users interact with mainframe applications through dumb terminals. Typically, users of a legacy mainframe application will use a program on a personal computer (PC) that emulates an old character-based terminal. Alternatively, a new graphical application may wrap the legacy application, hiding all direct interaction with the mainframe. For new applications, mainframe computers are used as servers for large web applications – applications exposing an interface through the public Internet or private intranets.

Further reading

Wikipedia provides articles on all topics of interest here:

- Mainframe computer: http://en.wikipedia.org/wiki/Mainframe_computer
- Supercomputer: http://en.wikipedia.org/wiki/Supercomputer
- ERP: http://en.wikipedia.org/wiki/Enterprise_resource_planning
- Terminals: http://en.wikipedia.org/wiki/Dumb_terminal#Dumb_terminal
- PCs: http://en.wikipedia.org/wiki/Personal_computer

Architecture: File-sharing applications

The first database applications running on PCs were small, single-user applications. The arrival of the local area network (LAN), introduced to allow sharing of files and printers, opened the door to shared database applications. The first shared database applications running on PCs were file-sharing applications.

File-sharing database applications

Microsoft Access can be used to develop a file-sharing application that will be shared by a small number of users. When developing a file-sharing application in Access, it is normal to split the application into two files: the database file, holding the tables; and the application file, holding the forms, queries, reports, macros and modules. This simplifies the process of distributing a new version of the application. When a new version is released, the application file is replaced and table links are reset from the testing database file to the production database file.

The simplest file management strategy for an Access file-sharing application is to place both files on a server. This works if all users have the same drive letter mapped to that server. If not, table links must be reset by each user. This is too inconvenient to perform manually, and may be beyond the capabilities of the non-professional developer to automate. As such, a copy of the application file is often placed on each user's PC; this also reduces load time.

Aside: One of the greatest strengths of Microsoft Access is that users can develop their own database applications, although professionals who inherit those applications may regard this as its greatest weakness.

The main problem with the file-sharing architecture is that the local database engine treats the remote database file as if it were local. Every disk page read from the database file must travel across the network. This loads the network and slows application response times. The client-server architecture (next section) solves this problem by passing SQL queries to the server and receiving result rows in return.

As you have seen, developing Microsoft Access applications against a server database is no harder than developing against a Jet database. However, more knowledge is required to install and manage SQL Server.

Today, small user groups continue to build small file-sharing database applications. And some professionals will still be building small file-sharing database applications for users.

Further reading

In the textbook, the file-sharing and client-server architectures are introduced on page 19, and covered in more detail on pages 516 and 517.
Architecture: Client-server applications

With the client-server architecture, there is only one database engine running in the system. The database engine runs on the server. The application (forms, reports, etc) runs on the user's PC – the client. When the application needs to access the database, an SQL statement is sent from the client to the server. The statement is processed on the server, which returns any result rows to the client.

The main advantage of the client-server architecture over the file-sharing architecture is reduced load on the network. The main disadvantage is that the server DBMS requires more technical skill to install and manage.

Open Database Connectivity (ODBC)

An important component in a client-server database application is the software providing the communication channel between the client application and the database server. An important standard for software performing this role is called Open Database Connectivity (ODBC).

Aside: Software sitting between the client code and a server is often referred to as middleware.

The ODBC standard was introduced in 1992. It defines a collection of functions through which an application can communicate with a relational database server. The attraction of using a standard interface is that the DBMS can be changed without the need for coding changes in the application. As such, applications could be built to be DBMS-independent.

An application using the ODBC standard makes calls to functions exposed by an ODBC database driver. The database driver translates ODBC function calls into the native language of the DBMS. The connection between the application and the database driver can be established with the help of the host operating system, or by recording details of the required driver in a file. In this way, the driver used by an application can be changed by the application operator.

Aside: Microsoft Windows includes an application for defining Data Source Names (DSN). Having associated a name (DSN) with details of an ODBC database driver, an application can simply use that name, possibly entered by a user, to connect with the required driver.

With the growing popularity of object-oriented programming (OOP), ODBC is rarely used directly by new applications. However, the success of ODBC has provided a rich legacy of database drivers. And, with the wide range of accessible DBMS, new client applications that must connect to a data source continue to support the ODBC standard.

New OOP applications developed for Microsoft systems can still leverage ODBC drivers. This is done through an OO library providing a wrapper to ODBC. Libraries provided by Microsoft for this purpose go by the names ADO and OLEDB, described below. Java developers use an analogous OO interface called JDBC, which can also wrap ODBC drivers.

The wide adoption of ODBC on Microsoft systems often leads to the assumption that it is a Microsoft standard. This is not the case. However, ADO and OLEDB are Microsoft standards.

Further reading is available from the following Wikipedia articles:

- DSN: http://en.wikipedia.org/wiki/Database_Source_Name
- ODBC: http://en.wikipedia.org/wiki/ODBC
- JDBC: http://en.wikipedia.org/wiki/Jdbc

ActiveX Data Objects (ADO)

ADO is a Microsoft standard OO interface to data. One use of ADO is to provide an OO wrapper to ODBC drivers. However, ODBC was designed as an interface to relational databases, nothing more.
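Before looking at ADO's broader scope, it is worth seeing what this call-level style of data access looks like in practice. The sketch below uses JDBC, the Java interface mentioned above, simply because it makes for a compact, self-contained example; ADO code in VBA, as demonstrated in the Week 9 lecture, follows the same connect, execute and iterate pattern. The server name, database name, credentials and driver URL are assumptions made for illustration only, and SP is the supplier-parts table used in the embedded SQL examples later in this module.

    // A minimal JDBC sketch (assumed details: server, database, login, and
    // the SQL Server JDBC driver available on the classpath).
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class SupplierTotal {
        public static void main(String[] args) throws Exception {
            // In ODBC terms, this URL plays the role of a DSN: it names the
            // driver and server, keeping those details out of the program logic.
            String url = "jdbc:sqlserver://dbserver;databaseName=SupplierParts";

            try (Connection con = DriverManager.getConnection(url, "appuser", "secret");
                 PreparedStatement stmt = con.prepareStatement(
                         "SELECT SUM(QTY) FROM SP WHERE SNO = ?")) {
                stmt.setString(1, "S1");                     // supply the input parameter
                try (ResultSet rows = stmt.executeQuery()) { // the server processes the SQL
                    if (rows.next()) {                       // only result rows cross the network
                        System.out.println("Total quantity: " + rows.getInt(1));
                    }
                }
            }
        }
    }

Because the program sees only the standard interface and the SQL text, the DBMS behind the URL can be changed by changing the connection details rather than the application code.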
In contrast, ADO, in conjunction with OLEDB (next section), was designed as an interface to a wide range of data sources.

Aside: At the time ADO was developed, "ActiveX" was the fashionable name for Microsoft's Component Object Model (COM). At the time, COM was the Microsoft standard for sharing OO code and controlling applications through published interfaces. The latest standard for publishing OO interfaces on Microsoft systems is called .Net.

The phrase object model is often used to describe the OO interface provided by an application or library. The main ADO classes of interest to database applications are listed below. These classes are used by the database application client as follows:

- Connection: holds details of a connection to a database server
- Command: holds a command (query, stored procedure, etc) to be executed on a database server
- Parameter: holds a parameter (for a query or stored procedure)
- Recordset: holds a set of records returned from the server
- Record: holds a single record in a recordset
- Field: holds details of a record field
- Property: holds details of a non-standard property of an object (this enables a DBMS to expose extended features of the DBMS)

Use of these classes was demonstrated in the Week 9 lecture.

The primary role of ADO is as a simple interface to OLEDB data providers. OLEDB is another Microsoft OO interface to data sources, but richer than ADO, providing more control over the data source connection. ADO is adequate for most applications.

Further reading is available from the following articles:

- ADO: http://en.wikipedia.org/wiki/ActiveX_Data_Objects
- COM: http://en.wikipedia.org/wiki/Component_Object_Model
- .Net: http://en.wikipedia.org/wiki/.NET_Framework
- Microsoft on ADO: http://msdn.microsoft.com/en-us/library/ms678262(VS.85).aspx
- ADO object model: http://msdn.microsoft.com/en-us/library/ms675944(VS.85).aspx

Object Linking and Embedding, Database (OLEDB)

OLEDB is another Microsoft OO interface to data. One use of OLEDB is to provide an OO wrapper to ODBC drivers. However, the OLEDB interface was designed to support a wider range of data sources – spreadsheets, email stores, directory stores, etc. The success of OLEDB as a standard is seen in the range of available providers: http://www.carlprothman.net/Default.aspx?tabid=87

In comparison with ADO, the OLEDB interface provides more control over the data source connection. However, it is more complicated than the ADO interface. Most applications use ADO rather than OLEDB.

Further reading is available from the articles:

- OLEDB: http://en.wikipedia.org/wiki/OLEDB
- OLEDB object model: http://msdn.microsoft.com/en-us/library/zchy97y7(VS.71).aspx

ADO.Net

The latest version of the ADO library is ADO.Net. This is a .Net version of the ADO library. However, ADO.Net is not a direct implementation of ADO on the .Net framework. ADO.Net was designed to provide better support for disconnected applications and XML.

Disconnected applications are supported by a robust implementation of optimistic locking. This is provided through the DataAdapter class. The ADO.Net object model is described in the second article listed under further reading below.

XML is well supported through the DataSet class – a class capable of holding a hierarchical collection of related records/rows, as one might find in an XML file. A DataSet carries a DataTableCollection, holding a collection of DataTable objects, each carrying a DataRowCollection. A DataSet also carries a DataRelationCollection, holding details of relationships between DataTable objects.
Finally, the DataSet class includes methods for serialising data to an XML file, or loading data from an XML file.

Further reading is available from the links below:

- ADO.Net: http://en.wikipedia.org/wiki/ADO.NET
- ADO.Net object model: http://msdn.microsoft.com/en-us/library/27y4ybxw.aspx

Problems with the client-server architecture

As explained, the PC/LAN-based client-server architecture offered a number of advantages over the mainframe architecture:

- cheap to buy
- available rapid application development (RAD) tools
- scalability (it is easy to add additional PCs as required, compared with the cost of a new mainframe when capacity is exhausted)
- applications have a graphical interface (instead of the text-based interface on mainframes)
- users could develop their own applications, or afford to have them developed (thanks to 4GLs and RAD tools, described later)

Many client-server applications were developed and implemented. However, increasing reliance on PC/LANs exposed a problem with the architecture – support costs: the time spent installing software on PCs, and investigating and fixing application client compatibility problems. With the passage of time, tools to manage the installation of software on PCs, and to diagnose problems, matured. However, the time taken to fix problems on PCs continued to impose a substantial cost.

It became apparent that the high support cost for a client-server application had to be added to the low purchase cost to obtain the total cost of ownership (TCO): http://en.wikipedia.org/wiki/Total_cost_of_ownership

The high TCO for client-server applications provided fertile ground for a new in-house database application architecture to emerge. The web application architecture provided that replacement (next section).

Further reading

In the textbook, the file-sharing and client-server architectures are introduced on page 19, and covered in more detail on pages 516 and 517.

Architecture: Web applications

You probably realise that the Internet is big – big in size, big in nature. The Internet is the very first international computer network. It may be the only international computer network the world will ever need. The Internet is having a substantial impact on society, including:

- news (print and broadcast)
- telecommunications (phone and fax)
- politics (communication, fund raising, etc)
- entertainment (television, videos, music, games, etc)
- commerce (business-to-consumer; business-to-business)
- education (mainly higher education) and training
- social networking

Electronic commerce (eCommerce) or business-to-consumer (B2C) applications provided the first Internet "killer app" for database developers. The first B2C applications enabled consumers to shop on the Internet (buy books, etc). It was the initial stock market frenzy surrounding the potential of such applications that caused the ".com bubble" (http://en.wikipedia.org/wiki/.com_bubble).

B2C applications exploited the reach of the Internet. Only a browser was needed to access these applications, and browsers were available for all PCs. Prior to the arrival of the Internet, the database development community was focussed on client-server applications built with rapid application development tools. However, the cost of supporting client-server applications was becoming apparent. As tools for building Internet applications improved, the prospect of building in-house applications accessible through a browser became very attractive.
Wikipedia on eCommerce: http://en.wikipedia.org/wiki/Ecommerce

Web applications

The term web application describes an application that is accessible through a browser, communicating via HTTP with a server on the public Internet or a private intranet. The term intranet describes a local, secure Internet sub-network. The same technologies used to develop an Internet database application can be used to develop an intranet database application. Collectively these applications are termed web applications.

Wikipedia: http://en.wikipedia.org/wiki/Web_application

In this module we simply introduce some of the reasons behind the development of web applications. In the following module we explore some of the technologies used to develop web applications.

The 3-tiered architecture

The classic web application uses a three-tier architecture – a client computer running a browser, a web server processing HTTP requests, and a database server processing database requests (received from the web server). A Java application running in the web server may use JDBC to communicate with the database server. An application developed with .Net may use OLEDB via ADO to communicate with the database server.

The three-tier architecture is described in the following Wikipedia article: http://en.wikipedia.org/wiki/3-tier_architecture

Client-server web applications

Notice that the three-tiered web application architecture incorporates use of the client-server architecture between the web server (the client) and the database server (the server).

The term client-server is most commonly used to describe the connection between a client application and a database server. However, the term also has a more general meaning as a style of communication between distributed application components. The following Wikipedia article describes this more general meaning: http://en.wikipedia.org/wiki/Client-server_architecture

Using this broader meaning of the term, the client computer running a browser in a three-tiered Internet application is also using the client-server architecture to communicate with the web server. You should be aware of this second, broader meaning of the term.

Advantages over client-server

For intranet applications, the main advantage over client-server applications is that the web application client is a simple browser. Individual application clients do not need to be installed on PCs. Consequently, the cost of fixing compatibility problems with application clients is avoided. When a new version of an application is released, the new version only needs to be installed on the web server.

Further reading

An introduction to the Internet, and intranets, is provided on pages 411 to 414 in the textbook. The textbook provides plenty of additional material on the technologies used to build web applications. You can jump into this now if you feel so inclined, or leave it to next week.

Architecture: Web services

It would be wrong to pass on without mentioning the web services architecture. Like web applications, we will explore this topic next week. However, the increasing number of applications adopting this architecture necessitates an introduction here.

A description

A web service is defined by the W3C as "a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards."

You probably already know something about XML (if not, a Wikipedia article reference is provided below). We will drill into WSDL, XML and SOAP next week. WSDL is a specification for an XML file that describes a web service – the required input parameters, the type of output data produced, etc. SOAP is a specification for the protocol used by the client application to access a web service; the protocol describes the required flow of XML documents/messages between the client and the server.

Further reading from Wikipedia:

- W3C: http://en.wikipedia.org/wiki/W3c
- XML: http://en.wikipedia.org/wiki/XML
- SOAP: http://en.wikipedia.org/wiki/SOAP_(protocol)
- WSDL: http://en.wikipedia.org/wiki/Web_Services_Description_Language

Note: This week, you only need a rough understanding of web services. You can use these articles again next week to deepen your understanding.

An explanation

Why the interest in web services? Web services provide a vehicle for conducting business-to-business (B2B) transactions over the Internet. The first commercial Internet applications were business-to-consumer (B2C) applications – buying books, etc. With XML, we have a standard for businesses, or industries, to specify the data needed by a business-to-business (B2B) transaction. Web services provide a reliable mechanism through which such transactions can occur.

In addition to the use of web services for business interactions over the Internet, web services can also play a role in exposing interfaces to in-house applications. In so doing, room is left for the possibility of moving those applications to the Internet later.

A service-oriented architecture (SOA)

Having raised the prospect of using web services to build wide-reaching interfaces to existing applications, the opportunity arises to fully exploit this approach in new applications. The term service-oriented architecture (SOA) describes applications that are designed to fully exploit the potential of exposing capabilities as web services.

Wikipedia on SOA: http://en.wikipedia.org/wiki/Service_Oriented_Architectures

Note: You only need a vague understanding of SOA this week.

Further reading

Web services only get a brief mention at the top of page 22 of the textbook. A good introduction to XML is provided on pages 450 to 474. SOAP is mentioned on page 450. With the generous coverage of XML, it seems odd that web services do not get more of a mention.

Wikipedia on web services: http://en.wikipedia.org/wiki/Web_service

Technology: Relational Databases

In terms of database applications, it is difficult to top the significance of relational databases as a technological innovation of the last 40 years. You have studied relational database theory and practice in two courses now. Hopefully you have a solid grounding. Some of the more significant advantages provided by relational database systems over those they replaced are as follows:

- simple data structures
- set-oriented, or "non-procedural", programming
- structural data independence
- self-description

Simple data structures

Data in a relational database is held in simple tables. Relationships between rows in different tables are established via data values held in those rows. In database systems used before relational systems, links between related data were established by hidden pointers.
By exposing the relationships, and through the use of simple structures, it was possible for users to build their own applications. It simplified the job for professional developers too, improving productivity.

Non-procedural programming

Set-oriented DML enables simple database applications to be developed without the need for procedural code. The form and report developed in the T109 offering of Database Use & Design were developed without any procedural code (other than the form's Reset Sample Data button, used for testing only). Again, this improves prospects for users developing their own applications, and simplifies the task for professionals.

Another advantage of relational query languages is that their abstract, set-oriented nature enables an optimiser to decide at runtime how best to process a query. The optimiser can decide whether an index should be used to process a query based on the current selectivity of the index. The optimiser can decide on the best way to join two tables based on the current number of rows in each table.

The term non-procedural was popular in the early days of the relational model. At that time, advocates were working hard to sell the advantages of the new approach. With the addition of procedural extensions to the SQL standard (SQL/PSM), the claim is clearly inappropriate.

Structural data independence

As you know, an ordering of the columns in a relational database table (base or view) cannot be assumed. When selecting columns of interest in a table, that selection must be made using column names, not the position of columns.

Aside: It is possible to violate this requirement using the SQL INSERT statement, but this is a failing of the SQL language, not of relational systems per se; the practice should be avoided.

Also: Using "*" in the SELECT clause of an SQL query should also be treated with caution in applications. The addition of new columns could have a negative impact on the application, particularly if new columns hold large data items (embedded objects, perhaps).

Like columns, it is wrong to assume any ordering of the rows in a relational table. If a particular ordering of result rows is required, that must be made explicit. SQL has an ORDER BY clause for this purpose.

By adopting these principles, and working around the faults (or conveniences?) in SQL, columns can be added to a table without affecting existing applications.

Another aspect of structural data independence is achieved through the use of virtual tables. This feature is implemented in SQL as a view. A view can be used to hide structural changes made to base tables. As you have seen in this course, SQL views can support any reasonable modification through the use of INSTEAD OF triggers.

Self-description

All relational databases carry a system catalog, and the SQL standard defines a standard structure for it. This self-contained description of the database is what enables development tools to present lists of tables, columns, relationships, views, stored procedures, etc, to users. The list of objects visible in the Navigation Pane of an Access database project is obtained by querying the SQL Server system catalog. The same catalog is queried to list columns when editing the design of a table.
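To see self-description at work, the sketch below asks a database to describe itself through JDBC's standard metadata interface; this is essentially what a development tool does when it populates its lists of tables and columns. Java and JDBC are used only to keep the example self-contained, and the connection details are assumptions for illustration.

    // List every base table, and each table's columns, using the
    // self-describing system catalog exposed through JDBC metadata.
    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    public class CatalogBrowser {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:sqlserver://dbserver;databaseName=SupplierParts"; // assumed

            try (Connection con = DriverManager.getConnection(url, "appuser", "secret")) {
                DatabaseMetaData meta = con.getMetaData();
                try (ResultSet tables = meta.getTables(null, null, "%", new String[] {"TABLE"})) {
                    while (tables.next()) {
                        String table = tables.getString("TABLE_NAME");
                        System.out.println(table);
                        try (ResultSet cols = meta.getColumns(null, null, table, "%")) {
                            while (cols.next()) {
                                System.out.println("  " + cols.getString("COLUMN_NAME")
                                        + " : " + cols.getString("TYPE_NAME"));
                            }
                        }
                    }
                }
            }
        }
    }

No schema-specific code is needed; the same program can describe any database the driver can reach, which is what makes generic query, form and report tools possible.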
Further reading

At this stage in your studies, you should be able to fully understand all topics covered in the Wikipedia article on relational databases: http://en.wikipedia.org/wiki/Relational_database

Technology: Third Generation Languages (3GL)

When relational databases were introduced, the dominant application development technology was the third generation language (3GL). Most prominent among 3GLs for building database applications was COBOL. Wikipedia articles explain:

- 3GL: http://en.wikipedia.org/wiki/Third-generation_programming_language
- COBOL: http://en.wikipedia.org/wiki/COBOL

Embedded SQL

In order to make relational databases available to 3GL programmers, a means of using relational DML from these languages was needed. The name used to describe the standard use of SQL from 3GL programs is embedded SQL. This term refers to the placement of SQL statements into programs written in procedural, or imperative, programming languages. In particular, it refers to the use of the methods described in the SQL standard.

Wikipedia on imperative programming: http://en.wikipedia.org/wiki/Imperative_programming

SQL was developed as a relational DML in the early 1970s. At that time the dominant computing platform was the mainframe computer, and the most popular programming language for non-scientific applications was COBOL (introduced in the early 1960s). Although you probably first met SQL in an interactive environment (Access queries, perhaps), when SQL was being developed the priority was to make SQL available from languages like COBOL.

Aside: The form you developed for Assignment 2 included some procedural code. As demonstrated in the Week 8 and Week 9 lectures, the VBA code behind a Microsoft Access form can execute SQL statements against a SQL Server database. This is the same functionality as a COBOL program with embedded SQL. However, we tend not to talk about such programming as "embedded SQL". That term normally describes the use of the methods found in the SQL standard (see below).

Also: Contrary to popular myth, COBOL is not dead. An Internet search on "COBOL dead" will expose articles refuting this myth. With the impending retirement of baby-boomer developers, demand for COBOL skills may increase in the future. See: http://www.odinjobs.com/blogs/careers/entry/cobol_media_reports_of_its

On 4th September 2009, ITJobsWatch (http://www.itjobswatch.co.uk/) showed the term COBOL ranked as the 486th most common term in UK IT on-line job ads, having dropped 48 places in a year.

SQL DML statements

As you know, SQL includes DML, DDL and DCL statements. For most applications, it is embedded DML statements that are of most interest. The SQL DML statements are INSERT, SELECT, UPDATE and DELETE. One of these statements creates more issues than the others when used in a COBOL program.

Exercise: Which of the four SQL DML statements is likely to create more issues than the others when used in a COBOL program? Try to answer this question before continuing. Hint: SQL has been described as a non-procedural language.

Embedded SQL – the basics

Before returning to the issue raised above, let's look at some example embedded SQL statements. Each of the following statements can be placed into a COBOL program.

    EXEC SQL INSERT INTO SP (SNO,PNO,QTY)
             VALUES (:SUPP-NBR,:PART-NBR,:NBR-PARTS)
    END-EXEC.

    EXEC SQL SELECT SUM(QTY)
             INTO :NBR-PARTS
             FROM SP
             WHERE SNO = :SUPP-NBR
    END-EXEC.

    EXEC SQL UPDATE S
             SET STATUS = 60
             WHERE SNO = :SUPP-NBR
    END-EXEC.

    EXEC SQL DELETE FROM SP
             WHERE QTY = 0
    END-EXEC.
In the above statements, SUPP-NBR, PART-NBR and NBR-PARTS are variables declared in the host language. As seen in the examples, these variables are used to provide input values to SQL statements and to receive values returned by SQL statements.

Typically, the way embedded SQL works is that a program called a precompiler is run on the embedded SQL program to replace the SQL statements with statements in the native language. The new statements call procedures in a database library to process the SQL statements. The SQL statements, called database request modules (DBRM), are stored in the system catalog of the database (like a view). Let's take the example SELECT statement below.

    EXEC SQL SELECT SUM(QTY)
             INTO :NBR-PARTS
             FROM SP
             WHERE SNO = :SUPP-NBR
    END-EXEC.

The precompiler will replace this statement with a call to a procedure to process the SQL statement. The resulting statement may resemble the one below.

    CALL EXECSQL SQLCODE SUPP-NBR NBR-PARTS.

Notice that, in this case, the procedure call has one input variable (SUPP-NBR) and one output variable (NBR-PARTS) for the SQL statement. Also, an output variable called SQLCODE is used to provide feedback on the success, or otherwise, of the SQL statement. Finally, an additional input value (not shown above) is passed to the called procedure to identify the DBRM to be processed.

Embedded SQL – cursors

As mentioned, one of the SQL DML statements creates some additional issues when embedded in a 3GL program – the SELECT statement. The embedded SELECT statement below is a single-row SELECT. This statement will produce no more than one result row; the value of SQLCODE will indicate if no row is returned. Complications arise with embedded SQL statements that return more than one row.

    EXEC SQL SELECT SUM(QTY)
             INTO :NBR-PARTS
             FROM SP
             WHERE SNO = :SUPP-NBR
    END-EXEC.

The mechanism used to handle a collection of rows returned by an embedded SELECT statement is the cursor. A cursor is a pointer to the current record in the returned recordset. Typically, cursors are declared at the top of the program. To retrieve a record from a cursor, the FETCH statement is used. For example, a program might include the cursor declaration shown below. Later in the program, appearing in a loop, the following embedded FETCH statement may be used.

    EXEC SQL DECLARE MY-CURSOR CURSOR FOR
             SELECT SNO,STATUS
             FROM S
             ORDER BY SNO
    END-EXEC.

    EXEC SQL FETCH NEXT FROM MY-CURSOR
             INTO :SUPP-NBR,:SUPP-STATUS
    END-EXEC.

By default, it is only possible to fetch the next row from a cursor. However, if the cursor is declared to be of type SCROLL, it is also possible to fetch the NEXT, PRIOR, FIRST, and LAST row. Rows can also be updated through a cursor. If rows are to be updated, this must be stated in the cursor declaration. For example, a program might include the following declaration and subsequent UPDATE statement.

    EXEC SQL DECLARE MY-CURSOR CURSOR FOR
             SELECT SNO,STATUS
             FROM S
             ORDER BY SNO
             FOR UPDATE OF STATUS
    END-EXEC.

    EXEC SQL UPDATE S
             SET STATUS = :SUPP-STATUS
             WHERE CURRENT OF MY-CURSOR
    END-EXEC.

The textbook has little to say about classic embedded SQL (as defined by the SQL standard). However, it does talk about the use of SQL and cursors from stored procedures and triggers written with PL/SQL in Oracle, and T-SQL in SQL Server.

Further reading

The topic of embedded SQL is covered in the section starting on page 247 of the textbook. The following sections in Chapter 7 provide examples of the use of embedded SQL in triggers and procedures.
Wikipedia on embedded SQL: http://en.wikipedia.org/wiki/Embedded_SQL

Technology: Fourth Generation Languages (4GL)

Important contributions to increased productivity for the database application developer came from the phenomenon of so-called fourth generation languages (4GLs). Although the term was not introduced until the early 1980s, the first tools that would later carry the 4GL tag were introduced in the 1970s. The name 4GL attempts to convey the idea that 4GLs were to COBOL what COBOL was to assembler – a generational improvement in productivity. The first three generations of programming languages are:

- first generation: binary (machine code)
- second generation: assembler language
- third generation: FORTRAN, COBOL, Pascal, PL/1, C, etc

It is worth noting that 3GLs provided one of the first great leaps in abstraction. Assembler languages presented the programmer with a symbolic representation of the instruction set of a computer. FORTRAN and COBOL presented the programmer with an abstract set of operations that could be compiled to different computer instruction sets. The Wikipedia article on Assembly Language explains: http://en.wikipedia.org/wiki/Assembly_language

To understand 4GLs, one must understand the context in which they emerged.

4GLs – the context

The roots of 4GL technology appeared in the 1970s world of centralised computer systems. At this time mainframe computers were being challenged by minicomputers that were less powerful, but cheaper to purchase and operate. Large companies could afford more computers. Smaller companies could own their first computer. The number of possible applications seemed endless. One problem was that developers couldn't keep pace. There was a growing application backlog.

This environment gave rise to many efforts to improve the productivity of COBOL programmers. One early contribution came from programs to generate the SCREEN SECTION code for a COBOL program. Code could be generated for an application menu or form. Early report writers also generated COBOL code for programs to produce simple reports.

3GLs had shown the advantages of abstract tools. As such, the proposal for a database model based on mathematical relations was received with interest.

4GLs and relational databases

E. F. Codd first proposed the relational model in 1970. Relational database management systems (RDBMS) started to appear several years later. RDBMS brought three features that promised to help with the application backlog: simple data structures, non-procedural programming, and a system catalog. With the prospect of databases that would be easier to use, arriving in the context of an application backlog, the vision of users (scientists, engineers, accountants, etc.) developing their own applications was born.

One of the first classes of tool to exploit the simple data structures and set-oriented nature of relational databases was the query language. Initially these were primitive tools used by programmers to debug database applications. However, their potential as a tool for users was obvious. Another early tool to exploit relational databases was the report writer. Developers and users were able to produce simple reports in a fraction of the time needed to write a COBOL program. Many of the first tools claiming to be "4GLs" were report writers.
The fourth generation environment (4GE)

The emerging suite of new productivity tools (relational databases, query languages, report writers, screen generators) gave rise to the idea of a fourth generation environment (4GE). This would be an integrated development environment, built around a relational database, and exploiting the system catalog. Today, Microsoft Access provides a realisation of that early vision for a 4GE.

4GLs and the PC

PCs arrived on the scene in the early 80s. At that time, the application backlog was a cause of much frustration. It is widely held that the killer application for PCs was the spreadsheet. Spreadsheets allowed accountants and managers to develop budgets and other simple data processing applications. One reason PCs were successful was that a department could buy a PC out of its own budget. So began an era of user-empowered computing.

Along with the success of spreadsheets, the other application to arrive with PCs was the word processor. Before PCs, the word processor was a special-purpose device. The prospect of running a spreadsheet and a word processor on the same device was attractive. Following the success of spreadsheets and word processors, database tools arrived. The best-known early database development tool on PCs is dBase (http://en.wikipedia.org/wiki/DBase). Kroenke mentions dBase in "A Brief History of Database Processing" (pages 18 to 22).

With the arrival of the graphical user interface (GUI) on PCs, the point-and-click, drag-and-drop interface gave momentum to the development of tools for the non-professional to build database applications. Dedicated non-programmable tools emerged to target that market. In contrast, Microsoft developed Access to target both the casual and the professional developer. A point-and-click macro language was provided for the casual developer and a full programming language was provided for the professional (http://en.wikipedia.org/wiki/Microsoft_Access).

By the time Microsoft Access was released (1992), the term 4GL had lost its popularity. One problem with the term was that many tools claiming to be 4GLs were not programming languages. In the case of Microsoft Access, it would be unclear whether Access the product was the "4GL", or one of the two supported programming languages (macros and VBA), or both, or all three. The term fourth generation environment (4GE) was never widely adopted.

The success of Microsoft Access and competitive products (like Paradox and FoxPro) led to increasing levels of database integration into tools devoted to the professional developer. The popular name that emerged to describe these tools was rapid application development (RAD).

Rapid application development (RAD)

There are many similarities between RAD tools and 4GLs. Indeed, the approach used to develop an application using Microsoft Access, a 4GL, was very similar to that used to develop the same application using Visual Basic version 6 (VB6), a RAD tool. The main differences were:

- the markets the products targeted, and
- the scope of applications that could be developed

Microsoft Access was targeted at:

- the power-user (for small database applications)
- the in-house developer (for small database applications)
Visual Basic was targeted at:

- the in-house developer (for applications of any size)
- the in-house component developer (for developing distributed components for large applications; the scalability of large applications is improved by spreading the processing load across multiple servers)
- development teams (for large applications; support is included for version control and distributed applications)
- the independent software vendor (ISV) (for royalty-free distribution of applications)

One important early market for RAD tools was large organisations looking to move mainframe applications to PC/LANs. This process was known as downsizing an application. RAD tools were also used to upsize file-sharing applications. The term rightsizing covers both activities.

Most RAD tools now support the development of web applications. The new version of Visual Basic – VB.Net – is a good example.

Wikipedia: http://en.wikipedia.org/wiki/Rapid_application_development

VB6 vs VB.Net

As the name suggests, VB.Net is a .Net implementation of the VB language. The move from VB6 to VB.Net brought with it a move to a full OO language. With VB.Net, VB became a "first class citizen" in the Microsoft development world. Unfortunately, the added complexity of a full OO language made the transition to VB.Net difficult. Microsoft has been working hard with releases of Visual Studio (VS.Net) to simplify the task of developing .Net applications. The goal has been to return the level of complexity to that of VB6. Few would claim that this goal has been achieved.

Wikipedia on VB: http://en.wikipedia.org/wiki/Visual_basic
Wikipedia on VB.Net: http://en.wikipedia.org/wiki/VB.NET

Why not VBA.Net?

If you used Microsoft Access in Assignment 2, the scripting language used to develop event-handlers was Visual Basic for Applications (VBA). VBA is based on VB6. The added complexity of moving to a full OO language has delayed any replacement for VBA. Wikipedia reports that the intended replacement for VBA, called Visual Studio for Applications (VSA), was "deprecated in version 2.0 of the .NET Framework, leaving no clear upgrade path".

Wikipedia on VBA: http://en.wikipedia.org/wiki/Visual_Basic_for_Applications

C#.Net

By the time .Net was released (2002), a number of VB6 developers had migrated to Java. With the release of .Net, Microsoft released a direct competitor to Java – a C-based programming language called C#. With the complexity of moving from VB6 to VB.Net, and the similarity of C# to Java, many VB programmers moving to .Net decided to adopt C#. To investigate the relative fortunes of C# and VB.Net, enter the search term "C#" at ITJobsWatch (http://www.itjobswatch.co.uk/), and then repeat the exercise for the search term "VB".

Further reading

Wikipedia on 4GLs: http://en.wikipedia.org/wiki/4GL

Technology: Object-oriented programming (OOP)

You are probably aware of the significance of object-oriented programming (OOP). The impact on database application development has been profound. Most popular programming languages today are OO languages. Even languages that are not full OO languages (not supporting inheritance, perhaps) are capable of using OO class libraries.

Note: VBA is a good example. It is possible to create a class in VBA, but the language does not support inheritance.
Some impacts of OOP on database application development have been evident in the preceding discussion:

- the move from ODBC to ADO and OLEDB
- the move from VB6 to VB.Net

Impedance mismatch

With the strong interest in OOP, it will come as no surprise that native object-oriented DBMS have emerged. Indeed, there is an impedance mismatch when an OOP program uses a relational database. For example, an OO representation of the projects described in our assignment database would be a Project object that holds a collection of Service objects, which each hold a collection of Activity objects. An OOP program handling rows in tables seems unnatural.

Wikipedia on the object-relational impedance mismatch: http://en.wikipedia.org/wiki/Object-relational_impedance_mismatch

Object-oriented DBMS (OODBMS)

In response to the impedance mismatch problem, a number of OO database products have emerged. However, as explained in the following Wikipedia article, "Object databases have been considered since the early 1980s and 1990s but they have made little impact on mainstream commercial data processing".

Wikipedia: http://en.wikipedia.org/wiki/Object_databases

As elaborated in the Wikipedia article, some reasons for the low impact of OODBMS are:

- the existing huge investment in relational systems
- procedural (or "navigational") programming methods
- poor general-purpose query support

Object-relational DBMS (ORDBMS)

The benefits of type inheritance realised in OO systems have motivated similar features being added to relational DBMS. The SQL:1999 standard introduced support for inheritance across user-defined structured types, and for typed tables. A structured type is one that has more than one attribute. An example would be an Address type holding the attributes of an address. Once defined, the Address type can be used in the definition of columns in the database. Inheritance across simple and structured user-defined types is supported. A typed table is a table whose structure is defined by a user-defined structured type. Each row in the table holds one instance of the structured type. Any DBMS that supports this aspect of the SQL:1999 standard might be considered an ORDBMS.

Wikipedia on ORDBMS: http://en.wikipedia.org/wiki/Object_relational_database

Further reading: Interested students can learn more on this topic from the pages of the book Advanced SQL:1999, available on Google Books. Follow the link below, type a string ("user-defined types", "structured types", or "typed tables") into the "Search in this book" field, then click Go: http://books.google.com.au/books?id=TiS9ZkfRdnYC

SQL Server support: SQL Server does not yet support typed tables, but does support inheritance across structured user-defined types; however, they must be developed in a .Net language. See: http://technet.microsoft.com/en-us/library/ms131120.aspx

Object-relational mapping (ORM)

More evidence of the impact of OOP comes from the number of object-relational mapping (ORM) initiatives. ORM involves generating an application-specific OO interface to a relational database.

Note: ADO provides an OO interface to a relational database, but it is a general-purpose interface. The ADO interface to our assignment database involves the use of Connection, Command, and Recordset objects. It does not provide access via a Project object holding a collection of Service objects. ORM products are designed to generate such interfaces.
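To make the mismatch, and what ORM tools generate, more concrete, the sketch below shows roughly what a hand-written OO model of the assignment database might look like in Java. The class names follow the Project, Service and Activity example above; the fields and the hours calculation are assumptions for illustration only. An ORM product would also generate the mapping code that loads these objects from the underlying tables and writes changes back.

    import java.util.ArrayList;
    import java.util.List;

    // A sketch of an application-specific object model; an ORM tool maps
    // classes like these onto the Project, Service and Activity tables.
    class Activity {
        String description;    // assumed column
        int hours;              // assumed column
    }

    class Service {
        String name;                                          // assumed column
        final List<Activity> activities = new ArrayList<>();  // rows of Activity
    }

    class Project {
        String title;                                         // assumed column
        final List<Service> services = new ArrayList<>();     // rows of Service

        // Navigating object references replaces a three-table SQL join.
        int totalHours() {
            int total = 0;
            for (Service s : services) {
                for (Activity a : s.activities) {
                    total += a.hours;
                }
            }
            return total;
        }
    }

Application code then works with project.services and service.activities rather than Connection, Command and Recordset objects, which is exactly the gap ORM products aim to bridge.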
Wikipedia on ORM: http://en.wikipedia.org/wiki/Object-relational_mapping

Microsoft’s ORM offering is called the ADO.Net Entity Framework: http://en.wikipedia.org/wiki/ADO.NET_Entity_Framework

Further reading

Wikipedia on object-oriented programming: http://en.wikipedia.org/wiki/Object-oriented_programming

Technology: Extensible Markup Language (XML)

It is likely that you have met XML before. XML is the data interchange format of choice on the Internet. However, the wide support for, and versatility of, XML has seen it emerge as a significant general-purpose format for structured data.

XML significance

XML enables self-describing, structured data to be expressed in text documents that can be passed across the Internet, and through firewalls.

Like relational databases, XML documents are self-describing. This is important. The self-describing nature of relational databases is exploited to expose the structure of a database to its users; it is available through any query language – custom code is not required. Likewise, standard software tools can peer inside an XML document to determine its structure.

One point of difference with relational databases is that the XML file structure is defined by an open standard. An Oracle driver/provider is required to access data in an Oracle database. In contrast, any text editor can read an XML file. A wide range of tools is available to manage (design, create, edit, validate, query and transform) XML documents.

XML documents do not need to carry an embedded self-description. Instead, a document can refer to a description of its structure located elsewhere. Standards for documents used by communities of interest (government, commercial, health, social, etc.) can be published to enable validation of documents by consuming programs.

XML support in relational DBMS

Relational DBMS (RDBMS) have not been slow to add support for XML. Most popular RDBMS support the storage of XML documents in a column, optionally strongly typed by an XML schema. The contents of a column holding an XML document can be queried, and possibly indexed. The contents of a table can be serialised into an XML document, and the contents of an XML document can be shredded into the columns of a table.

Support for XML was added to the SQL standard in SQL:2003 and extended in SQL:2006, as described in the Wikipedia article on SQL: http://en.wikipedia.org/wiki/Sql
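As a small, hedged illustration of serialising a table into an XML document, the following C# (ADO.Net) sketch uses the standard DataSet.WriteXml and DataSet.ReadXml methods. The connection string and the Project table name are assumptions made for this example only – substitute your own.

    using System.Data;
    using System.Data.SqlClient;

    public class XmlDemo
    {
        public static void Main()
        {
            // Connection string and table name are assumptions for illustration only.
            string connectionString =
                "Data Source=.;Initial Catalog=Assignment;Integrated Security=True";

            DataSet ds = new DataSet("ProjectData");
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                SqlDataAdapter adapter = new SqlDataAdapter("SELECT * FROM Project", conn);
                adapter.Fill(ds, "Project");   // load the table's rows into the DataSet
            }

            // Serialise the table's contents (and, optionally, its schema)
            // into an XML document.
            ds.WriteXml("Project.xml", XmlWriteMode.WriteSchema);

            // The reverse direction: load an XML document back into a DataSet,
            // from which rows could then be written to database tables.
            DataSet fromXml = new DataSet();
            fromXml.ReadXml("Project.xml");
        }
    }

The resulting Project.xml file is self-describing in the sense discussed above: the element names (and the optional embedded schema) expose the structure of the data, so any text editor or generic XML tool can inspect it without a vendor-specific driver.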
Native XML DBMS (NXD)

With the wide use of XML documents, it was only a matter of time before the idea of a native XML DBMS would arise. Although NXDs are still in their infancy, it will be interesting to see whether they can challenge RDBMS for persistent storage supremacy. With the existing investment in relational systems, and strong support for XML already added to RDBMS, this seems unlikely. However, XML is a very significant technology.

Wikipedia on XML databases: http://en.wikipedia.org/wiki/XML_databases

Further reading

XML is one of the most important data-oriented technologies of our time. The importance of XML is reflected in the introduction to the topic provided in the textbook, from page 450 to page 478. Wikipedia also has an excellent article on the topic: http://en.wikipedia.org/wiki/XML

Active influences

We close this module with some consideration of active influences on contemporary database application development. The module has provided a rough sketch of database application development over the last 40 years. So, what’s next? What influences are currently driving change? What other influences will change the way we develop database applications in the future? Reflection on these questions is encouraged.

The Internet

Probably the most significant influence driving database application development today is the Internet. We have only begun to tap the potential for business automation on this network. XML and web services will play a big role in this transformation.

Also, most contemporary web applications are limited in functionality. Compared with native PC applications, web applications have the feel of old, clunky mainframe applications. A number of initiatives are underway to improve the fidelity, or richness, of web applications.

Another emerging architecture for database applications is cloud computing. With this approach, the database server sits as a service somewhere in a cloud of Internet services. Many details of a cloud service are hidden from the application – the location of servers, the number of servers, the location of data, the distribution of data, the replication of data, the location of backups, the capacity of servers, and so on. The idea is that the service is highly flexible, enabling the application to grow or shrink rapidly, while only incurring costs for the services consumed.

You will learn more about these influences in the next module.

Open Source Software

Another significant influence on contemporary database application development is open source software. Companies like IBM, Oracle and Microsoft have been threatened by this phenomenon. IBM has responded by growing its consulting business. Oracle has been busy acquiring companies to diversify its business. What has Microsoft been doing?

Exercise: See if you can formulate an answer to this question by reflecting on initiatives taken by Microsoft in recent years. You may wish to use the mailing list to discuss this question with other students.

In terms of database application development, some impacts are obvious:

• MySQL is a prominent open source DBMS: http://en.wikipedia.org/wiki/My_sql
• PostgreSQL is a less prominent, but more mature, open source DBMS: http://en.wikipedia.org/wiki/Postgres
• Java is now an open source language: http://en.wikipedia.org/wiki/Java_(software_platform)
• Eclipse is a prominent open source IDE: http://www.eclipse.org/
• Mono is an open source implementation of .Net: http://www.mono-project.com/Main_Page
• Linux is a popular open source server operating system: http://en.wikipedia.org/wiki/Linux

What other open source software projects are having an impact on database application development? The following Wikipedia article may help you formulate an answer to this question: http://en.wikipedia.org/wiki/Open_source

Other influences?

What other influences are driving database application development today? And what influences will drive database application development in the future? Will any of the following have an impact?

• XML databases: http://en.wikipedia.org/wiki/XML_databases
• declarative programming (see the short sketch after this list): http://en.wikipedia.org/wiki/Declarative_programming
• multi-paradigm programming: http://en.wikipedia.org/wiki/Programming_paradigm
• quantum computing: http://en.wikipedia.org/wiki/Quantum_computer
• HUI – the Human User Interface, perhaps?
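If the declarative programming item above seems abstract, note that the idea is already familiar from SQL: you state what result you want, not how to compute it. The short C# sketch below (with made-up data, not taken from the Study Guide) contrasts a procedural loop with an equivalent declarative LINQ query.

    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class DeclarativeDemo
    {
        public static void Main()
        {
            // Hypothetical data: hours recorded against activities.
            List<int> hours = new List<int> { 3, 8, 2, 12, 5 };

            // Procedural style: say HOW to compute the result, step by step.
            int totalProcedural = 0;
            foreach (int h in hours)
            {
                if (h > 4) totalProcedural += h;
            }

            // Declarative style (LINQ): say WHAT result is wanted,
            // much as an SQL query does for a relational database.
            int totalDeclarative = hours.Where(h => h > 4).Sum();

            Console.WriteLine(totalProcedural);    // 25
            Console.WriteLine(totalDeclarative);   // 25
        }
    }

LINQ is just one existing example of declarative ideas spreading from SQL into general-purpose programming languages; whether such influences reshape database application development is the kind of question the list above invites you to reflect on.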
Summary: The database application landscape

All of the technologies described in this module are still in use today. We mentioned earlier that the ITJobsWatch ranking of the term COBOL is dropping (this ranking was checked on 09/09/09). However, the terms COBOL Analyst and COBOL Developer have risen in the last year. Indeed:

• COBOL apps with embedded SQL are still being modified to accommodate legal and taxation changes
• Access file-server apps are still being developed by users
• Access file-server apps are still being developed for users by professionals
• Access file-server apps are being converted into client-server apps
• lots of web apps are being developed
• lots of web service apps are being developed
• organisations are experimenting with cloud computing apps

A reminder about Wikipedia articles

The following reminder about the use of Wikipedia articles in this Study Guide is copied from the Week 1 Study Guide.

It should be recognised that Wikipedia articles cannot be relied on as an authoritative source of information on any topic. However, Wikipedia articles provide lots of useful information about many topics, particularly IT topics. Most IT professionals use Wikipedia as a source of introductory information about IT topics.

Important: It is not necessary to read every word in every referenced Wikipedia article. Use the Study Guide objectives to guide your reading.

Review Objectives

In preparation for the exam, review the learning objectives identified at the start of this module. The exam for this course is open book. You can take your own notes, an annotated Study Guide, and printed materials into the exam. Prepare for the exam now by making any notes that will help demonstrate you can satisfy the learning objectives identified at the start of this module.