Database Administration: The Complete Guide to Practices and Procedures Chapter 5 Application Design Agenda • • • • • Database Application Development & SQL Defining Transactions Locking Batch Processing Questions Database Application Development and SQL To properly design an application that relies on databases for persistent data storage, the system designer at a minimum will need to understand the following issues: • How data is stored in a relational database • How to code SQL statements to access and modify data in the database • How SQL differs from traditional programming languages • How to embed SQL statements into a host programming language • How to optimize database access by changing SQL and indexes • Programming methods to avoid potential database processing problems SQL • SQL is the de facto standard for accessing relational databases • SQL is a high-level language that provides a greater degree of abstraction than do traditional procedural languages • SQL is designed such that programmers specify what data is needed. – It does not—indeed it cannot—specify how to retrieve it SQL: English-like • SQL can be used to retrieve data easily with an English-like syntax. • It is easier to understand this: • Than it is to understand C, Java, or most typical programming languages. Set-a-Time Processing • SQL operates on sets • Multiple rows can be retrieved, modified, or removed in one fell swoop by using a single SQL statement • Every operation performed on a relational database operates on a table (or set of tables) and results in another table • This is called relational closure Relational Closure Result Set (table) SQL Statement Database Tables Embedding SQL in a Program • Host language code – COBOL, FORTRAN, Assembler, etc. – C/C++, Java, PHP, Visual Basic, etc. • API – ODBC, JDBC • IDE – Integrated Development Environment • Code generators SQL Middleware and APIs • ODBC – ODBC is a call level interface, or CLI – Instead of directly embedding SQL in the program, ODBC uses callable routines. • • • • • • to allocate and deallocate resources control connections to the database execute SQL statements obtain diagnostic information control transaction termination obtain information about the implementation SQL Middleware and APIs • JDBC – JDBC enables Java to access relational databases. – Similar to ODBC, JDBC consists of a set of classes and interfaces that can be used to access relational data. – There are several types of JDBC middleware, including the JDBC-to-ODBC bridge, as well as direct JDBC connectivity to the relational database. – Anyone familiar with application programming and ODBC (or any call-level interface) can get up and running with JDBC quickly Drivers • ODBC and JDBC rely on drivers – A driver provides an optimized interface for a particular DBMS implementation • Programs can make use of the drivers to communicate with any JDBC- or ODBC-compliant database. • The drivers enable a standard set of SQL statements in any Windows application to be translated into commands recognized by a remote SQL-compliant database. • There are multiple types of JDBC drivers SQL Middleware and APIs • SQLJ – SQLJ enables developers to embed SQL statements in Java programs. – A precompiler translates the embedded SQL into Java code. – The Java program is then compiled into bytecodes, and a database bind operation creates packaged access routines for the SQL. SQL Middleware and APIs • OLE DB – Object Linking and Embedding Database – OLE DB presents an object-oriented interface for generic data access. – Based on the COM architecture – OLE DB provides greater flexibility than ODBC because it can be used to access both relational and nonrelational data. – OLE DB is conceptually divided into consumers and providers. • consumers are the applications that need access to the data • providers are the software components that implement the interface and thereby provide the data to the consumer. Application Infrastructure • Application infrastructure is the combined hardware and software environment that supports and enables the application. • The application infrastructure will vary from organization to organization, and even from application to application within an organization. Application Infrastructure Mainframe • • • • • IBM z Series hardware Running z/OS, DB2, CICS, with application programs written in COBOL. Typically, applications consist of both batch and online workload. A modern mainframe infrastructure adds interfaces to non-mainframe clients, as well as WebSphere Application Server and Java programs. Most new mainframe development uses IDEs to code modern applications instead of relying upon COBOL programmers. Distributed • Most modern, distributed, non-mainframe application development projects typically rely upon application development frameworks. • The two most commonlyused frameworks are Microsoft .NET and J2EE. Microsoft .NET • ... is a set of Microsoft technologies for connecting people, systems, and devices • ... allows Internet Servers to expose functions to any client named as .NET web services • … enables software to be delivered as a service over the web • … is designed to let many different services and systems interact Microsoft .NET Framework Visual C# Visual Basic Visual J# Visual C++ JScript Third Party Microsoft .NET Framework ADO.NET ASP.NET .NET Framework Class Library CLR (Common Language Runtime) User Interfaces Java Alphabet Soup • J2EE - Java 2 Enterprise Edition – Standard services and specifications for making Java highly available, secure, reliable, and scalable for enterprise adoption • EJB - Enterprise Java Beans – Components that contain the business logic for a J2EE application J2EE and Java Client Tier Java Standalone Runtime Java Application Web Tier Business Tier JSP Pages Enterprise JavaBeans EIS Tier Database Browser Pure HTML Servlets Applet Business Components for Java Impact of Java on DBA • Application tuning – Must understand Java • To provide guidance during design reviews – Is the problem in the SQL or the application • How can you tune the application if you do not understand the language (Java)? – Optimizing SQL is not enough since it may be embedded in poor application code – Must understand the SQL techniques used • JDBC and SQLJ Java versus .Net • ...designed to enable applications to be deployed on any platform as long as they are written in Java • …designed to enable development in multiple languages as long as the application is deployed on Windows Other Application Choices • There are other choices, including – Ruby on Rails – Ajax – PHP – C/C++ – And so on… • This is not an exhaustive list… Object Orientation • OO programming advantages: – faster program development time – reduced maintenance costs – resulting in a better ROI • Piecing together reusable objects and defining new objects based on similar object classes can dramatically reduce development time and costs. OO, SQL and Databases • OO and relational databases are not inherently compatible • The set-based nature of SQL is anathema to the OO techniques practiced by Java and C++ developers. • All too often insufficient consideration has been given to the manner in which data is accessed, resulting in poor design and faulty performance Impedance Mismatch • When OO programming language is used to access a relational database, you must map objects to relations. – OO programs deal with objects – RDBMSs deal with relations, (that is, tables) • Applications will not be object-oriented in the “true” sense of the word because the data will not be encapsulated within the method (that is, the program). Making OO Programs Work with Relational Databases • Serialization – Saving data using a flat file representation of the object. This approach can be slow and difficult to use across applications. • XML – can be stored natively in many relational database systems. But XML adds a layer of complexity and requires an additional programming skillset. • Object-Relational Mapping (ORM) – Most common approach Object Relational Mapping • With ORM an object’s attributes are stored in one or more columns of a relational table. Hibernate is a popular ORM library for Java; NHibernate is an adaptation of Hibernate for the .NET framework. • Both Hibernate and NHibernate provide capabilities for mapping objects to a relational database by replacing direct persistence-related database accesses with highlevel object handling functions. • Another option is Microsoft LINQ, which stands for Language Integrated Query. LINQ provides a set of .NET framework and language extensions for objectrelational mapping. Types of SQL • SQL can be planned or unplanned. – A planned SQL request is typically embedded into an application program, but it might also exist in a query or reporting tool. At any rate, a planned SQL request is designed and tested for accuracy and efficiency before it is run in a production system. Contrast this with the characteristics of an unplanned SQL request. Unplanned SQL, also called ad hoc SQL, is created “on the fly” by end users during the course of business. Most ad hoc queries are created to examine data for patterns and trends that impact business. Unplanned, ad hoc SQL requests can be a significant source of inefficiency and are difficult to tune. How do you tune requests that are constantly written, rewritten, and changed? • SQL can either be embedded in a program or issued stand-alone. – Embedded SQL is contained within an application program, whereas standalone SQL is run by itself or within a query, reporting, or OLAP tool. • SQL can be dynamic or static. – A dynamic SQL statement is optimized at run time. Depending on the DBMS, a dynamic SQL statement may also be changed at run time. Static SQL, on the other hand, is optimized prior to execution and cannot change without reprogramming. Favor static SQL to minimize the possibility of SQL injection attacks. SQL Usage Considerations Situation Execution type Program requirement Dynamism Planned Embedded Dynamic SQL formulation does not change. Planned Embedded Static Highly concurrent, high-performance Planned Embedded Dynamic or static Unplanned Stand-alone Dynamic Planned Embedded or stand-alone Dynamic or static Unplanned Embedded or stand-alone Dynamic or static Columns and predicates of the SQL statement can change during execution. transactions. Ad hoc one-off queries. Repeated analytical queries. Quick one-time “fix” programs. SQL Coding for Performance • It is important to learn how to code SQL for performance • Generally a good idea to rely on the DBMS to optimize the code • Let SQL do the work instead of coding it in host language program – The less data brought from the DBMS to the program the better performance will be • More performance guidelines come later in the course! What is XML? • XML stands for eXtensible Markup Language. – Like HTML, XML is based on SGML – HTML uses tags to describe the appearance of data on a page, whereas XML uses tags to describe the data itself, instead of its appearance. – Allows documents to be selfdescribing, through the specification of tag sets and the structural relationships between the tags. • XML is actually a meta language (a language used to define other languages). – These languages are collected in dictionaries called DTDs Document Type Definitions. – The DTD stores definitions of tags for specific industries or fields of knowledge. – The DTD for an XML document can be either part of the document or stored in an external file. XML Data • XML uses tags to describe the data itself <CUSTOMER> <first_name>Craig</first_name> <middle_initial>S.</middle_initial> <last_name>Mullins</last_name> <company_name>Mullins Consulting, Inc.</company_name> <street_address>15 Coventry Ct.</street_address> <city>Sugar Land</city> <state>TX</state> <zip_code>77479</zip_code> <country>USA</country> </CUSTOMER> http://www.xml.org Querying XML • XQUERY – FLWOR • FOR, LET, WHERE, ORDER BY, and RETURN. – Not just for querying, it also allows for new XML documents to be constructed • SQL/XML – Uses functions to access XML data • XMLDOCUMENT, XMLELEMENT, XMLCONCAT, XMLAGG, XMLQUERY, XMLTABLE Defining Transactions • A transaction is an atomic unit of work with respect to recovery and consistency. • When all the steps that make up a specific transaction have been accomplished, a COMMIT is issued. – ROLLBACK before COMMIT to undo transaction’s work • DBMS maintains transaction log ACID Properties of Transactions • Defining Transactions – Atomicity – Consistency – Isolation – Durability • Unit of Work – Ensure proper definition and coding Unit of Work • A UOW is a series of instructions and messages that guarantees data integrity. • Example: bank transaction – Withdrawal of $20 – The transaction must involve both the subtraction of $20 from your account and the delivery of $20 to you – Only doing one or the other is not a complete unit of work TP System Versus DBMS (Stored Procs) Presentation (Client) Presentation (Client) Workflow Controller Stored Procedures Relational DBMS (2) Transaction Server Relational DBMS (1) Disk Disk Relational DBMS Disk Application Servers • An application server combines the features of a transaction server with additional functionality to assist in building, managing, and distributing database applications. • Examples: – WebSphere (IBM) – Zend Server – Base4 Application Server (open source) Transactions and Locking • The DBMS uses a locking mechanism to enable multiple, concurrent users to access and modify data in the database. • By using locks, the DBMS automatically guarantees the integrity of data. The DBMS locking strategies permit multiple users from multiple environments to access and modify data in the database at the same time. • Locking Granularity – – – – – Row Page (or Block) Table Table Space Database Level of Lock Granularity Access Concurrency High Low Granularity of Lock Column Row Page Table Tablespace Database Types of Locks • The following types of locks can be taken on database pages or rows: – Shared Lock • Taken when data is read with no intent to update it. • If a shared lock has been taken on a row, page, or table, other processes or users are permitted to read the same data. – eXclusive Lock • Taken when data is modified. • If an exclusive lock has been taken on a row, page, or table, other processes or users are generally not permitted to read or modify the same data. – Update Lock • Taken when data must first be read before it is changed or deleted. • The update lock indicates that the data may be modified or deleted in the future. • If the data is actually modified or deleted, the DBMS will promote the update lock to an exclusive lock. Intent Locks • Intent locks are placed on higher-level database objects when a user or process takes locks on the data pages or rows. – Table or Table Space • An intent lock stays in place for the life of the lower-level locks. Lock Compatibility Lock Timeouts error Deadlocks Process A . . . Request row 3 . . . . Request row 7 Process B Table X data…data…data... lock Process A is waiting on Process B data…data…data... lock . . . Request row 7 . . . Request row 3 Process B is waiting on Process A Lock Duration • Lock duration refers to the length of time that a lock is held by the DBMS. • Two parameters impact lock duration: – Isolation level – Acquire/Release Isolation Level • Read uncommitted – aka dirty read • Read committed – aka cursor stability • Repeatable read • Serializable Acquire/Release Specification • Controls when Intent locks are acquired and released – Intent locks can be acquired either immediately when the transaction is requested or iteratively as needed while the transaction executes. – Intent locks can be released when the transaction completes or when each intent lock is no longer required for a unit of work. Lock Escalation • Lock escalation is the process of increasing the lock granularity for a process or program. • Typically controlled by system parameters and DDL parameters in CREATE statements. • For example: – If a threshold is hit for the number of locks being held by a process (or by the entire DBMS), page locks (or row locks) can be escalated to table locks. – Can cause concurrency issues • If the entire table is locked other processes cannot access the data Programming Techniques to Minimize Locking Problems • Avoid deadlocks by coding updates in the same sequence regardless of program – For example, alphabetical order by table name • Issue data modification SQL statements as close to the end of the UOW as possible – The later in the UOW the update occurs, the shorter the duration of the lock Batch Processing • Batch Processing – Where programs are scheduled to run at predetermined times without any user input • Batch programmers sometimes tend to treat tables like flat file… that is NOT a good idea. – Think relationally instead of file processing • Plan and implement a COMMIT strategy in all batch application programs – Instead of holding locks until the end of the program – Otherwise you will experience a lot of lock timeouts Questions