Application Design

Database Administration:
The Complete Guide to Practices and Procedures
Chapter 5
Application Design
Agenda
• Database Application Development & SQL
• Defining Transactions
• Locking
• Batch Processing
• Questions
Database Application Development
and SQL
To properly design an application that relies on databases
for persistent data storage, the system designer at a
minimum will need to understand the following issues:
• How data is stored in a relational database
• How to code SQL statements to access and modify data in the
database
• How SQL differs from traditional programming languages
• How to embed SQL statements into a host programming
language
• How to optimize database access by changing SQL and
indexes
• Programming methods to avoid potential database processing
problems
SQL
• SQL is the de facto standard for accessing
relational databases
• SQL is a high-level language that provides a
greater degree of abstraction than do
traditional procedural languages
• SQL is designed such that programmers
specify what data is needed.
– It does not—indeed it cannot—specify how to
retrieve it
SQL: English-like
• SQL can be used to retrieve data easily with an
English-like syntax.
• An SQL statement is generally easier to understand
than equivalent logic written in C, Java, or most
typical programming languages.
Set-at-a-Time Processing
• SQL operates on sets
• Multiple rows can be retrieved, modified, or
removed in one fell swoop by using a single
SQL statement
• Every operation performed on a relational
database operates on a table (or set of tables)
and results in another table
• This is called relational closure
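As a minimal sketch of set-at-a-time processing, the following uses Python's built-in sqlite3 module with a hypothetical `employee` table (the table, names, and values are invented for illustration). A single UPDATE changes every qualifying row, and the nested SELECT illustrates closure: the result of one query is itself a table that another query can read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ann", "IT", 100), ("Bob", "IT", 200), ("Cy", "HR", 300)],
)

# One statement modifies every row in the set; no explicit loop is coded.
cur = conn.execute("UPDATE employee SET salary = salary + 10 WHERE dept = 'IT'")
rows_changed = cur.rowcount

# Relational closure: the inner SELECT yields a table that the outer SELECT reads.
it_total = conn.execute(
    "SELECT SUM(salary) FROM (SELECT salary FROM employee WHERE dept = 'IT')"
).fetchone()[0]
conn.close()
```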
Relational Closure
[Figure: Database Tables → SQL Statement → Result Set (table)]
Embedding SQL in a Program
• Host language code
– COBOL, FORTRAN, Assembler, etc.
– C/C++, Java, PHP, Visual Basic, etc.
• API
– ODBC, JDBC
• IDE
– Integrated Development Environment
• Code generators
SQL Middleware and APIs
• ODBC
– ODBC is a call level interface, or CLI
– Instead of directly embedding SQL in the program,
ODBC uses callable routines to:
• allocate and deallocate resources
• control connections to the database
• execute SQL statements
• obtain diagnostic information
• control transaction termination
• obtain information about the implementation
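The call-level pattern can be sketched with Python's built-in sqlite3 module. This is not ODBC itself, only a stand-in that follows the same idea of calling routines instead of embedding SQL; the `account` table is hypothetical. Each comment maps a call to one of the routine categories listed above.

```python
import sqlite3

# Allocate resources and control the connection (in-memory database for the sketch).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Execute SQL statements through callable routines rather than embedded SQL.
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
cur.execute("INSERT INTO account (id, balance) VALUES (?, ?)", (1, 100))

# Obtain diagnostic information: a failed call surfaces as an exception.
diagnostic = None
try:
    cur.execute("INSERT INTO account (id, balance) VALUES (?, ?)", (1, 50))
except sqlite3.IntegrityError as exc:
    diagnostic = str(exc)

# Control transaction termination.
conn.commit()

# Obtain information about the implementation.
engine_version = sqlite3.sqlite_version

balance = cur.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Deallocate resources.
cur.close()
conn.close()
```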
SQL Middleware and APIs
• JDBC
– JDBC enables Java to access relational databases.
– Similar to ODBC, JDBC consists of a set of classes and
interfaces that can be used to access relational data.
– There are several types of JDBC middleware, including
the JDBC-to-ODBC bridge, as well as direct JDBC
connectivity to the relational database.
– Anyone familiar with application programming and
ODBC (or any call-level interface) can get up and
running with JDBC quickly
Drivers
• ODBC and JDBC rely on drivers
– A driver provides an optimized interface for a
particular DBMS implementation
• Programs can make use of the drivers to
communicate with any JDBC- or ODBC-compliant
database.
• The drivers enable a standard set of SQL
statements in any Windows application to be
translated into commands recognized by a
remote SQL-compliant database.
• There are multiple types of JDBC drivers
SQL Middleware and APIs
• SQLJ
– SQLJ enables developers to embed SQL
statements in Java programs.
– A precompiler translates the embedded SQL into
Java code.
– The Java program is then compiled into bytecodes,
and a database bind operation creates packaged
access routines for the SQL.
SQL Middleware and APIs
• OLE DB
– Object Linking and Embedding Database
– OLE DB presents an object-oriented interface for generic
data access.
– Based on the COM architecture
– OLE DB provides greater flexibility than ODBC because it
can be used to access both relational and nonrelational
data.
– OLE DB is conceptually divided into consumers and
providers.
• consumers are the applications that need access to the data
• providers are the software components that implement the
interface and thereby provide the data to the consumer.
Application Infrastructure
• Application infrastructure is the combined
hardware and software environment that
supports and enables the application.
• The application infrastructure will vary from
organization to organization, and even from
application to application within an
organization.
Application Infrastructure
Mainframe
• IBM z Series hardware
• Running z/OS, DB2, CICS, with
application programs written in
COBOL.
• Typically, applications consist of
both batch and online workload.
• A modern mainframe infrastructure
adds interfaces to non-mainframe
clients, as well as WebSphere
Application Server and Java
programs.
• Most new mainframe development
uses IDEs to code modern
applications instead of relying upon
COBOL programmers.
Distributed
• Most modern, distributed,
non-mainframe application
development projects
typically rely upon
application development
frameworks.
• The two most commonly used frameworks are
Microsoft .NET and J2EE.
Microsoft .NET
• ... is a set of Microsoft technologies for
connecting people, systems, and devices
• ... allows Internet servers to expose functions
to any client as .NET web services
• … enables software to be delivered as a
service over the web
• … is designed to let many different services
and systems interact
Microsoft .NET Framework
[Figure: Languages (Visual C#, Visual Basic, Visual J#, Visual C++, JScript, third party) built on the Microsoft .NET Framework, which comprises User Interfaces, ASP.NET, ADO.NET, the .NET Framework Class Library, and the CLR (Common Language Runtime)]
Java Alphabet Soup
• J2EE - Java 2 Enterprise Edition
– Standard services and specifications for making
Java highly available, secure, reliable, and scalable
for enterprise adoption
• EJB - Enterprise Java Beans
– Components that contain the business logic for a
J2EE application
J2EE and Java
[Figure: J2EE tiers. Client Tier: browser (pure HTML, applet), standalone Java application, Java runtime. Web Tier: JSP pages, servlets. Business Tier: Enterprise JavaBeans, Business Components for Java. EIS Tier: database.]
Impact of Java on DBA
• Application tuning
– Must understand Java
• To provide guidance during design reviews
– Is the problem in the SQL or the application
• How can you tune the application if you do not
understand the language (Java)?
– Optimizing SQL is not enough since it may be embedded in
poor application code
– Must understand the SQL techniques used
• JDBC and SQLJ
Java versus .NET
• Java: designed to enable
applications to be
deployed on any
platform as long as they
are written in Java
• .NET: designed to enable
development in
multiple languages as
long as the application
is deployed on
Windows
Other Application Choices
• There are other choices, including
– Ruby on Rails
– Ajax
– PHP
– C/C++
– And so on…
• This is not an exhaustive list…
Object Orientation
• OO programming advantages:
– faster program development time
– reduced maintenance costs
– resulting in a better ROI
• Piecing together reusable objects and defining
new objects based on similar object classes
can dramatically reduce development time
and costs.
OO, SQL and Databases
• OO and relational databases are not
inherently compatible
• The set-based nature of SQL is anathema to
the OO techniques practiced by Java and C++
developers.
• All too often insufficient consideration has
been given to the manner in which data is
accessed, resulting in poor design and faulty
performance
Impedance Mismatch
• When an OO programming language is used to
access a relational database, you must map
objects to relations.
– OO programs deal with objects
– RDBMSs deal with relations (that is, tables)
• Applications will not be object-oriented in the
“true” sense of the word because the data will
not be encapsulated within the method (that
is, the program).
Making OO Programs Work with
Relational Databases
• Serialization
– Saving data using a flat file representation of the
object. This approach can be slow and difficult to use
across applications.
• XML
– can be stored natively in many relational database
systems. But XML adds a layer of complexity and
requires an additional programming skillset.
• Object-Relational Mapping (ORM)
– Most common approach
Object Relational Mapping
• With ORM an object’s attributes are stored in one or
more columns of a relational table. Hibernate is a
popular ORM library for Java; NHibernate is an
adaptation of Hibernate for the .NET framework.
• Both Hibernate and NHibernate provide capabilities for
mapping objects to a relational database by replacing
direct persistence-related database accesses with
high-level object handling functions.
• Another option is Microsoft LINQ, which stands for
Language Integrated Query. LINQ provides a set of .NET
framework and language extensions for
object-relational mapping.
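The mapping idea can be illustrated with a hand-rolled sketch in Python. This is not the Hibernate, NHibernate, or LINQ API; the `Customer` class and the `save` and `load` helpers are hypothetical names chosen for the example. Each attribute of the object is stored in one column of a relational table, and the application code handles objects rather than rows.

```python
import sqlite3
from dataclasses import dataclass, fields


@dataclass
class Customer:
    id: int
    first_name: str
    last_name: str


# Column names are derived from the object's attributes: one attribute, one column.
COLUMNS = [f.name for f in fields(Customer)]


def save(conn, customer):
    """Persist an object by writing its attributes to the mapped columns."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO customer ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [getattr(customer, c) for c in COLUMNS],
    )


def load(conn, customer_id):
    """Rebuild an object from the row with the given key, or return None."""
    row = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(*row) if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
save(conn, Customer(1, "Craig", "Mullins"))
loaded = load(conn, 1)
conn.close()
```

A real ORM library adds much more (relationships, caching, change tracking), but the core translation between objects and rows is the same.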
Types of SQL
• SQL can be planned or unplanned.
– A planned SQL request is typically embedded into an application program, but
it might also exist in a query or reporting tool. At any rate, a planned SQL
request is designed and tested for accuracy and efficiency before it is run in a
production system.
– Contrast this with the characteristics of an unplanned SQL request.
Unplanned SQL, also called ad hoc SQL, is created “on the fly” by end
users during the course of business. Most ad hoc queries are created to
examine data for patterns and trends that impact business.
– Unplanned, ad hoc SQL requests can be a significant source of inefficiency
and are difficult to tune. How do you tune requests that are constantly
written, rewritten, and changed?
• SQL can either be embedded in a program or issued stand-alone.
– Embedded SQL is contained within an application program, whereas
stand-alone SQL is run by itself or within a query, reporting, or OLAP tool.
• SQL can be dynamic or static.
– A dynamic SQL statement is optimized at run time. Depending on the DBMS, a
dynamic SQL statement may also be changed at run time. Static SQL, on the
other hand, is optimized prior to execution and cannot change without
reprogramming. Favor static SQL to minimize the possibility of SQL injection
attacks.
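The injection risk can be sketched in Python with sqlite3 and a hypothetical `users` table. The first query is assembled at run time from user input, so the input can rewrite the predicate. The second keeps the statement text fixed in the program and passes the input only as a value through a parameter marker. A parameterized statement is not the same thing as static SQL, but it shares the property that matters here: the SQL formulation cannot be changed by the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")]
)

user_input = "nobody' OR '1'='1"  # hostile input supplied at run time

# SQL text built by concatenation: the input becomes part of the predicate.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + user_input + "'"
leaked = conn.execute(unsafe_sql).fetchall()

# Fixed statement text with a parameter marker: the input is only ever a value.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (user_input,)
).fetchall()
conn.close()
```

Here `leaked` contains every row in the table, while `safe` is empty because no user has that literal name.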
SQL Usage Considerations
Situation | Execution type | Program requirement | Dynamism
Columns and predicates of the SQL statement can change during execution. | Planned | Embedded | Dynamic
SQL formulation does not change. | Planned | Embedded | Static
Highly concurrent, high-performance transactions. | Planned | Embedded | Dynamic or static
Ad hoc one-off queries. | Unplanned | Stand-alone | Dynamic
Repeated analytical queries. | Planned | Embedded or stand-alone | Dynamic or static
Quick one-time “fix” programs. | Unplanned | Embedded or stand-alone | Dynamic or static
SQL Coding for Performance
• It is important to learn how to code SQL for
performance
• Generally a good idea to rely on the DBMS to
optimize the code
• Let SQL do the work instead of coding it in host
language program
– The less data brought from the DBMS to the program
the better performance will be
• More performance guidelines come later in the
course!
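To illustrate letting SQL do the work, here is a small sketch using sqlite3 and a hypothetical `orders` table. Both approaches compute the same totals, but the first moves every row into the program while the second returns only one row per region.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("west", 7), ("west", 8)],
)

# Host-language approach: every row crosses from the DBMS to the program.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
totals_in_program = {}
for region, amount in rows:
    totals_in_program[region] = totals_in_program.get(region, 0) + amount

# SQL approach: the DBMS aggregates and returns one row per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()
totals_in_sql = dict(summary)
conn.close()
```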
What is XML?
• XML stands for eXtensible
Markup Language.
– Like HTML, XML is based on SGML
– HTML uses tags to describe the
appearance of data on a page,
whereas XML uses tags to describe
the data itself, instead of its
appearance.
– Allows documents to be self-describing,
through the specification of tag sets and the
structural relationships between
the tags.
• XML is actually a meta
language (a language used to
define other languages).
– These languages are collected
in dictionaries called Document
Type Definitions (DTDs).
– The DTD stores definitions of
tags for specific industries or
fields of knowledge.
– The DTD for an XML document
can be either part of the
document or stored in an
external file.
XML Data
• XML uses tags to describe the data itself
<CUSTOMER>
<first_name>Craig</first_name>
<middle_initial>S.</middle_initial>
<last_name>Mullins</last_name>
<company_name>Mullins Consulting, Inc.</company_name>
<street_address>15 Coventry Ct.</street_address>
<city>Sugar Land</city>
<state>TX</state>
<zip_code>77479</zip_code>
<country>USA</country>
</CUSTOMER>
http://www.xml.org
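Because the tags describe the data, a program can pull out fields by name. The following sketch uses Python's standard xml.etree.ElementTree module on a shortened copy of the CUSTOMER document; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

document = """
<CUSTOMER>
  <first_name>Craig</first_name>
  <last_name>Mullins</last_name>
  <city>Sugar Land</city>
  <state>TX</state>
</CUSTOMER>
""".strip()

root = ET.fromstring(document)

# Each child element's tag names the datum it carries.
record = {child.tag: child.text for child in root}

# A single field can also be looked up directly by its tag.
city = root.findtext("city")
```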
Querying XML
• XQuery
– FLWOR
• FOR, LET, WHERE, ORDER BY, and RETURN.
– Not just for querying, it also allows for new XML
documents to be constructed
• SQL/XML
– Uses functions to access XML data
• XMLDOCUMENT, XMLELEMENT, XMLCONCAT, XMLAGG,
XMLQUERY, XMLTABLE
Defining Transactions
• A transaction is an atomic unit of work with
respect to recovery and consistency.
• When all the steps that make up a specific
transaction have been accomplished, a
COMMIT is issued.
– A ROLLBACK issued before the COMMIT undoes
the transaction’s work
• DBMS maintains transaction log
ACID Properties of Transactions
• Defining Transactions
– Atomicity
– Consistency
– Isolation
– Durability
• Unit of Work
– Ensure proper definition and coding
Unit of Work
• A UOW is a series of instructions and
messages that guarantees data integrity.
• Example: bank transaction
– Withdrawal of $20
– The transaction must involve both the subtraction
of $20 from your account and the delivery of $20
to you
– Only doing one or the other is not a complete unit
of work
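The bank example can be sketched as a unit of work in Python with sqlite3. The `account` table, the CHECK constraint, and the `withdraw` helper are assumptions made for the illustration. Both updates are committed together, and if either fails the ROLLBACK undoes whatever was already done.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("checking", 50), ("cash", 0)])
conn.commit()


def withdraw(conn, amount):
    """Move `amount` from checking to cash as one unit of work."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'cash'", (amount,)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'checking'",
            (amount,),
        )
        conn.commit()  # both steps succeeded: make them permanent together
        return True
    except sqlite3.IntegrityError:
        conn.rollback()  # undo the delivery because the subtraction failed
        return False


ok_small = withdraw(conn, 20)   # enough funds
ok_large = withdraw(conn, 100)  # would overdraw the account
balances = dict(conn.execute("SELECT name, balance FROM account"))
conn.close()
```

After the failed second withdrawal the cash balance is still 20, showing that the half-completed work was not kept.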
TP System Versus DBMS (Stored Procs)
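[Figure: Two architectures compared. TP system: Presentation (Client) → Workflow Controller → Transaction Server → Relational DBMS (1) and Relational DBMS (2), each with its own disk. DBMS with stored procedures: Presentation (Client) → Stored Procedures → Relational DBMS → Disk.]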
Application Servers
• An application server combines the features of
a transaction server with additional
functionality to assist in building, managing,
and distributing database applications.
• Examples:
– WebSphere (IBM)
– Zend Server
– Base4 Application Server (open source)
Transactions and Locking
• The DBMS uses a locking mechanism to enable
multiple, concurrent users to access and modify data in
the database.
• By using locks, the DBMS automatically guarantees the
integrity of data. The DBMS locking strategies permit
multiple users from multiple environments to access
and modify data in the database at the same time.
• Locking Granularity
– Row
– Page (or Block)
– Table
– Table Space
– Database
Level of Lock Granularity
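[Figure: Access concurrency (high to low) plotted against granularity of lock: column, row, page, table, tablespace, database. Concurrency is highest at the finest granularity and lowest at the coarsest.]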
Types of Locks
• The following types of locks can be taken on database pages or
rows:
– Shared Lock
• Taken when data is read with no intent to update it.
• If a shared lock has been taken on a row, page, or table, other processes or
users are permitted to read the same data.
– eXclusive Lock
• Taken when data is modified.
• If an exclusive lock has been taken on a row, page, or table, other processes or
users are generally not permitted to read or modify the same data.
– Update Lock
• Taken when data must first be read before it is changed or deleted.
• The update lock indicates that the data may be modified or deleted in the
future.
• If the data is actually modified or deleted, the DBMS will promote the update
lock to an exclusive lock.
Intent Locks
• Intent locks are placed on higher-level
database objects when a user or process takes
locks on the data pages or rows.
– Table or Table Space
• An intent lock stays in place for the life of the
lower-level locks.
Lock Compatibility
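As a sketch of how the lock types described earlier interact, the matrix below encodes one common convention for shared (S), update (U), and exclusive (X) locks. The exact rules are an assumption here and vary by DBMS, particularly for update locks, so the vendor documentation is the authority. The `can_grant` helper is a hypothetical name.

```python
# Key: (lock already held, lock being requested).
# True means the request can be granted alongside the held lock.
# Assumed convention: U coexists with S, but not with another U or with X.
COMPATIBLE = {
    ("S", "S"): True,  ("S", "U"): True,  ("S", "X"): False,
    ("U", "S"): True,  ("U", "U"): False, ("U", "X"): False,
    ("X", "S"): False, ("X", "U"): False, ("X", "X"): False,
}


def can_grant(held, requested):
    """Return True if `requested` is compatible with every lock in `held`."""
    return all(COMPATIBLE[(h, requested)] for h in held)
```

For example, `can_grant(["S", "S"], "S")` is true because readers do not block each other, while `can_grant(["S"], "X")` is false because a writer must wait for readers to finish.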
Lock Timeouts
Deadlocks
[Figure: Deadlock on Table X. Process A requests and locks row 3, then requests row 7. Process B requests and locks row 7, then requests row 3. Process A is waiting on Process B, and Process B is waiting on Process A.]
Lock Duration
• Lock duration refers to the length of time that
a lock is held by the DBMS.
• Two parameters impact lock duration:
– Isolation level
– Acquire/Release
Isolation Level
• Read uncommitted
– aka dirty read
• Read committed
– aka cursor stability
• Repeatable read
• Serializable
Acquire/Release Specification
• Controls when Intent locks are acquired and
released
– Intent locks can be acquired either immediately
when the transaction is requested or iteratively as
needed while the transaction executes.
– Intent locks can be released when the transaction
completes or when each intent lock is no longer
required for a unit of work.
Lock Escalation
• Lock escalation is the process of increasing the
lock granularity for a process or program.
• Typically controlled by system parameters and
DDL parameters in CREATE statements.
• For example:
– If a threshold is hit for the number of locks being held
by a process (or by the entire DBMS), page locks (or
row locks) can be escalated to table locks.
– Can cause concurrency issues
• If the entire table is locked other processes cannot access
the data
Programming Techniques to Minimize
Locking Problems
• Avoid deadlocks by coding updates in the same
sequence regardless of program
– For example, alphabetical order by table name
• Issue data modification SQL statements as close
to the end of the UOW as possible
– The later in the UOW the update occurs, the
shorter the duration of the lock
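The first technique can be sketched with in-process locks standing in for DBMS table locks (an analogy only; the table names and the `update_tables` helper are hypothetical). Two workers name the tables in opposite orders, but because each sorts the names before acquiring, neither can end up holding one lock while waiting for the other.

```python
import threading

# One in-process lock per table, standing in for locks the DBMS would take.
locks = {"ACCOUNT": threading.Lock(), "CUSTOMER": threading.Lock()}
completed = []


def update_tables(worker, tables):
    # Always acquire in alphabetical order by table name,
    # whatever order the caller listed the tables in.
    ordered = sorted(tables)
    for name in ordered:
        locks[name].acquire()
    try:
        completed.append(worker)  # the "updates" happen while all locks are held
    finally:
        for name in reversed(ordered):
            locks[name].release()


threads = [
    threading.Thread(target=update_tables, args=("A", ["ACCOUNT", "CUSTOMER"])),
    threading.Thread(target=update_tables, args=("B", ["CUSTOMER", "ACCOUNT"])),
]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=5)
```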
Batch Processing
• Batch Processing
– Where programs are scheduled to run at predetermined
times without any user input
• Batch programmers sometimes tend to treat tables
like flat files; that is NOT a good idea.
– Think relationally instead of file processing
• Plan and implement a COMMIT strategy
in all batch application programs
– Instead of holding locks until the end of the program
– Otherwise you will experience a lot of lock timeouts
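A commit strategy can be sketched as follows with sqlite3. The `ledger` table and the `COMMIT_INTERVAL` value are assumptions for the example; a suitable interval depends on the workload and on restart requirements. Each chunk is processed with a set-based UPDATE and then committed, so locks are held for one chunk at a time rather than for the whole run.

```python
import sqlite3

COMMIT_INTERVAL = 100  # hypothetical value; tune for the workload

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ledger (id INTEGER PRIMARY KEY, processed INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO ledger (id) VALUES (?)", [(i,) for i in range(1, 251)])
conn.commit()

max_id = conn.execute("SELECT MAX(id) FROM ledger").fetchone()[0]
commits = 0
for low in range(1, max_id + 1, COMMIT_INTERVAL):
    high = low + COMMIT_INTERVAL - 1
    # Set-based update of one key range instead of row-at-a-time file-style logic.
    conn.execute(
        "UPDATE ledger SET processed = 1 WHERE id BETWEEN ? AND ?", (low, high)
    )
    conn.commit()  # end the unit of work so its locks are released
    commits += 1

remaining = conn.execute(
    "SELECT COUNT(*) FROM ledger WHERE processed = 0"
).fetchone()[0]
conn.close()
```

With 250 rows and an interval of 100, the loop commits three times and leaves no unprocessed rows.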
Questions