Content Request Markup Language (CRML): a Distributed

advertisement
Content Request Markup Language (CRML): a
Distributed Framework for XML-based Content
Publishing
Chi-Huang Chiu, Kai-Chih Liang, Shyan-Ming Yuan
Dept. of Computer & Information Science, National Chiao Tung University
1001 Ta-Hsueh Rd,
HsinChu 300, Taiwan
+886-952-729654
chchiu/kcliang/smyuan@cis.nctu.edu.tw
ABSTRACT
Construct web applications to provide dynamic, personalized
web contents with high scalability and performance is a
challenge to the software industry in the new Internet era. In
most available solutions, load balancing and caching
mechanisms are introduced in front of web servers to reduce
workload. In this paper we present Content Request Markup
Language (CRML), an enabling techniques for distributed
XML processing at the content level. CRML is a language
based on emerging XML standards, XSLT and XPATH, to
publish XML-based content over HTTP protocol. It provides
hints to construct a distributed framework to support parallel
XML-based content publishing. In addition, the content from
databases or other sources could be cached before or after
processing in block or page level. With the parallel content
publishing and the caching mechanism, the CRML could
provide a high performance platform for fully customized web
service.
Keywords
Load balancing, Caching, Personalization, XML
1. INTRODUCTION
The volume of Internet content was getting higher and higher in
the past few years. Meanwhile, dynamic multimedia contents
are becoming the emerging demands from users. Content
management, especially dynamic contents, is the issue that
every service providers must face with.
The CRML is a content publishing framework that enforces
distributed computation behind the Web server when managing
Internet content. There are three major features of the CRML.
Pure XML. CRML is the markup language derived from XML.
Contents are marked by CRML tags, which provides hints for
distributed processing of the content.
Parameter-triggered caching mechanism. To increase the
reusability of the cache data, CRML implementation utilize the
query parameter to further reuse cached query result with
different query parameters.
Parallel Processing. The implementation of CRML enforces
the parallel process behinds the Web server.
2. The Language Definition
In CRML, the dynamic content should be embedded in the
document prepared to return described by a CRB: Content
Request Block. Each CRB should represent a block of data and
contain the data source, inter-CRB communication, exception
handling, parallel processing hint, and caching informtion.
2.1 Content Request Block
<crml:content id="auth"
xmlns:crml="http://nctu/CRML">
<crml:concunrrency threadsafe="no"/>
<crml:source type="sql">
<crml:attr name="db" value="jdbc:db2:sample"/>
[!CDATA[select name from users where
id='@{id}' and password='@{PASSWD}']]
</crml:source>
<crml:cache key=”id,password”/>
<crml:variable name="name"
catch="RESULT/ROW[0]/id"/>
<crml:exception code="100" type="page">
<?xml-stylesheet href="error.xsl" type="xsl"?>
<error type="autherror">
The User/Password you inputted is not correct !
</error>
</crml:exception>
</crml:content>
<crml:content id="secdata"
xmlns:crml="http://nctu/CRML">
<crml:concunrrency threadsafe="no"
wait="auth"/>
<crml:source type="file">
<crml:attr name="url"
value="file://c/doc/dept.xml"/>
</crml:source>
</crml:content>
The above example shows two CRB in a document one of them
is contain the data from the database and the other is from a
local file. In the <crml:source> tag, the inner nodes will be
passed to the processor of specified data source.
Current implementation includes JDBC, RMI, CORBA, local
files, HTTP, and Java Source Code. No matter what type of data
source is used, the result type of the data source must be XML.
The final result could be process by a XSLT process if need.
2.2 Inter-CRB Communication
The inter-CRB communication is archived by the data exchange
via page-level variable. Like the <crml:variable> tag in the
previous code fragment, the XPATH string is used to extract
the information from the data source to specified variable.
2.3 Exception Handling
If any exception is thrown by the data source, the
<crml:exception> tag will catch it. If the type of the exception is
page, the inner node of the tag will replace the whole result
interrupt all other running or waiting CRB. Otherwise, it will
replace the result of the CRB.
2.4 Parallel Processing Hint
The <crml:concurrency> tag provide the relationship between
all CRB. The threadsafe attribute defines that if the CRB is
thread-safe and the wait attribute defines the CRBs which
should be executed before this CRB.
2.5 Page-Level Processing Instructions
In addtion to the CRBs embedded in the document, some pagelevel processing instructions are used to do the page-level
caching, transformation, and error handling.
3. Parameter-triggered Caching Mechanism
The caching mechanism in this framework is easy. The concept
is that the same parameter will get the same content. Each CRB
include a cache tag to describe the parameter affect the result
like the <crml:cache> tag in previous example. The CRB will
not be processed if the same parameter could be found in the
cache (Buffer Pool). Besides, to generate content with the
dynamic information, the invalidation is done to remove
expired content in the cache.
source file and be passed to the Engine with request
parameters. Second, the compiler will be invoked to compile
the CRML source file and generate the process plan of the
request. Third, the process plan will be sent to the runtime
and executed. Finally, if the result from the runtime contains
another CRBs to process, the compiler will try to generate a
process plan for next round and some page-level caching
will be applied here.
4.2 The Process Plan
The process plan generated by the compiler will seperate the
CRBs to several thread. The two CRB in the same thread
represent that the dependence of them. In other words, it is
useless if you execute the two CRBs concurrently because one
CRB may wait the other one.
4.3 The CRML Runtime
3.1 Self-Invalidation
Some content (such as HTTP and files) has the information
about the modification or expiration information, which could
be used to invalidate the cache.
3.2 Event-driven Invalidation
Like the above, the access plan will be dispatched to several
threads to execute. The CRB Executor is the core of each thread
that executes the CRBs by the help of CRB Processors and
Buffer Pools. The Result Collector could collect and manage
the results and exceptions from each thread. The semaphores in
the Result Collector also do the concurrency control. Such
runtime could be implemented on the distributed environment
easily if the centralized-controlled Result Collector is ready.
5. Conclusions and Future Works
The above figure is an example of the event-driven invalidation
scenario, which the event source is a DBMS. An event
generator could be set via the replication module on the DBMS
to publish the event. The buffer pool with an event listener
adapter could listen the events on the bus to invalidate the outof-date information.
3.3 Expression Variable
<crml:cache name="userquery" keys="~{#{min}/5}" />
The example shows a notation of Expression Variable, which
could enclose an expression as a variable. In such example, the
variable ~{#{min}/5} represent the value of the numbers of
minutes from 1970 divide 5. The variable has the same value in
the same block of 5 minutes and could reduce the effort of
processing. Besides, the name of the cached element could let
the CRBs in different CRML document share the same cache
data. The scheme is suitable for the CRB, which is accessed
frequently, but the real-time accuracy is not pretty important.
4. Implementation for Parallel Processing
The CRML doesn't include the detail information about how to
process a CRML. It only contains some requirement of partial
sequence and exclusion and is much depended on the
implementation.
4.1 The CRML Engine
The CRML Engine is the component to process a whole
request. First, the request will be associated with a CRML
In this paper, we have presented the core of CRML, a
distributed framework for XML-based Content Publishing.
Comparing with other publishing framework or tools, CRML: 1)
stresses the distributed architecture which could be used either
to do the load balancing for a heavy traffic website or to
perform multithread processing in a single machine to improve
performance; 2) provides the block-level and contentindependent cache for the data before processing which is a
great improvement for fully-customized web service; 3) uses the
emerging XML standards to process the XML-based data
without any sequential programming language.
CRML is a part of WebEngine, a developing project of Web
Content Management System. The concept of WCMS is like the
DBMS, which both manage the data and handle the request. As
the role of SQL in DBMS, the CRML is designed to describe
what user want to request not how to handle the request. Such
spirit could let WebEngine improve the performance using a lot
of techniques without change the original CRML source file.
Since the CRML is used to request the content, the transaction
management is not mentioned in the first version of design,
which may be improved as future works.
6. Acknowledgment
This work was supported by both the National Science Council
grants NSC88-2213-E-009-087, and NSC89-2213-E009-069.
Download