
EN-Annex TS1.2 Software+Architecture+Document

Publications Office
Production and dissemination of the
Supplement to the Official Journal of the
European Union: TED website, OJS DVD-ROM
and related offline and online media
Software Architecture Document
Subject: Software Architecture Document
Version / Status: 1.00
Release Date: 28/04/2010
Filename: TED-SAD-v1.00.doc
Document Reference: TED-SAD
Table Of Contents
1 Introduction
  1.1 Purpose of the Document
  1.2 Scope of the Document
  1.3 Intended Audience
2 Reference and Applicable Documents
3 Acronyms and Abbreviations
4 Architectural Representation
5 Logical View
  5.1 TED Website
    5.1.1 Overview
    5.1.2 Web Layer Design Package
    5.1.3 Service Layer Design Package
    5.1.4 Domain layer
    5.1.5 Data access layer
    5.1.6 General Principles
  5.2 Monitoring data-warehouse
    5.2.1 BIRT
    5.2.2 Cacti
    5.2.3 Webalizer
  5.3 License Holder environment
    5.3.1 Authentication and logging
  5.4 Email analysis and notifications
  5.5 Workflow engine
    5.5.1 Validation and files transformation
    5.5.2 PDF generation and time-stamping
    5.5.3 Indexing
    5.5.4 DVD image creation
    5.5.5 Contracting authority notification
  5.6 Notice viewer
6 Implementation View
  6.1 TED Website
    6.1.1 Overview
    6.1.2 TED XSL transformation
  6.2 Email analysis and notifications
  6.3 Workflow engine
    6.3.1 The workflow engine package
    6.3.2 The workflow engine Implementation
    6.3.3 The Indexing Implementation
    6.3.4 The workflow Transformation
    6.3.5 The workflow management tool
  6.4 Ted System i18n support
  6.5 Notice viewer
  6.6 Reference data
    6.6.1 Reference data: Deletion
    6.6.2 Reference data: Addition
    6.6.3 Reference data: Modification
  6.7 Content modification
    6.7.1 Addition of a new form
    6.7.2 Modification of reference data
  6.8 Application Dependencies
  6.9 Backup procedure
    6.9.1 Daily Back-end backup procedure
    6.9.2 Daily Front-end backup procedure
    6.9.3 Daily Data warehouse backup procedure
    6.9.4 Daily Common backup procedure
7 Data View
  7.1 MySQL cluster
  7.2 Technical Columns
    7.2.1 Audit segment
8 Deployment view
  8.1 Network File System server
    8.1.1 TED repository file system
    8.1.2 TED temporary backup file system
    8.1.3 TED mirror backup file system
    8.1.4 Windows XP via VMWare
  8.2 James email Servers
    8.2.1 DNS Configuration
    8.2.2 Spam folders
  8.3 Database Organisation
LIST OF TABLES
Table 1: Reference Documents
Table 2: Applicable Documents
Table 3: TED XSL Transformation
Table 4: Indexed fields
Table 5: Modified On and Version columns
Table 6: Modified By column
LIST OF FIGURES
Figure 1 TED system modules
Figure 2 TED website responsibility based layers
Figure 3 Integration of Spring MVC with other layers
Figure 4 Flow of a Request through Spring Security Filters
Figure 5 CACTI Active Session diagram
Figure 6 Webalizer Traffic Analysis diagram
Figure 7 Webalizer Summary by Month diagram
Figure 8 Production management steps
Figure 9 Web application file structure
Figure 10 Workflow engine package Structure
Figure 11 Flow Service class diagram
Figure 12 Indexing Flow
Figure 13 Notice Viewer structure
Figure 14 Reference Data Model
Figure 15 Database replication with MySQL Cluster
Figure 16 TED Deployment diagram
Figure 17 Deployment diagram
1 INTRODUCTION
1.1 PURPOSE OF THE DOCUMENT
The aim of this document is to provide a comprehensive architectural overview of the TED system.
This document describes how functional analysis and use cases are translated and structured in the
architecture by the development team.
1.2 SCOPE OF THE DOCUMENT
This document presents the technical architecture of the TED system. In this document, we focus on
the choices made for the TED system. Hereafter, the readers will find information about the
frameworks, tools and technologies used by the TED system.
1.3 INTENDED AUDIENCE
The present document is intended to be read by the following people:

• Publishing operation team;
• Publications Office Project Team;
• Developments Project Team.
2 REFERENCE AND APPLICABLE DOCUMENTS
This section lists all reference and applicable documents. When one of the documents below is cited
in the text, its bracketed reference is used, such as [R01].
REFERENCE DOCUMENTS
Ref. | Title                             | Reference | Version | Date
R01  | TED-FSP-Functional Specifications | TED-FSP   | 1.00    | 07/09/2009
R02  | TED-DML-Data Model                | TED-DML   | 1.00    | 28/04/2010
Table 1: Reference Documents
APPLICABLE DOCUMENTS
Ref. | Title | Reference | Version | Date
A01  | General Invitation to Tender: Production and dissemination of the Supplement to the Official Journal of the European Union: TED website, OJS DVD-ROM and related offline and on line media. Specifications | N° 10186 | N/A | 06/01/2009
A02  | Hybrid service contract | N° 10186 | N/A | 06/01/2009
A03  | Production and dissemination of the Supplement to the Official Journal of the European Union: TED website, OJS DVD-ROM and related offline and on line media. Project Quality Plan | TED-PQP | 1.01 | 08/09/2010
Table 2: Applicable Documents
3 ACRONYMS AND ABBREVIATIONS
ABBREVIATIONS AND ACRONYMS
Abbreviation | Meaning
AOP   | Aspect Oriented Programming
API   | Application Programming Interface
CPV   | Common Procurement Vocabulary
CRUD  | Create Retrieve Update Delete
DAO   | Data Access Object
ECMT  | European Commission Machine Translation
FTP   | File Transfer Protocol
HTTP  | HyperText Transfer Protocol
IoC   | Inversion of Control
JAR   | Java Archive
JDK   | Java Development Kit
JEE   | Java Enterprise Edition
JSP   | Java Server Page
JTA   | Java Transaction API
LGPL  | GNU Library or Lesser General Public License
MVC   | Model View Controller
NUTS  | Nomenclature des Unités Territoriales et Statistiques
OJS   | Official Journal Supplement
OOD   | Object-Oriented Design
OPOCE | Office des Publications Officielles des Communautés Européennes
POJO  | Plain-Old Java Object
TED   | Tenders Electronic Daily
UDF   | Universal Disk Format
URL   | Uniform Resource Locator
WAR   | Java Web Archive
4 ARCHITECTURAL REPRESENTATION
This document is a part of the Technical Specification of the TED System, the result of the design
phase.
This document presents the necessary views to represent the software architecture:

• The Logical View presents the decomposition of the software architecture into subsystems
  and packages;
• The Implementation View describes the overall structure of the implementation model and the
  decomposition of the software into layers and subsystems;
• The Data View describes the persistent data storage perspective of the system;
• The Deployment View describes the physical infrastructure on which the TED software is
  deployed and run. It specifies the physical nodes and network configuration on which the
  software executes, and maps the processes defined in the Process View onto physical nodes.
5 LOGICAL VIEW
The Logical View presents an overview of the architecture and then provides the decomposition of the
software into design packages and sub-systems.
The TED system has been decomposed into six distinct modules represented in the next figure:
Figure 1 TED system modules (diagram not reproduced; it shows the six modules: TED website,
Monitoring Data-warehouse, License Holder environment, Email analysis and notifications,
Notice viewer, and Content management (Workflow engine))

• TED website: This module represents all the components needed for the public web interface
  of the TED system. It is described in detail in the TED Website section.
• Monitoring Data-warehouse: This module contains the data-warehouse user interface. It is
  described in detail in the Monitoring data-warehouse section.
• License Holder environment: The environment available to subscribers who have privileged
  access to the contents of TED. This module is described in detail in the License Holder
  environment section.
• Email analysis and notifications: This module is responsible for mailing notifications and
  analysing received emails. It is described in detail in the Email analysis and
  notifications section.
• Workflow engine: The workflow engine module contains all the components used for the
  production management of the documents on the TED system. This includes indexing,
  creation of DVD images, file transformations and the production dashboard. This module is
  described in detail in the Workflow engine section.
• Notice viewer: The notice viewer is a simple standalone Java application that executes the
  transformations on the given XML files. This module is described in detail in the Notice
  viewer section.
5.1 TED WEBSITE
5.1.1 OVERVIEW
The TED system architecture is based on the J2EE application architecture. This architecture is
decomposed into ‘tiers’ and ‘layers’ as recommended by the J2EE specification.
The Layering design pattern when applied to a system breaks down the complexity of the system as a
whole by identifying the different parts of the system and reducing coupling between them. Layering
reduces the impact of a change in one layer on the rest of the system. Multi-dimensional layering is
about the combination of two other strategies:

• Responsibility-based layering that associates each layer with a specific responsibility
  (presentation, business and integration);
• Reuse-based layering that identifies components that have a high potential of reusability,
  possibly across different projects.
The application is made of several responsibility based layers:
Figure 2 TED website responsibility based layers (diagram not reproduced; it maps each layer to
a package):
Web:         eu.europa.opoce.ted.controller
WebContent:  eu.europa.opoce.ted.view
Service:     eu.europa.opoce.ted.service
Domain:      eu.europa.opoce.ted.model
Integration: eu.europa.opoce.ted.dao

• The Web Layer contains the logic to handle the interaction between the user and the system
  via a Web Browser. To achieve this interaction, the Web Layer is allowed to call the
  high-level functions provided by the Service Layer and to manipulate the domain models of
  the Domain Layer that are exposed to the presentation;

• The Service Layer contains the business logic, structured in high-level methods oriented
  around use cases that may result in CRUD operations on entities of the Domain Layer. These
  CRUD operations are realized by accessing the Data Access Layer;

• The Domain Layer contains data, common rules and logic of the model: business or technical,
  persistent or transient.
• The Data Access Layer acts as a medium between the entities of the Domain Layer and the
  technical solutions ensuring their persistence. The Data Access Layer knows how and where
  the persistent entities are stored. Typically, an entity of the Domain Layer has a
  corresponding Data Access Object (DAO) in this layer that exposes methods to manage the
  persistence of the object.
5.1.2 WEB LAYER DESIGN PACKAGE
The Web Layer is built on top of the Spring MVC framework. This framework implements and makes
intensive use of the following design patterns:

• Model-View-Controller;
• Front Controller;
• Command Object.
The purpose here is not to give a complete explanation of how Spring MVC works but rather to
describe the philosophy and how in practice the TED system uses and extends Spring MVC.
The Model-View-Controller is the separation of concerns applied to the presentation tier, i.e., it
separates the view from the business data and processes, the controller being responsible for
handling requests and acting as a medium between the model and the view. With Spring MVC,
business objects can be reused as they are (no class extension or interface implementation required).
The following figure shows how Spring MVC components interact with the Application and Data
Access layers.
Figure 3 Integration of Spring MVC with other layers (diagram not reproduced; labels:
Presentation, Application, Data Access, request, response, Spring Security, DispatcherServlet,
Controller, Model, View, Service, Domain, Integration)
In TED, the Model-View-Controller pattern is implemented as Java classes extending Spring
Framework classes (such as SimpleFormController).
The Front Controller design pattern avoids having a separate servlet for each controller. Instead,
Spring MVC provides a generic servlet, the DispatcherServlet, which dispatches each request to a
specific controller. In TED, the Front Controller role is handled by the DispatcherServlet class
of the Spring Framework.
The Command Object design pattern is used to map the HTTP request and parameters to a Java
object holding all the information. In TED, the Command Objects are implemented as simple Java
classes, which all extend the same parent class: TedDefaultPageCommand.
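The two patterns above can be illustrated with a minimal, framework-free sketch. It deliberately does not use Spring classes: FrontControllerSketch, SearchCommand, the "/search" route and the parameter names are all hypothetical and only show how a single entry point binds request parameters onto a command object and delegates to one controller per path.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Framework-free sketch of the Front Controller and Command Object patterns.
// All names here are hypothetical; none of them are Spring or TED classes.
class FrontControllerSketch {

    // Command object: a plain Java class holding the request parameters.
    static class SearchCommand {
        String query = "";
        int page = 1;
    }

    // Binds raw HTTP-style parameters onto a command object.
    static SearchCommand bind(Map<String, String> params) {
        SearchCommand cmd = new SearchCommand();
        cmd.query = params.getOrDefault("query", "");
        cmd.page = Integer.parseInt(params.getOrDefault("page", "1"));
        return cmd;
    }

    // Route table: one entry per controller, keyed by request path.
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();

    FrontControllerSketch() {
        routes.put("/search", params -> {
            SearchCommand cmd = bind(params);
            return "results for '" + cmd.query + "', page " + cmd.page;
        });
    }

    // Single entry point: looks up the controller for the path and delegates to it.
    String dispatch(String path, Map<String, String> params) {
        Function<Map<String, String>, String> controller = routes.get(path);
        return controller == null ? "404" : controller.apply(params);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("query", "road works");
        params.put("page", "2");
        System.out.println(new FrontControllerSketch().dispatch("/search", params));
    }
}
```

In Spring MVC the binding and the routing are performed by the framework; the sketch only makes those two responsibilities visible.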
5.1.2.1 Access Control
When a request is submitted to a web application protected by Spring Security, it is guaranteed to
be processed by Spring Security through standard Java Servlet filters. Spring Security provides a
framework and a set of components to build the whole security processing chain. A request to a
protected resource therefore passes through each of Spring Security’s filters, as depicted in the
figure below:
Request
1. Channel-Processing Filter (optional)
2. Authentication-Processing Filter
3. Integration Filter
4. Security Enforcement Filter
5. Secured Web Resource
Figure 4 Flow of a Request through Spring Security Filters
Each of these filters plays a specific role in providing the level of security desired to protect
a web resource. The first filter, for example, can enforce that web requests use a given channel,
such as HTTPS. The second filter, the Authentication-Processing Filter, is in charge of
redirecting and authenticating the user if the web resource is indeed protected. The fourth
filter, the Security Enforcement Filter, checks that the logged-in user has the appropriate access
rights for that web resource. This check is modular and may combine different rules, allowing
complex Access Control Lists (ACL).
This simple but powerful chaining mechanism ensures that all requests made by a web browser
comply with the security constraints imposed. These constraints can be set in configuration files,
so that they remain external to the source code.
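The chaining mechanism can be sketched without any framework as an ordered list of checks, each of which either lets the request continue or rejects it. This is only an illustration of the idea, not Spring Security code: the Request fields, the Filter interface and the "SUBSCRIBER" role are hypothetical, and the Integration Filter of Figure 4 is left out for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Framework-free sketch of a security filter chain. The filter roles follow
// Figure 4, but every class and field below is hypothetical, not Spring Security API.
class SecurityChainSketch {

    static class Request {
        final boolean https;
        final String user;          // null when nobody is logged in
        final Set<String> roles;

        Request(boolean https, String user, String... roles) {
            this.https = https;
            this.user = user;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    // A filter returns null to let the request continue, or a rejection reason.
    interface Filter {
        String check(Request request);
    }

    static final List<Filter> CHAIN = Arrays.asList(
        r -> r.https ? null : "redirect to HTTPS",                    // channel processing
        r -> r.user != null ? null : "redirect to login",             // authentication
        r -> r.roles.contains("SUBSCRIBER") ? null : "access denied"  // security enforcement
    );

    // Runs the filters in order and stops at the first rejection.
    static String handle(Request request) {
        for (Filter filter : CHAIN) {
            String rejection = filter.check(request);
            if (rejection != null) {
                return rejection;
            }
        }
        return "secured resource served";
    }

    public static void main(String[] args) {
        System.out.println(handle(new Request(true, "alice", "SUBSCRIBER")));
        System.out.println(handle(new Request(true, null)));
    }
}
```

Because the chain is plain data, filters can be added, removed or reordered without touching the protected resource, which is the property the configuration-driven approach relies on.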
5.1.2.2 Rollover menus with CSS
All rollover menus of the website use the “:hover” CSS pseudo-class.
IE6 does not support this pseudo-class on every HTML element (only on links). To make the menus
work on IE6, we use a JavaScript function and a supplementary class in the CSS files (see below).
To inform IE6 users who have JavaScript disabled, the HTML pages use the conditional comment
<!--[if lt IE 7]>. Inside this condition, a <noscript> tag displays a warning message informing
users that the navigation is unusable with IE 6 when JavaScript is disabled.
JavaScript function:
<!--[if lt IE 7]>
<script type="text/javascript">
// Replacement for the "li:hover" selector on IE 6: toggles the
// "sfhover" class on every <li> when the mouse enters or leaves it.
var sfHover = function () {
    var sfEls = document.getElementsByTagName("li");
    for (var i = 0; i < sfEls.length; i++) {
        sfEls[i].onmouseover = function () {
            // Remove any previous marker first so the class is never duplicated.
            this.className = this.className.replace(/ ?sfhover\b/g, "");
            this.className += " sfhover";
        };
        sfEls[i].onmouseout = function () {
            this.className = this.className.replace(/ ?sfhover\b/g, "");
        };
    }
};
// attachEvent exists only on old Internet Explorer versions.
if (window.attachEvent) window.attachEvent("onload", sfHover);
</script>
<![endif]-->
Warning Message:
<!--[if lt IE 7]>
<noscript>
    <span class="red">
        Warning: you are using an old version of Internet Explorer without
        JavaScript ...
    </span>
</noscript>
<![endif]-->
CSS class:
Every <tag>:hover rule must have an equivalent <tag>.sfhover rule.
5.1.3 SERVICE LAYER DESIGN PACKAGE
Transaction demarcation is managed declaratively using the Spring Framework. The underlying
transactions are handled by the Spring DataSourceTransactionManager.
Transactions are defined at the service level. Service class methods represent use cases that are
usually considered atomic from a transactional point of view, which makes them a good place to
manage transactions. Following Spring’s philosophy, every aspect of a transaction (isolation
level, timeout …) is configurable through annotations.
The definition of the transactional boundaries has no impact on the code of the Service classes.
5.1.3.1 Search service
One of the major characteristics of the TED website is its search capability. All the search
functionalities are implemented on top of the Lucene library.
Apache Lucene is a high-performance, full-featured text search engine library written in Java. It
is suitable for any application that requires full-text indexing and searching capability. Lucene
has been widely recognized for its utility in the implementation of Internet search engines and
local, single-site searching. The Lucene API is also known for its flexibility, which makes it
independent of the format of the files to index.
All the search capabilities needed by the TED website are encapsulated within this search service.
The index file is built up progressively by adding the information retrieved from parsing and
indexing new documents. This indexing process is handled by the content management module, which
performs the operation for each new OJS release.
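To make the idea of a progressively growing index concrete, here is a minimal inverted index in plain Java. It is not the Lucene API and makes no claim about how TED configures Lucene; IndexSketch, its methods and the notice identifiers are hypothetical and only show how each newly parsed document adds entries to an existing index that the search side then reads.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal inverted index, NOT the Lucene API: it only illustrates how an index
// grows incrementally as new documents are parsed. All names are hypothetical.
class IndexSketch {

    // term -> identifiers of the documents containing that term
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Tokenizes the text and adds the document to the posting list of each term.
    void addDocument(String docId, String text) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the sorted identifiers of the documents containing the term.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        IndexSketch index = new IndexSketch();
        index.addDocument("notice-1", "Road works in Luxembourg");
        index.addDocument("notice-2", "Supply of road salt");
        System.out.println(index.search("road")); // prints [notice-1, notice-2]
    }
}
```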
5.1.4 DOMAIN LAYER
The Domain Layer contains data, common rules and logic of the model. This layer contains the
identified business entities. It is unaware of how the persistence of the domain objects is
managed; that is the responsibility of the Data Access Layer.
5.1.5 DATA ACCESS LAYER
This section describes the approach that forms the basis for JDBC database access using Spring.
The JdbcTemplate class is the central class in the Spring JDBC core package that is used by TED.
It simplifies the use of JDBC since it handles the creation and release of resources. This helps to
avoid common errors such as forgetting to close the connection. It executes the core JDBC
workflow, such as statement creation and execution, leaving application code to provide SQL and
extract results. This class executes SQL queries, update statements or stored procedure calls,
initiating iteration over ResultSets and extraction of returned parameter values. It also catches
JDBC exceptions and translates them to a more informative exception hierarchy.
The system makes use of the SimpleJdbcTemplate class which is a wrapper around the classic
JdbcTemplate that takes advantage of Java 5 language features such as variable arguments and
auto-boxing.
In order to work with data from a database, one needs to obtain a connection to the database. The
way Spring does this is through a DataSource. A DataSource is part of the JDBC specification and
can be seen as a generalized connection factory. It allows a container or a framework to hide
connection pooling and transaction management issues from the application code.
Spring provides other utilities such as the RowMapper. A RowMapper is a convenience interface
whose implementation maps each row obtained from iterating over the ResultSet created during the
execution of the query to one object.
5.1.6 GENERAL PRINCIPLES
This section contains the general principles underlying the system and promoted by the architecture.
These principles are too general to be exposed as a specific design package but they are important
enough to be mentioned.
This section provides a short description of these principles.
5.1.6.1 Programming to Interfaces
This principle is also known in its longer form, ‘Programming to Interfaces, not Implementations’.
When a piece of software is developed, an implementation class must not depend directly on
other implementation classes but rather on the interfaces they implement.
This improves the scalability and maintainability of the software as other implementations of the
interfaces can be substituted for the current one with little impact on the dependent modules.
Its use is facilitated by the ‘Dependency Injection’ principle.
The ‘Programming to Interfaces’ principle also eases the test strategy of the software, mostly for unit
testing. Classes are tested in isolation, as the test provides mock implementations of the
interfaces on which the tested object depends.
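As an illustration only, the principle can be sketched in plain Java as follows. The names NoticeRepository, NoticeSearchService and StubNoticeRepository are hypothetical and do not come from the TED code base; the stub plays the role of the mock implementation supplied by a unit test.

```java
import java.util.List;

// Hypothetical interface: the service below depends on this abstraction only.
interface NoticeRepository {
    List<String> findTitlesByCountry(String countryCode);
}

// Depends on the interface, never on a concrete repository class.
final class NoticeSearchService {
    private final NoticeRepository repository;

    // The dependency is handed in from outside (by a test or by a container).
    NoticeSearchService(NoticeRepository repository) {
        this.repository = repository;
    }

    int countNotices(String countryCode) {
        return repository.findTitlesByCountry(countryCode).size();
    }
}

// Test double standing in for a database-backed implementation.
final class StubNoticeRepository implements NoticeRepository {
    @Override
    public List<String> findTitlesByCountry(String countryCode) {
        return "BE".equals(countryCode)
                ? List.of("Road works", "IT services")
                : List.of();
    }
}
```

A unit test can build new NoticeSearchService(new StubNoticeRepository()) and check the counts without any database being available.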
5.1.6.2 Dependency Injection
The ‘Dependency Injection’ principle greatly facilitates the previous design principle, ‘Programming to
Interfaces’. It removes the need for each object to declare explicitly, in the Java code, its dependencies
on implementation classes. Configuration files do the job instead.
Each object is created by a container that populates the object with its dependencies. Thus, the object
no longer knows the implementation classes, only the interfaces.
In this project, this container is shipped along with the Spring Framework. Dependency injection (IoC:
Inversion of Control) is the base principle of Spring.
5.1.6.3 Aspect Oriented Programming
Aspect-Oriented Programming (AOP) complements Object-Oriented Programming (OOP) by providing
another way of thinking about program structure. In addition to classes, AOP gives you aspects.
Aspects enable modularization of concerns such as transaction management that cut across multiple
types and objects. (Such concerns are often termed crosscutting concerns.)
One of the key components of Spring is the AOP framework. While the Spring IoC container does not
depend on AOP, meaning you don't need to use AOP if you don't want to, AOP complements Spring
IoC to provide a very capable middleware solution.
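The idea of a crosscutting concern can be sketched with a JDK dynamic proxy, one of the mechanisms on which proxy-based AOP frameworks can rely for interface-based objects. This is a minimal sketch, not the TED configuration: NoticeService, SimpleNoticeService and TracingAspectSketch are hypothetical names, and the "begin"/"commit"/"rollback" events only imitate what a transaction aspect would do.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical business interface.
interface NoticeService {
    String publish(String noticeId);
}

// Business code: contains no transaction or tracing logic at all.
final class SimpleNoticeService implements NoticeService {
    @Override
    public String publish(String noticeId) {
        return "published " + noticeId;
    }
}

final class TracingAspectSketch {
    // Records the crosscutting events so that the effect is observable.
    static final List<String> EVENTS = new ArrayList<>();

    // Wraps the target so that every call is surrounded by begin/commit events.
    static NoticeService wrap(NoticeService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            EVENTS.add("begin " + method.getName());
            try {
                Object result = method.invoke(target, args);
                EVENTS.add("commit " + method.getName());
                return result;
            } catch (InvocationTargetException e) {
                EVENTS.add("rollback " + method.getName());
                throw e.getCause(); // rethrow the original business exception
            }
        };
        return (NoticeService) Proxy.newProxyInstance(
                NoticeService.class.getClassLoader(),
                new Class<?>[] {NoticeService.class},
                handler);
    }
}
```

Callers keep using the NoticeService interface; whether they receive the plain object or the wrapped one is decided outside the business code.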
5.2 MONITORING DATA-WAREHOUSE
The data warehouse information is made available for administrators using the web interface. Its
content is built using several tools, which are described in this section.
The Layering design pattern is also applied for the monitoring data-warehouse to break down the
complexity of the system as a whole by identifying the different parts of the system and reducing
coupling between them. The following sections give an overview of the components that are used to
combine and represent the information needed into web reports.
5.2.1 BIRT
BIRT (Business Intelligence and Reporting Tools) is a reporting system for web applications. BIRT has
two main components: a report designer based on Eclipse, and a runtime component. BIRT also
offers a charting engine that lets you add charts to your own application.
BIRT's stated goals within the TED project are to address a wide range of reporting needs including:
- Lists: The simplest reports are lists of data. As the lists get longer, BIRT supports grouping to
organize related data together, as well as totals, averages and other summaries.
- Charts: For some reports numeric data are presented as a chart. BIRT provides pie charts,
line charts, bar charts and many more. BIRT charts can be rendered in several formats.
- Crosstabs: Crosstabs (also called cross-tabulations or matrices) are used to display reports
that need to represent data in two dimensions.
- Compound Reports: This kind of report is used to display the previously described elements
side by side in a single document.
BIRT reports consist of four main parts: data, data transformations, business logic and presentation.
- Data: Several kinds of data sources may be used simultaneously with BIRT. For the TED
project the main data source is the data warehouse databases. JDBC is used as connector
between the database and BIRT.
- Data Transformations: Reports present data sorted, summarized, filtered and grouped to fit
the user's needs. While the database can do some of this work, BIRT is used to perform
sophisticated operations such as grouping on sums, percentages of overall totals and more.
- Business Logic: Since data is seldom structured exactly as it is needed, some reports require
business-specific logic to convert raw data into information useful for the user.
- Presentation: Once the data is ready, a wide range of display options may be used: tables,
charts, text and more.
5.2.2 CACTI
Cacti is a complete network graphing solution designed to harness the power of RRDTool's data
storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple
data acquisition methods, and user management features out of the box.
Figure 5 CACTI Active Session diagram
5.2.3 WEBALIZER
Website traffic analysis is produced by grouping and aggregating various data items captured by the
web server in the form of log files while the website visitor is browsing the website.
Figure 6 Webalizer Traffic Analysis diagram
Figure 7 Webalizer Summary by Month diagram
5.3 LICENSE HOLDER ENVIRONMENT
The License Holder environment module is limited to the ProFTPD server and its modules.
The content of the environment is generated by the content management module. Then, a symbolic
link used by the ProFTPD server is updated to make the new files available to the License Holders.
5.3.1 AUTHENTICATION AND LOGGING
The ProFTPD server for the License Holders makes use of a specific module to enhance its
functionalities. The needed functionalities are:
- Authentication using the user information contained in the MySQL database.
- Logging of the License Holder environment usage statistics. These log files will be parsed and the
extracted information will be stored in the datawarehouse database.
The mod_sql module is installed to add these two functionalities to ProFTPD. It comprises a
front end module (mod_sql) and backend database-specific modules (mod_sql_mysql). The front
end module leaves the specifics of handling database connections to the backend modules.
5.4 EMAIL ANALYSIS AND NOTIFICATIONS
The email analysis and notifications module is in charge of the analysis of received emails and the
mailing of notifications and reminders to Contracting Authorities or web site users.
The email analysis and notifications module is implemented as an email processing agent built on
top of the Apache James Mailet API. A mailet is a mail processing component which is executed
within a mailet container.
The Mailet API defines interfaces for both Matchers and Mailets:
- Matchers are used to match mail messages against certain conditions. They return some
subset (possibly the entire set) of the original recipients of the message if there is a match. An
inherent part of the Matcher contract is that a Matcher should not induce any changes in a
message under evaluation.
- Mailets are responsible for actually processing the message. They may alter the message in
any fashion, or pass the message to an external API or component. This can include
delivering a message to its destination repository or SMTP server.
In the TED project, Matchers are used to analyse the emails and detect spam. An internet blacklist is
used to detect the undesirable email (any mail with sender matching an entry in this blacklist is
automatically forwarded to the spam folder). The “out of office” replies are also managed in a special
way: all incoming mails are searched for a given pattern (for instance “*out of office*”) in the subject or
content of the mails. If the pattern matches, the mail is automatically flagged as out-of-office, and is
forwarded to the out-of-office folder. The subject of these mails is prefixed by “Out of office”.
Mailets, on the other hand, are used to perform the mailing of notifications.
For the TED project, the Apache JAMES server is used as container and is responsible for the
assembly and configuration of the deployed Mailets and Matchers.
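The out-of-office rule described above can be sketched in plain Java. This sketch deliberately does not use the Mailet API. OutOfOfficeDetector is a hypothetical name, and both the case-insensitive matching and the exact form of the subject prefix are assumptions rather than documented TED behaviour.

```java
import java.util.regex.Pattern;

final class OutOfOfficeDetector {
    // Equivalent of the "*out of office*" wildcard; case-insensitivity is an assumption.
    private static final Pattern OUT_OF_OFFICE =
            Pattern.compile("out of office", Pattern.CASE_INSENSITIVE);

    // True when the pattern occurs in the subject or in the content of the mail.
    static boolean isOutOfOffice(String subject, String content) {
        return contains(subject) || contains(content);
    }

    private static boolean contains(String text) {
        return text != null && OUT_OF_OFFICE.matcher(text).find();
    }

    // Prefixes the subject of a flagged mail; the separator is an assumption.
    static String flagSubject(String subject) {
        return "Out of office " + (subject == null ? "" : subject);
    }
}
```

In a Mailet-based deployment, the detection would sit in a Matcher (which must not modify the message) and the subject change and forwarding in a Mailet.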
5.5 WORKFLOW ENGINE
The workflow engine is responsible for the processing of the document files received by the
Publications Office and the creation of the file system used for the creation of the DVD images (daily,
weekly and monthly images).
The performed operations are mainly transformations and indexation of the received files. The content
management module also contains the production management dashboard which is used to monitor
and control the steps depicted in the figure below:
Figure 8 Production management steps
5.5.1 VALIDATION AND FILES TRANSFORMATION
The purpose of the Validation and files transformation step is to create the formatted content to be
published from the received XML notices. The prepared content is then stored in the content library.
The different transformations may be executed at different times. Technically, the transformations are
performed through XSLT for all the formats to be supported.
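As a rough illustration of an XSLT-based transformation step, the sketch below applies a stylesheet to an XML document using the standard javax.xml.transform API. The stylesheet and the <notice>/<title> element names are hypothetical; they are not the TED stylesheets or the TED notice schema.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltSketch {
    // Hypothetical stylesheet: renders the title of a notice as a heading.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/notice'><h1><xsl:value-of select='title'/></h1></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies STYLESHEET to the given XML and returns the result as a string.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Supporting a further output format then amounts to supplying another stylesheet, which fits the statement that the transformations may run at different times.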
RSS feeds are generated for the publication day by querying the corresponding notices, formatting the
RSS feed and storing the result in content library. RSS feeds are generated after the creation of the
index according to the description of the next section.
Notice family changes are propagated to existing notices.
5.5.2 PDF GENERATION AND TIME-STAMPING
For the generation of PDFs, we use XSL-FO as an intermediate format, and a custom version of Apache
FOP as the composition engine. Apache FOP was customized in order to add support for PDF/A-1a.
Standard compression and file organisation techniques are used to compile the results per publication
channel.
The time stamping of the PDF/A-1a notices is performed by the PDF Time Stamping tool using open
source PDF and cryptography libraries. Once the PDFs are time stamped, they are stored in the
content library (see note 1).
Note 1: PDF time stamping is currently not activated on the TED web site: a flag allows the
time-stamping service to be put in a degraded mode.
5.5.3 INDEXING
All the files are indexed after validation and transformation using Apache Lucene. Documents are
parsed to extract elements needed for the search on specific elements but also for the free text
search. The indexing process is split into three distinct steps: creation of the five-day index, creation of
the active index and finally update of the archive index.
5.5.4 DVD IMAGE CREATION
Three distinct DVD images related to the dissemination of the Supplement of the Official Journal are
created by the system during the production process. These image files contain different file types
which are mainly related to the PDF format:
- PDF/A-1a: the file format for the long-term archiving of electronic documents. It is based on
the PDF Reference Version 1.4 from Adobe Systems Inc.
- PDF/A-1a time stamped: the time-stamped version of the PDF/A-1a document file.
- PDX: the Acrobat Catalogue Index file contains the index of all the documents of an OJS
issue. Acrobat Reader is able to use this kind of file directly to perform searches on the content
of the documents. This file is built for the weekly DVD-ROM.
The PDF/A-1a and time-stamped PDF/A-1a files of the documents are generated by the
system from the XML documents. These transformations are explained in Validation and files
transformation. In the weekly DVD-ROM image, a PDX or PDF index file is created manually to index
all the documents of the current OJS issue. Adobe Acrobat Professional is used for this purpose by the
publishing operations team. For performance reasons, this tool is installed in the production
environment to ensure direct access to the files to index. Therefore, remote desktop access to
Acrobat Professional is put at the disposal of the publishing team.
The table of contents PDF file is generated automatically by the system using iText. iText is a library
available under LGPL license for dynamic PDF document generation and manipulation.
The creation of the image file is an automated process triggered by the publishing operation team.
This is achieved using the mkisofs tool, with support of UDF format.
5.5.5 CONTRACTING AUTHORITY NOTIFICATION
Contracting Authorities are notified by the TED system about the publication of their notices. A
notification email is sent to each Contracting Authority when its notices have been published in
the OJS. The email contains a URL link to the corresponding notice and
the time-stamped PDF/A-1a.
The TED system also sends a reminder to the Contracting Authority for each contract notice that does
not have a corresponding award notice.
In order to send reminder and notification emails, the TED system needs to
retrieve the email address of the Contracting Authority for each specific notice. There is no
“standard” way to extract this email address: the
information does not exist in the common notice XML header, and a different extraction method
exists for each type of form. The table named DOCUMENT_XML_INFO contains the XPath to the
Contracting Authority email for the different types of forms. This implementation choice avoids
hardcoding the extraction rules in the code, and provides a much more flexible way to support new forms
in the system.
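The data-driven extraction described above can be sketched with the standard javax.xml.xpath API. In this sketch a Map stands in for the rows of DOCUMENT_XML_INFO; the class name, the form type names and the XPath expressions are hypothetical, as the actual table content is not given in this document.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class ContactEmailExtractor {
    // Stand-in for DOCUMENT_XML_INFO: form type -> XPath of the email (hypothetical values).
    private final Map<String, String> xpathByFormType;

    ContactEmailExtractor(Map<String, String> xpathByFormType) {
        this.xpathByFormType = xpathByFormType;
    }

    // Returns the email found in the notice, or "" if the form type is unknown
    // or the expression matches nothing.
    String extractEmail(String formType, String noticeXml) {
        String expression = xpathByFormType.get(formType);
        if (expression == null) {
            return "";
        }
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(noticeXml)))
                    .trim();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath for form type " + formType, e);
        }
    }
}
```

Supporting a new form then only requires a new entry in the lookup (a new table row in the real system), with no change to the extraction code.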
5.6 NOTICE VIEWER
The notice viewer module is a stand-alone Java application responsible for the transformation
of XML files into a well-formatted version suited to be directly displayed. This application is installed on
an Office server and is used directly from the command line. The notice viewer includes neither a user
interface nor persistence capabilities.
The notice viewer transformation supports two output formats: HTML and PDF.
It first validates the input notice against the XML schema, then performs a UTF-8 validation of the
notice.
Technically, HTML files are generated through XSLT. For
the generation of PDFs, XSL-FO is used as an intermediate format, and Apache FOP as the
composition engine. A file-based (read-only) HSQL database is used to retrieve the translations of
the different reference data.
6 IMPLEMENTATION VIEW
The implementation view describes the overall structure of the implementation model and the
decomposition of the software into modules and specific components.
6.1 TED WEBSITE
6.1.1 OVERVIEW
The TED application is packaged as two separate Web Archive files (WAR) that represent the TED
website and the data-warehouse. This separation allows the deployment of each of these applications
separately on different servers. The following figure shows the physical contents of these web
applications. Note that the two applications share the same file structure; the difference being the
specific JSP pages and Java classes (along with their dependent Java libraries).
Figure 9 Web application file structure
6.1.2 TED XSL TRANSFORMATION
The TED Website relies on XSL transformations to transform the input
TED_EXPORT XML file into the different presentation views: HTML or PDF.
During the daily processing (see Workflow engine) the export XML file is transformed to an internal
XML (TED_INTERNAL) for each supported language. The TED internal XML format contains all the
information needed for the presentation layer. For instance, the reference data are translated in the
internal XML file for each supported language, and the internal format is enriched with formatting
information such as paragraphs, URLs and email addresses.
Only documents transformed during the daily processing are persisted on the file system. Historical
documents (from OJS 2005/206 to OJS 2010/041) are transformed into the TED internal format on the fly.
The following table shows the list of XSL transformations (input/output) for the TED
system.
TED TRANSFORMATION
XSL | Input | Output
InternalOJS-ToInternalTed.xsl, InternalOJS-ToInternalTed_<<FORM>>.xsl | “2.0.5 DTD” xml or “TED_EXPORT 2.0.7” xml | TED_INTERNAL XML
InternalTed-To-HtmlTed.xsl | TED_INTERNAL xml | Notice HTML
InternalTed-ToHtmlDataViewTed.xsl | TED_INTERNAL xml | Notice data view HTML
InternalTed-ToXmlFOTed.xsl | TED_INTERNAL xml | Notice PDF
InternalTed-ToLicenseHolderMETA.xsl | INTERNAL_OJS xml | Notice Meta License holder
InternalTed-ToLicenseHolderUTF-8.xsl | INTERNAL_OJS xml | Notice UTF-8 License holder
Table 3: TED XSL Transformation
6.2 EMAIL ANALYSIS AND NOTIFICATIONS
The email analysis and notifications module is packaged in a JAR file that contains the classes
developed for the handling and filtering of emails. This jar is deployed on the James email server.
The email module is implemented as an email processing agent built on top of the Apache James
Mailet API using the Matcher and Mailet interfaces.
6.3 WORKFLOW ENGINE
6.3.1 THE WORKFLOW ENGINE PACKAGE
The workflow engine is packaged as an executable JAR file that contains the classes developed for
the file system creation, XML transformation, files indexing processing, and DVD generation. The
workflow engine is responsible for the instantiation and processing of a new flow for each publication
date.
Figure 10 Workflow engine package Structure
6.3.2 THE WORKFLOW ENGINE IMPLEMENTATION
The workflow engine is composed of multiple flow definitions:
- The daily OJS flow: is responsible for the processing of the data for the next
publication date.
- The User management flow: is responsible for the workflow management.
- The Contracting Authority reminder flow: is responsible for sending the notice
reminders.
- The reporting flow: is responsible for the processing of the Cacti and
datawarehouse reports.
- The cleanup flow: is responsible for cleaning the file system of all temporary files.
Remark: a specific flow exists that is used only once, for the historical data processing:
- The take up archive flow: is responsible for the processing of the full historical data
already in production (5 years of publication).
Figure 11 Flow Service class diagram
Each workflow definition is declared in a Spring configuration. The flow definition must implement the
ProdFlowService interface. It contains the list of steps that compose the flow.
The flow definition is composed of several steps, each responsible for the execution of a specific task
according to the step specification. Those steps must implement the ProdStepService interface and
specify the dependencies between the steps (waitingSteps).
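To illustrate how waitingSteps dependencies can determine the order in which steps run, here is a minimal sketch. It does not reproduce the ProdFlowService or ProdStepService interfaces, whose methods are not described in this document; FlowOrderSketch and the step names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FlowOrderSketch {
    // waitingSteps: step name -> names of the steps that must complete first.
    // Returns one valid execution order, or fails if the dependencies cannot be met.
    static List<String> executionOrder(Map<String, List<String>> waitingSteps) {
        Set<String> done = new LinkedHashSet<>();
        while (done.size() < waitingSteps.size()) {
            boolean progressed = false;
            for (Map.Entry<String, List<String>> step : waitingSteps.entrySet()) {
                boolean ready = done.containsAll(step.getValue());
                // Set.add returns false for steps that were already executed.
                if (ready && done.add(step.getKey())) {
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("Cyclic or unknown waitingSteps");
            }
        }
        return new ArrayList<>(done);
    }
}
```

A step is only scheduled once every step it waits for has completed; a flow whose steps wait for each other in a cycle is rejected instead of looping forever.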
6.3.3 THE INDEXING IMPLEMENTATION
During the Daily OJS flow several indexes are generated to provide a fast search engine to the TED
Website. The indexing process is based on the Lucene framework.
The TED application receives a TED_EXPORT XML file. The input XML file is converted into a Lucene
Document object where each value to be indexed is mapped using a key/value pair.
Each field’s value is analysed using a standard Lucene analyser, then it is indexed into the appropriate
folder. A full description of the indexed fields is available in Table 4: Indexed fields.
Figure 12 Indexing Flow
Search field | Code
Awarding authority search fields
Country of the awarding authority | CY
Name of the awarding authority | AU
Place | TW
Type of awarding authority | AA
Internet address (URL) of the awarding authority | IA
Date search fields
Date document sent to the Publications Office | DS
Deadline for request of documents | DD
Deadline for receipt of tenders | DT
Publication date | PD
Reference search fields
Original language | OL
Number of reference document | RN
Document number | ND
Edition number of Supplement to the Official Journal | OJ
Codification search fields
Type of document | TD
Type of contract | NC
Type of procedure | PR
Origin (applicable regulation of procurement) | RP
Type of tender, division into lots | TY
Criteria for award of contract | AC
Title of document | TI
Main activity | MA
Title of the main activity | MN
Classification search fields
Original CPV code (until 16 September 2008) | OC
Original title of the CPV code (until 16 September 2008) | ON
Current CPV code (from 17 September 2008) | PC
Current title of the CPV code (from 17 September 2008) | PN
NUTS code | RC
Title of the NUTS code | RG
TED specific fields
Full text | FT
Table 4: Indexed fields
6.3.4 THE WORKFLOW TRANSFORMATION
Several steps during the daily processing use the XML transformation to transform the TED_EXPORT
input files into different other file formats such as license holder files or PDF notices.
Table 3: TED XSL Transformation lists the XSL transformations (input/output) used by the
TED Workflow engine.
6.3.5 THE WORKFLOW MANAGEMENT TOOL
The workflow management tool (Dashboard or workflow management interface) application is
packaged as a Web Archive (WAR). The workflow management tool allows the control of the production
lines within a single user interface. It is implemented using standard Java Servlets and JSP. The
communication between the management tool and the workflow engine is built over the socket API.
6.4 TED SYSTEM I18N SUPPORT
We use two mechanisms to support multilingualism in the TED System (TED Website and TED
Workflow engine). The business data (such as CPV, NUTS) translations are stored in the database
and the interface messages are stored in XML files.
- The reference data are all translated in the database in the table <code>_Description, which
contains the translations in the 23 languages supported by the TED Website.
- The Spring framework offers a simple mechanism to support i18n:
ReloadableResourceBundleMessageSource. The labels in the TED Website are all translated
using this Spring mechanism. The files messages_<<language code>>.xml contain the labels and
messages displayed to the users by the TED Website. The errors_<<language code>>.xml files
contain the error messages shown to the users.
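As an illustration of what such a message file may look like, the sketch below reads an XML message file with the standard java.util.Properties class. It assumes that the messages_<<language code>>.xml files use the java.util.Properties XML format; this document does not state the format, and the key and label shown are hypothetical. The Spring message source itself is not used here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class MessagesXmlSketch {
    // Hypothetical content of an English message file in Properties XML format.
    static final String MESSAGES_EN =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <entry key=\"search.title\">Search</entry>\n"
        + "</properties>\n";

    // Parses a Properties XML document into key/label pairs.
    static Properties load(String xml) {
        Properties messages = new Properties();
        try {
            messages.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalArgumentException("Not a valid properties XML document", e);
        }
        return messages;
    }
}
```

With one such file per language code, a label is resolved by loading the file matching the user's language and looking up the key.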
6.5 NOTICE VIEWER
The Notice viewer is packaged as a tar.gz archive that contains the classes developed for XML
transformations. This archive contains all the dependencies necessary for the XML file transformation
and production management. The notice viewer implementation uses an embedded HSQL database to
hold the reference data and associated translations for multilingual support. The transformations are
performed through XSLT for the generation of HTML and PDF files.
Figure 13 Notice Viewer structure
6.6 REFERENCE DATA
The reference data are the business code data. Each reference data item is composed of a code and 23
translations. All reference data are versionable; some of them are also hierarchical.
The reference data are:
- Heading
- Country
- Country groups
- Type of authority (sector or awarding authority)
- Contract type (market code)
- Procedure type
- Document type
- Regulation type
- Type of bid
- Award criteria
- CPV code
- Business Sectors
- Main activity
- NUTS code
- Languages
- Extended CPV code (Additional vocabulary)
Reference data stored on the TED system may change over time, and the modifications must
be taken into account, including the relationship between the codes of the new version and those of the previous
one.
Modifications of these reference data, such as CPV codes, have an important impact on the whole
TED system and especially on the search features.
To handle alterations of reference data, the TED system uses a versioning algorithm that permits the
translation of old code versions to new ones, to adapt the search features as much as possible. The
content of the documents is not modified.
The search index will generally be modified after a code change but the document itself won’t change.
Thus, it’s possible that a free text search will find documents that do not have the searched text in their
content.
6.6.1 REFERENCE DATA: DELETION
The version n of the reference data has codes that have been deleted in the version n+1.
When a code is deleted it is not available anymore in the search interface. All the documents using the
deleted code won’t be found anymore using the related criteria.
6.6.2 REFERENCE DATA: ADDITION
A new code has been added to version n+1.
The new code is added to the search interface. There’s no impact on the previous documents.
6.6.3 REFERENCE DATA: MODIFICATION
Several modification types can be foreseen, especially in the case of hierarchical data.
Case 1: Code in version N is replaced by a single code in version N+1.
Documents that use the previous version of the code will be re-indexed in order to be found using
the associated new version of the code.
Case 2: Code in version N is replaced by multiple codes in version N+1.
In the case of hierarchical data, documents that use the previous version of the code will be re-indexed in
order to be found using the parent of the code in the old version. If no parent exists or if the data is
non-hierarchical, this modification will be handled like a deletion.
Case 3: Code in version N moves in the hierarchy in version N+1.
Documents that use one of the previous versions of the code will be re-indexed in order to be found
using its new parents. Searching for this code using old parent codes won’t be possible anymore. An
exception to this rule is made for Countries: when a country changes group, the documents won’t
be re-indexed. In such a case, only new documents will be found using the new parent code.
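The rules for Case 1, Case 2 and deletion can be sketched as a small lookup. This is an illustrative sketch only, not the versioning algorithm of the TED system: CodeVersionMapper is a hypothetical name, the two maps stand in for the mapping and hierarchy tables, and Case 3 (a move within the hierarchy) is not covered.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class CodeVersionMapper {
    private final Map<String, List<String>> replacements; // old code -> codes in version N+1
    private final Map<String, String> oldParents;         // old code -> its parent in version N

    CodeVersionMapper(Map<String, List<String>> replacements, Map<String, String> oldParents) {
        this.replacements = replacements;
        this.oldParents = oldParents;
    }

    // Code under which a document using oldCode is re-indexed; empty means the
    // change is handled like a deletion.
    Optional<String> reindexCode(String oldCode) {
        List<String> targets = replacements.getOrDefault(oldCode, List.of());
        if (targets.size() == 1) {
            return Optional.of(targets.get(0));                  // Case 1: single replacement
        }
        if (targets.size() > 1) {
            return Optional.ofNullable(oldParents.get(oldCode)); // Case 2: old parent, if any
        }
        return Optional.empty();                                 // no mapping: deletion
    }
}
```

For non-hierarchical data the parent map is simply empty, so a one-to-many replacement falls through to the deletion behaviour, as stated in Case 2.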
Figure 14 Reference Data Model
6.7 CONTENT MODIFICATION
This section describes the procedure that must be followed to add a new form or to update the
reference data.
6.7.1 ADDITION OF A NEW FORM
6.7.1.1 Prerequisite
The following information must be known before the addition of a new form in the TED system:
- Does the form contain specific business functionalities that must be reflected on the TED
website? For instance, for a cancellation document a “cancelled” indicator is shown on the
impacted document.
- Does the new form contain contracting authority email addresses?
- The labels needed for the document view must be requested, translated in all languages.
6.7.1.2 Tasks
- If the document contains contracting authority email addresses, then the XPath to the tag
containing these addresses must be added to the table DOCUMENT_XML_INFO.
- The new XSLT transformations must be implemented to generate the internal format, the
HTML view, the PDF and the License Holder’s specific formats.
6.7.2 MODIFICATION OF REFERENCE DATA
Several modification types are foreseen regarding the reference data. First, the information needed
is described. Then the actions required for each type of modification are explained.
6.7.2.1 Prerequisite
- In case of the addition of a new code, all the labels in every language must be requested.
- In case of the addition of a new code in hierarchical reference data, the place of the code in
the hierarchy must be known.
- If the creation of a new version is foreseen, all the mappings between the current and
the next version must be clearly identified.
6.7.2.2 Modification of an existing reference data version
We consider a modification of an existing reference data version in the following cases:
- Only labels of existing codes in the current version must be changed.
- New codes are added and all the codes in the current version must be kept.
In these cases the reference data tables must be updated with the modifications needed. Then, if
existing codes are modified, a full re-indexation for the reference data must be performed.
If the procedure type or document type reference data are impacted please also refer to section
6.7.2.4.
6.7.2.3 Creation of a new reference data version
We consider the creation of a new version of the impacted reference data in the following cases:
- Some of the codes in the current version are not used anymore and must be removed from
the website interface (search mask, browse,…).
- The code or the meaning of a reference data item changes.
A new version of the reference data must be created in the database:
- CODE_XXX table: mandatory
- CODE_XXX_VERSION table: mandatory
- CODE_XXX_MAPPING table: mandatory
- CODE_XXX_HIERARCHY table: mandatory if the reference data is hierarchical.
When the new version of the reference data becomes valid (not before!):
- All the documents must have been re-indexed using the new version of the code (with the help
of the new mapping).
- In the DOCUMENT table, the column XXX_CURRENT_VERSION must have been updated
with the id of the reference data in the last version.
If the procedure type or document type reference data are impacted please also refer to the next
section.
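The re-indexing step can be illustrated with a minimal sketch. It assumes that the rows of the CODE_XXX_MAPPING table can be read as pairs of (old code id, new code id) and that each document carries one code id in the current version. The function name, the dictionary layout and the sample ids are hypothetical and only illustrate the idea.

```python
def reindex_documents(documents, mapping):
    """Re-index documents against a new reference data version.

    documents: dict of document id -> code id in the current version
               (hypothetical in-memory layout)
    mapping:   dict of old code id -> new code id
               (stands in for the rows of CODE_XXX_MAPPING)
    Returns the re-indexed dict and the list of document ids whose code
    has no mapping.
    """
    reindexed = {}
    unmapped = []
    for doc_id, old_code in documents.items():
        if old_code in mapping:
            reindexed[doc_id] = mapping[old_code]
        else:
            # Keep the old code and report it: a missing mapping has to be
            # resolved before the new version can become valid.
            reindexed[doc_id] = old_code
            unmapped.append(doc_id)
    return reindexed, unmapped


documents = {"doc-1": 10, "doc-2": 11, "doc-3": 12}
mapping = {10: 100, 11: 101}
reindexed, unmapped = reindex_documents(documents, mapping)
```

Reporting the unmapped documents instead of failing silently reflects the prerequisite that all mappings between the current and the next version must be identified beforehand.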
6.7.2.4 Modification of procedure (PR) and document type (TD)
If the procedure type or document type reference data are modified, additional actions must be
performed.
6.7.2.4.1 Procedure (PR) and document type (TD)
The combination of PR and TD indicates whether a contract award is expected for a document.
Therefore, if a new PR or TD code must be added, it is necessary to know whether the new code relates
to documents that need a reminder.
This information is stored in the XX_CONTRACT_AWARD_NEEDED column of the main reference data
table.
6.7.2.4.2 Document type (TD)
Some document types are used to indicate that a notice is a contract award. Therefore, if a new TD code
is added, it is necessary to know whether the new code should be considered as an awarding type.
If an awarding type must be added or removed, the column
TD_CONTRACT_AWARD_NOTICE of table CODE_DOCUMENT_TYPE must be updated accordingly.
The same procedure must be followed for document types identified as corrigenda: the column
TD_CORRIGENDA of table CODE_DOCUMENT_TYPE must be updated accordingly.
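A minimal sketch of such an update is given here. The table name and the two flag columns come from this section; the key column TD_CODE, the 0/1 encoding of the flags and the function name are assumptions made for the example, and an in-memory SQLite database stands in for the MySQL database.

```python
import sqlite3


def create_document_type_table(conn):
    # TD_CODE and the integer 0/1 flags are assumptions for this sketch.
    conn.execute(
        """
        CREATE TABLE CODE_DOCUMENT_TYPE (
            TD_CODE TEXT PRIMARY KEY,
            TD_CONTRACT_AWARD_NOTICE INTEGER NOT NULL DEFAULT 0,
            TD_CORRIGENDA INTEGER NOT NULL DEFAULT 0
        )
        """
    )


def set_document_type_flags(conn, td_code, award_notice, corrigenda):
    """Update both flags of one document type; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE CODE_DOCUMENT_TYPE "
        "SET TD_CONTRACT_AWARD_NOTICE = ?, TD_CORRIGENDA = ? "
        "WHERE TD_CODE = ?",
        (int(award_notice), int(corrigenda), td_code),
    )
    conn.commit()
    return cur.rowcount
```

A return value of 0 signals that the document type code does not exist yet and must first be added to the reference data.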
6.8 APPLICATION DEPENDENCIES
APPLICATION DEPENDENCIES
Application               Layer              External Systems / Dependency
TED Application           Web Layer          Spring MVC, Spring Security
TED Application           Integration Layer  Spring Integration, File System, MySQL Database, Lucene
TED Workflow Application  Web Layer          JSP/Java Servlet
TED Workflow Application  Integration Layer  Spring Integration, File System, XSL, XSL-FO, FOP, iText,
                                             James Mail, MySQL Database, Mkisofs, Lucene
Notice Viewer                                Spring Integration, XSL, XSL-FO, FOP, HSQL
6.9 BACKUP PROCEDURE
The daily backup consists of a full database backup and an incremental backup of the file system
repository. These backups are configured to run overnight on each production lane.
The MySQL databases are additionally backed up using an export script. This script runs before the
scheduled disk backup so that the exports are also included in that backup. It produces a
standard MySQL database export, which can be used for easy recovery into another database
instance. The script adds another level of fault tolerance for the stored data on top of the replication
mechanism.
Every day, a backup of the back-end, the front-end, the data warehouse and the common elements is
executed, and a temporary folder is created on the NFS to hold the different backups. A cron script is
responsible for transferring the result to an external backup unit server.
6.9.1 DAILY BACK-END BACKUP PROCEDURE
Non cluster database backup
To back up the back-end non-cluster databases, a dump is made for each back-end server. All non-cluster
tables and views are dumped. Finally, a restore script is created for each dump.
Cluster database backup
The cluster database is split across the two production lanes, so the backup dumps the entire database.
To back up the cluster database, the MySQL Node manager is used. It takes a snapshot of each node of
the cluster, and an archive of each snapshot is then made. Finally, a single restore script is created to
restore each node of the cluster.
Repository file system backup
To reduce the duration of the file system backup, inotify is used to log all modifications made to a
set of folders. inotify is a file change notification system, a kernel feature that allows applications to
request the monitoring of a set of files against a list of events; when an event occurs, the application
is notified.
inotify runs on the repository folder in order to produce a file listing all the files modified since the last
backup. Using this file makes a faster rsync possible.
Finally, the inotify file is also used to create an archive of the set of files modified since the last backup.
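The way a change list can speed up rsync is sketched here. The sketch assumes that the inotify file contains one absolute path per line; that layout, the function names and the sample paths are hypothetical. The rsync option --files-from restricts the transfer to the listed paths, which are interpreted relative to the source directory.

```python
def changed_files(log_lines, repository_root):
    """Return the sorted, de-duplicated paths under repository_root,
    expressed relative to it (the form expected by rsync --files-from)."""
    root = repository_root.rstrip("/") + "/"
    seen = set()
    for line in log_lines:
        path = line.strip()
        # Ignore empty lines and events outside the repository folder.
        if path.startswith(root) and len(path) > len(root):
            seen.add(path[len(root):])
    return sorted(seen)


def rsync_command(list_file, repository_root, destination):
    """Build (but do not run) an rsync call limited to the changed files."""
    return ["rsync", "-a", "--files-from=" + list_file, repository_root, destination]


# Hypothetical content of the change list produced since the last backup.
log = [
    "/data/ted-A/ted-data/repository/2010/a.xml\n",
    "/data/ted-A/ted-data/repository/2010/a.xml\n",
    "/data/ted-A/ted-data/repository/2010/b.pdf\n",
    "/tmp/unrelated.txt\n",
]
files = changed_files(log, "/data/ted-A/ted-data/repository")
```

The same list of relative paths can be reused to build the archive of files modified since the last backup.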
Backup synchronisation between databases and file system
During the backup of a back-end server, the databases and the file system must be synchronised. To
achieve this, the following steps are performed:
1. A lock is put on the databases to avoid any modification;
2. The inotify file snapshot is made;
3. The locks are released.
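The three steps can be sketched as a lock that is always released, even if the snapshot fails. The statements FLUSH TABLES WITH READ LOCK and UNLOCK TABLES are one possible way to lock a MySQL server and are an assumption here, as are the function names; the recording cursor only stands in for a real database cursor so that the sketch runs without a server.

```python
from contextlib import contextmanager


@contextmanager
def databases_locked(cursor):
    """Hold a global read lock for the duration of the with-block."""
    cursor.execute("FLUSH TABLES WITH READ LOCK")  # step 1: lock (assumed statement)
    try:
        yield
    finally:
        cursor.execute("UNLOCK TABLES")            # step 3: always release


def synchronised_snapshot(cursor, take_inotify_snapshot):
    """Take the inotify file snapshot while the databases are locked."""
    with databases_locked(cursor):
        return take_inotify_snapshot()             # step 2: snapshot


class RecordingCursor:
    """Stand-in for a database cursor; it only records the statements."""

    def __init__(self):
        self.statements = []

    def execute(self, statement):
        self.statements.append(statement)
```

Releasing the lock in a finally clause keeps the databases from staying locked when the snapshot step raises an error.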
TED data file system
First, the indexes and the RSS files of only one of the two servers are backed up.
Second, all logs of both servers are backed up.
6.9.2 DAILY FRONT-END BACKUP PROCEDURE
Database backup
The same backup procedure as back-end non-cluster database backup is used.
TED data file system
The same backup procedure as for the back-end TED data file system is used. In this case, the logs and
the license holder environment are backed up.
6.9.3 DAILY DATA WAREHOUSE BACKUP PROCEDURE
Database backup
The same backup procedure as the back-end non-cluster database backup is used.
6.9.4 DAILY COMMON BACKUP PROCEDURE
The common backup is executed on every server and covers the elements that are common to all servers.
Configuration file system
A backup of the configuration snapshot of each server is made.
TED data file system
For each backup, a snapshot of the logs is made.
7 DATA VIEW
This chapter describes the persistent data view of the system. More specifically, it explains the
technical database columns and the functions required to implement version and session
management, optimistic locking and user contexts. The Object-Relational Mapping used to implement
the persistence layer is Spring JDBC.
Full information about the TED data model is available in [R02].
7.1 MYSQL CLUSTER
MySQL Cluster is used for the TED databases. It is a high-availability, high-redundancy
database adapted to distributed computing environments, and it relies on the NDBCLUSTER storage
engine to run in a cluster. A MySQL Cluster consists of a set of computers, each running one or more
of the following: a MySQL server, a data node, a management server. Within the TED system, MySQL
Cluster is used for documents and TED website data (also called volatile data). The relationship
between these components in a cluster is shown here:
(Diagram labels: MySQL clients, management client, SQL nodes, data nodes (four NDB nodes), NDB management server)
Figure 15 Database replication with MySQL Cluster
All these elements work together to form a MySQL Cluster. When data is stored in the NDBCLUSTER
storage engine, the tables are stored in the data nodes. Such tables are directly accessible from all
other MySQL servers in the cluster. The data stored in the data nodes for MySQL Cluster is mirrored;
the cluster handles failures of individual data nodes.
The two major types of nodes are described below:

Data node: This type of node stores cluster data. There are as many data nodes as there are
replicas, times the number of fragments. A fragment is a portion of a database table; a table is
broken up into and stored as a number of fragments. Under the NDB storage engine, each
table fragment has a number of replicas stored on other data nodes in order to provide
redundancy. The TED MySQL Cluster is configured using one fragment and four replicas
giving a total number of four data nodes.

SQL node: This is a node that accesses the cluster data. In the case of MySQL Cluster, an
SQL node is a traditional MySQL server that uses the NDBCLUSTER storage engine.
The TED system uses one NDB node and one SQL node per back-end server, which makes a total of
four NDB nodes and four SQL nodes. Each production lane has one NDB management server.
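The node counts given in this section can be cross-checked with a short calculation. It is only an arithmetic sketch: the function and variable names are hypothetical, and the figures (one fragment, four replicas, two production lanes, two back-end servers per lane) are the ones stated in this document.

```python
def data_node_count(replicas, fragments):
    # "As many data nodes as there are replicas, times the number of fragments."
    return replicas * fragments


def nodes_from_servers(lanes, backends_per_lane, nodes_per_backend=1):
    # One NDB node (and one SQL node) is deployed per back-end server.
    return lanes * backends_per_lane * nodes_per_backend


ted_data_nodes = data_node_count(replicas=4, fragments=1)
ted_ndb_nodes = nodes_from_servers(lanes=2, backends_per_lane=2)
```

Both calculations give four, so the cluster configuration and the server layout describe the same set of data nodes.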
7.2 TECHNICAL COLUMNS
Some database tables used to store business entities in the TED system have columns that do not
hold business data but are used only to implement specific functionalities.
7.2.1 AUDIT SEGMENT
Each table of the MySQL TED databases contains a MODIFIED_ON column and a VERSION column
for the optimistic locking and for versioning:
Column        Data Type   Description
MODIFIED_ON   TIMESTAMP   The last update date of the record
VERSION       INT         The version number of the record
Table 5: Modified On and Version columns
Volatile data tables also contain the MODIFIED_BY column that gives the identifier of the user who
has modified/created the entry.
Column        Data Type   Description
MODIFIED_BY   VARCHAR     The username of the user who has modified or created the entry.
Table 6: Modified By column
The data is persisted into the database when the Spring JDBC ‘persist’ method is called.
At that time, a trigger checks that the VERSION field of the updated entity is the same as the one
stored in the database. This verification allows the system to detect whether another update has
overwritten the data since it was loaded in the session.
If the version of the record to be updated matches the version of the record stored in the database,
the record is updated. A trigger is then called, which updates MODIFIED_ON and increments the
VERSION field of the record. Otherwise, an exception is raised, which prevents concurrent modification
of the same entity.
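The optimistic locking principle can be illustrated with a minimal sketch. It does not reproduce the trigger-based implementation described above: the version check is instead folded into the WHERE clause of the UPDATE statement, which is a common alternative. The table SAMPLE_ENTITY, its LABEL column and the exception name are hypothetical, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3


class StaleRecordError(Exception):
    """The record is missing or was modified since it was loaded."""


def create_schema(conn):
    # MODIFIED_ON and VERSION mirror the audit columns described in this section.
    conn.execute(
        """
        CREATE TABLE SAMPLE_ENTITY (
            ID INTEGER PRIMARY KEY,
            LABEL TEXT NOT NULL,
            MODIFIED_ON TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            VERSION INT NOT NULL DEFAULT 0
        )
        """
    )


def update_label(conn, entity_id, new_label, loaded_version):
    """Update the record only if its VERSION still equals loaded_version."""
    cur = conn.execute(
        "UPDATE SAMPLE_ENTITY "
        "SET LABEL = ?, MODIFIED_ON = CURRENT_TIMESTAMP, VERSION = VERSION + 1 "
        "WHERE ID = ? AND VERSION = ?",
        (new_label, entity_id, loaded_version),
    )
    if cur.rowcount == 0:
        # No row matched: the version moved on (or the record does not exist).
        raise StaleRecordError("record %s is missing or was modified" % entity_id)
    conn.commit()
    return loaded_version + 1
```

A caller that receives StaleRecordError is expected to reload the record and decide whether to reapply its change.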
8 DEPLOYMENT VIEW
The TED system runs on two distinct production lanes. A front-end load balancer routes every
request to one of the production lanes, as shown in the figure below:
(Diagram labels: TED System, Main Loadbalancer, Production Lane 1, Production Lane 2)
Figure 16 TED Deployment diagram
The TED modules are deployed on different physical components. The following list contains all the
servers required for one production lane and the association between the modules described previously
and the servers on which they are deployed.
The front-end web server of each production lane is composed of:

An Apache Web Server with a Tomcat load-balancing module, in charge of routing the HTTP
requests to a server and serving the static resources (such as pictures);

A ProFTPD server, which is used for the License Holder environment module;

A James email server, on which the Email analysis and notifications module is deployed;

A MySQL database for data warehouse information.
The two back-end servers of each production lane are composed of:

A Tomcat Application Server, which hosts the TED website, the data warehouse and the TED
Workflow management tool website;

The workflow engine modules, which run on a specific JVM;

A copy of the indexes used for the searches performed by the Web application;

A copy of the public RSS feeds used by the Web application;

A MySQL database for document and volatile data.
The Network File System (NFS) server is mainly in charge of hosting the content library. It uses RAID
level 5 to provide a high level of fault tolerance combined with high performance. The content library is
sized to contain all the documents received from the Office and all the subsequent documents obtained
by transformation. The NFS server is composed of:

A Windows XP OS running under VMware for the Adobe PDF indexes of the weekly DVD (using Adobe
Professional);

A MySQL cluster manager node, in charge of managing the cluster replication data
between the four instances (two on each production lane) of the MySQL Cluster node.
The following schema gives an overview of the deployment of the different modules of the TED
system on the different servers present in one production lane.
Figure 17: Deployment diagram
The production of the content runs in parallel on all production lanes. The objective is to have the
information available on all production lanes, so that if one lane fails, the other one can
continue to serve the content. The full content library is duplicated on the NFS of each production
lane. The daily switch of the publication day is synchronized on each production lane to ensure that
they serve the same content. Initially, the entire TED system is composed of two production lanes.
Some processing steps are executed on only one production lane (e.g. sending of emails). If one
production lane breaks, the operator executes these processes on the other lane.
The load balancers dispatch the incoming requests based on the load of each server. Session
forwarding is used to keep all requests from one user to the same server.
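The dispatching rule can be sketched as follows. The sketch assumes that the load of a server is measured as the number of requests routed to it so far; the class name, the server names and that load metric are hypothetical and do not describe the actual load balancer configuration.

```python
class StickyLoadBalancer:
    """Send a new session to the least loaded server, then keep it there."""

    def __init__(self, servers):
        self.load = {server: 0 for server in servers}  # requests routed so far
        self.sessions = {}                             # session id -> server

    def route(self, session_id):
        server = self.sessions.get(session_id)
        if server is None:
            # New session: choose the server with the lowest load
            # (ties go to the server listed first).
            server = min(self.load, key=self.load.get)
            self.sessions[session_id] = server
        self.load[server] += 1
        return server


balancer = StickyLoadBalancer(["backend-1", "backend-2"])
first = balancer.route("session-alice")
second = balancer.route("session-bob")
again = balancer.route("session-alice")
```

The session table is what keeps all requests from one user on the same server, while the load table only influences where a new session starts.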
Some document-related information is replicated on the local MySQL instance of each back-end
server. This task is performed by the production workflow; synchronization steps are used to update
the databases of each back-end server to ensure data coherence.
Volatile database information (e.g. registered user information and document metadata) is inserted in
all database servers simultaneously through the replication mechanisms offered by MySQL Cluster.
The data warehouse information is duplicated. Each data warehouse database instance contains the
information of all production lanes and is filled with the logs of all the servers and processes.
8.1 NETWORK FILE SYSTEM SERVER
Two network file system servers are configured, one on each production lane. Each NFS server holds
the content library and the TED backup, and hosts a Windows XP under VMware and a MySQL cluster
manager node. The result of the backup procedure is built and stored on each NFS server; all
this information is then copied to the backup site.
8.1.1 TED REPOSITORY FILE SYSTEM
The TED repository file system has the following structure:

/data/ted-[A|B]/ted-data/input

/data/ted-[A|B]/ted-data/repository

/data/ted-[A|B]/ted-data/dvd

/data/ted-[A|B]/ted-data/license-holder
where [A|B] means production lane A or production lane B.
TED REPOSITORY
Path                                      Information
/data/ted-[A|B]/ted-data/input            contains the original compressed input file
/data/ted-[A|B]/ted-data/repository       contains the processed files (XML, PDF, PDF time stamped)
/data/ted-[A|B]/ted-data/dvd              contains the last daily, weekly and monthly DVD
/data/ted-[A|B]/ted-data/license-holder   contains the license holder files (UTF-8 and META-XML format)
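As a small illustration of the naming scheme, the sketch below expands the lane placeholder into concrete paths. It assumes that the placeholder is replaced by the single letter A or B; the function name and the returned dictionary are hypothetical.

```python
REPOSITORY_FOLDERS = ("input", "repository", "dvd", "license-holder")


def repository_paths(lane):
    """Return the repository folders of one production lane ('A' or 'B')."""
    if lane not in ("A", "B"):
        raise ValueError("lane must be 'A' or 'B'")
    base = "/data/ted-" + lane + "/ted-data"
    return {folder: base + "/" + folder for folder in REPOSITORY_FOLDERS}


lane_a = repository_paths("A")
```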
8.1.2 TED TEMPORARY BACKUP FILE SYSTEM
Each NFS holds a temporary backup file system, which is built during each backup process. It is a
temporary storage area for backed-up data; the content of this folder is copied to the backup site in a
second phase.
The TED backup file system has the following structure:
TED BACKUP
Path                                   Information
/data/ted-[A|B]/backup/daily-backup    contains the daily backup files
/data/ted-[A|B]/backup/repository      contains a full repository backup of /data/ted-[A|B]/ted-data/repository
/home/backup                           contains backup scripts
/home/backup/log                       contains the backup of processing logs
8.1.3 TED MIRROR BACKUP FILE SYSTEM
This section describes the file system present on the backup machines at the backup
site.
The TED mirror backup file system has the following structure:
TED BACKUP
Path                                           Information
/data/backup/Prodlane-[A|B]/daily-backup       contains the mirror of the daily backup files folder
/data/backup/Prodlane-[A|B]/repository         contains the mirror of the repository backup folder
/data/backup/Prodlane-[A|B]/repository-delta   contains delta repository files
/home/backup                                   contains synchronization backup scripts
/home/backup/log                               contains synchronization backup processing log
8.1.4 WINDOWS XP VIA VMWARE
The only purpose of the Windows XP instance (running under VMware) is to host Adobe Professional.
Adobe Professional is used to generate the PDX (Acrobat Catalogue Index) file to be included in the
weekly DVD. Samba is used between the virtual Windows machine and its host to share the file
system.
8.2 JAMES EMAIL SERVERS
Two James email servers are installed, one on the front-end of each production lane. This implies a
specific DNS configuration, explained in the following section.
8.2.1 DNS CONFIGURATION
Two MX (Mail Exchanger) entries are registered on the DNS server of “ted.europa.eu”:

mail1.ted.europa.eu and

mail2.ted.europa.eu.
These entries ensure that any SMTP request reaches the requested mail server (James of production
lane 1 for mail1.ted.europa.eu and James of production lane 2 for mail2.ted.europa.eu).
This means that the load balancing process is not triggered when accessing these mail servers.
8.2.2 SPAM FOLDERS
The spam folders are accessible over SSH for browsing and downloading the supposed spam content
and for burning it to CDs.
8.3 DATABASE ORGANISATION
A TED production lane is organised in five databases:

On each front-end server, one TED_DATAWAREHOUSE database: this is the database that
holds the data related to the data warehouse and monitoring. There is no replication between
the two production lanes: each TED_DATAWAREHOUSE database contains the full TED
data warehouse data;

On each back-end server, a single MySQL instance runs. On this instance, tables are created
with two distinct engines: the first one for tables that must be created using the MySQL cluster
and the second one for tables that must be created locally. The TED schema contains the local
tables and the TED cluster schema contains the tables shared over the cluster.
In summary, as there are two production lanes and the cluster database is shared among
all the back-end servers in all production lanes, the total number of MySQL databases is
seven:
- One instance per front-end server, for a total of 2 instances.
- One instance per back-end server, for a total of 4 instances.
- One cluster instance shared among the back-end servers.