EASAIER Proposal for a Web Application Conception

Internal Note
Abstract
This document addresses architecture and integration issues
related to building the EASAIER web application prototype.
Version 0.1 Draft
Date: September 13th 2007
Editor: Silogic
Contributors: Luc Barthélémy, Laure Bajard, Benoît Rigolleau, François
Scharffe, Michael Luger
Table of Contents
1. Introduction
2. Global architecture / generalities
3. Detailed module Architecture
4. Music Workflow
5. Speech workflow
6. Local related media workflow
7. Web related media workflow
8. Stream file workflow
9. Servers Interfaces
9.1. Speech WS
Implementation as a Web Service
Implementation as an Http servlet
9.2. RDF Storage WS
9.3. Web related media WS
1. Introduction
This document presents the results of the detailed design phase activities. It details the architecture and
describes the internals of the solution and all its components.
This document will be developed incrementally. It is based on previous work from the architecture group. Its
goal is to detail the back-end components. It was initiated during the Toulouse meeting
between Silogic and DERI. It is a design guideline and lists all the components to be developed.
2. Global architecture / generalities
The model is divided into back-end and front-end applications.
The front-end applications are:
- The web client application: it allows the user to browse and query the archive according to the
retrieval system functionalities. Its functionality may be more limited, since it is restricted by
what web application services can offer. Simple “play” and visualisation are provided, but expensive
processing such as real-time audio filters is not supported.
- The advanced user application: it allows the user to browse and query the archive according to the
retrieval system functionalities. It also allows the user to load and visualise an audio file and its
related metadata and media. The user is able to interact with audio resources through separating and
identifying sources, processing and modifying the audio, and aligning various sources at playback
(e.g., a musical piece and its source, or speech and its associated text).
On the back-end side, we will find:
- The database storage, composed of:
o An RDF storage: to store object identifiers, metadata and related media in a semantic form
o A media storage: to store binary data (audio, video, images…)
- The SPARQL end-point, to query the RDF storage and return results to the front-end applications
(by way of a web server for the web client).
- The database administration application, which is in charge of database administration (add, modify
or remove data from the database, manage users and rights). The administration tool is assisted by a
feature extraction knowledge module that automatically extracts metadata from an audio file.
- Web services making the link between applications.
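As an illustration of how a front-end or server component might address the SPARQL end-point over HTTP, the sketch below builds a GET request URL that carries the query in a "query" parameter, as the W3C SPARQL Protocol does. The endpoint address and the class and method names are hypothetical, and the query shown is a generic triple pattern, not the EASAIER vocabulary.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: builds the URL of a SPARQL query sent with HTTP GET.
final class SparqlRequest {

    // Appends the URL-encoded query text as the "query" parameter.
    static String buildGetUrl(String endpoint, String sparql) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

A caller could use it as `SparqlRequest.buildGetUrl("http://localhost:8080/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10")`, where the host and path are placeholders for wherever the end-point is deployed.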




Binary storage : the file system can be used for this task
Query and access services : servlet with TOMCAT
SparQL endpoint and RDF Storage : Boca ???
Binary indexes access and Binary indexes : like the speech indexes.
3. Detailed module Architecture
The Archivist application, the Advanced client application and the Web
client application are the client-side applications. For conceptual reasons,
these applications do not all use the same server.
The Access service is the main server. It provides only generated XML files
to the client, according to the client request. To use this server in a web
context, the XML must be translated into HTML or a similar format. For that
purpose we have created an intermediate server, the Web access service. This
server forwards the web client request to the Access service and translates the
XML result of the Access service into an HTML result usable by the web client.
The Access service and the Web access service could be merged, but we have kept
them as 2 separate servers for clarity.
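The XML-to-HTML translation performed by the Web access service could, for instance, be done with an XSLT stylesheet. The sketch below is only an illustration under stated assumptions: the XML vocabulary (a "results" element containing "track" elements with a "title" attribute), the stylesheet and the class name XmlToHtml are all invented here, since this note does not define the format returned by the Access service. It uses only the javax.xml.transform API shipped with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper for the Web access service: turns the XML returned by
// the Access service into HTML. The <results>/<track> vocabulary is assumed.
final class XmlToHtml {

    // Assumed stylesheet: renders each track title as an HTML list item.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/results'>"
      + "<ul><xsl:for-each select='track'>"
      + "<li><xsl:value-of select='@title'/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the given XML document and returns the HTML.
    static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XML to HTML translation failed", e);
        }
    }
}
```

Keeping the translation in a stylesheet would let the HTML presentation change without touching the Access service, which fits the choice of keeping the 2 servers separate.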
The Feature extractor service is dedicated to the archivist application. This
server provides the functionalities needed to upload a file, extract metadata,
update the RDF storage, etc.
We propose that each of these services be implemented with servlets.
All these servers can use other sub-servers. For now, we have designed all
these sub-servers as web services based on the SOAP protocol. This is only a
proposal: servlets could be used instead. When we speak of a web service, plain
HTTP or another protocol may be understood. The only constraint is that all
these servers must run on the same machine, at the same time, in the same
Apache TOMCAT web container. We prefer TOMCAT because it is very stable and
much of the existing work (SILOGIC work, DERI work) already uses it.
The sub-servers are:
- Binary storage WS: this sub-server provides the functionalities needed to
store audio material, videos, pictures or other binary content.
- RDF storage WS: this sub-server adds RDF triples to the RDF storage. It
should use a standard API (if one exists) to address the RDF DB (like Jena,
or Sesame?). This should ensure that EASAIER can work with several RDF DBs.
- Speech WS: this sub-server encapsulates the ALL functionalities for speech
retrieval.
- SparQL endpoint: this one is generally provided directly by the RDF
database. It must be used to query the RDF storage.
- Web related media WS: this sub-server is the DERI server for web
aggregation. It aggregates related links coming from the Internet.
- Query by example WS: to be defined.
The Binary storage can be implemented on top of the local file system. This
solution is simple and efficient.
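A minimal sketch of such a file-system-backed binary storage is given below, assuming one file per media identifier under a single root directory. The class name FileBinaryStorage, its methods and the identifier scheme are hypothetical; the note does not specify the Binary storage WS interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical file-system-backed binary storage: one file per identifier.
final class FileBinaryStorage {
    private final Path root;

    FileBinaryStorage(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    // Maps an identifier to a file path and rejects identifiers that would
    // escape the root directory (for example "../outside").
    private Path resolve(String id) {
        Path path = root.resolve(id).normalize();
        if (!path.startsWith(root)) {
            throw new IllegalArgumentException("invalid identifier: " + id);
        }
        return path;
    }

    // Writes the binary content, creating parent directories when needed.
    void store(String id, byte[] data) {
        try {
            Path path = resolve(id);
            Files.createDirectories(path.getParent());
            Files.write(path, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads back the binary content stored under the identifier.
    byte[] load(String id) {
        try {
            return Files.readAllBytes(resolve(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The identifier check matters as soon as identifiers come from client requests, because the storage would otherwise expose any file readable by the web container.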
4. Music Workflow
We present here an example of the workflow when a user queries sound tracks. It
details the modules used from the reception of the query to the return of the result.
We assume that the indexing (or feature extraction) phase has already been done and
focus only on the retrieval.
5. Speech workflow
We present here an example of the workflow when a user queries speech. It details the
modules used from the reception of the query to the return of the result. We assume
that the indexing (or feature extraction) phase has already been done and focus only on
the retrieval.
6. Local related media workflow
We present here an example of the workflow when a user retrieves related media. It
details the modules used from the reception of the query to the return of the result.
We assume that the indexing (or feature extraction) phase has already been done and
focus only on the retrieval.
7. Web related media workflow
We present here an example of the workflow when a user retrieves related media from
other sources (like MusicBrainz…). It details the modules used from the reception
of the query to the return of the result. We assume that the indexing (or feature
extraction) phase has already been done and focus only on the retrieval.
8. Stream file workflow
This is the simple workflow of retrieving binary content.
9. Servers Interfaces
In this section we detail the interfaces that each service implements. As
said before, these interfaces can be implemented either as Web services or in a simple
HTTP style (http://www.server/services?param=value…).
9.1. Speech WS
«interface» Speech WS
+ SpeechPreprocess(ToBeDefine) : void
+ SpeechRetrieve(String[], int, int) : int
+ SpeechTrain(ToBeDefine) : void
Speech WS is a web service accessible by all the EASAIER servers. It provides
the functions (services) needed to use the ALL functionalities. For now, this web
service exposes 3 functions: one is used by the Access service, the other two by
the Feature extractor service.
Implementation as a Web Service
SpeechRetrieve(String[] words, int maxHits, int minConfigPercent) returns int.
This function is used by the Speech class in the Access service. SpeechRetrieve has 3
parameters:
- The first one, words, is an array of String: the set of words that the user wants
to find in the audio files (according to the speech indexes).
- The second one, maxHits, is the maximum number of hits expected per word.
- The last one, minConfigPercent, gives the lower limit of the confidence
percentage of the hits.
The return value is an int. Whether or not the retrieval algorithm finds these words,
the function returns 0. If the retrieval algorithm crashes, the function returns an error
code, for example -1. In all cases, the Access service needs a response in order to start
the next step.
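The contract of SpeechRetrieve can be sketched as a Java interface. Only the function signature and the 0 / -1 return convention come from the description above; the type names SpeechRetrieval and SafeSpeechRetrieval, the constants OK and FAILED, the lower-case method name and the argument checks are assumptions made for this illustration. The wrapper shows one way to guarantee that the Access service always receives a response, even when the underlying retrieval engine fails.

```java
// Sketch of the retrieval part of the Speech WS contract.
interface SpeechRetrieval {
    int OK = 0;      // retrieval completed, whether or not words were found
    int FAILED = -1; // retrieval crashed (the "for example" value of the text)

    int speechRetrieve(String[] words, int maxHits, int minConfigPercent);
}

// Hypothetical wrapper: always answers with a code, never with an exception.
final class SafeSpeechRetrieval implements SpeechRetrieval {
    private final SpeechRetrieval engine;

    SafeSpeechRetrieval(SpeechRetrieval engine) {
        this.engine = engine;
    }

    @Override
    public int speechRetrieve(String[] words, int maxHits, int minConfigPercent) {
        // Assumption: invalid arguments are reported like any other failure.
        if (words == null || words.length == 0 || maxHits < 0
                || minConfigPercent < 0 || minConfigPercent > 100) {
            return FAILED;
        }
        try {
            return engine.speechRetrieve(words, maxHits, minConfigPercent);
        } catch (RuntimeException e) {
            // A crash of the retrieval algorithm becomes the error code.
            return FAILED;
        }
    }
}
```

Since the return value only signals completion, the hits themselves would have to reach the Access service by another route, which this note leaves open.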
SpeechPreprocess(to be defined) returns to be defined
This function is to be defined.
It is used by the Feature extractor service to add a new file to the speech indexes.
SpeechTrain(to be defined) returns to be defined
This function is to be defined.
It is used by the Feature extractor service to train the system.
Implementation as an Http servlet
In this case the servlet should respond to the following queries:
SpeechRetrieve
http://www.easaier.org/speechWS?action=SpeechRetrieve&word1=Flower&word2=Summer&maxHits=50&minConfigPercent=60
In all cases the service should answer with status 200.
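To illustrate how the servlet might map such a request onto the SpeechRetrieve parameters, the sketch below parses the query string of the example URL into the words array, maxHits and minConfigPercent. It is written without the Servlet API so that it stays self-contained. The class name SpeechQuery and the rule that words are numbered consecutively from word1 are assumptions drawn from the single example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value object holding the parameters of a SpeechRetrieve call.
final class SpeechQuery {
    final String[] words;
    final int maxHits;
    final int minConfigPercent;

    private SpeechQuery(String[] words, int maxHits, int minConfigPercent) {
        this.words = words;
        this.maxHits = maxHits;
        this.minConfigPercent = minConfigPercent;
    }

    // Parses "action=SpeechRetrieve&word1=...&maxHits=...&minConfigPercent=...".
    static SpeechQuery parse(String queryString) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : queryString.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
            }
        }
        if (!"SpeechRetrieve".equals(params.get("action"))) {
            throw new IllegalArgumentException("unsupported action");
        }
        // Assumption: words are numbered word1, word2, ... without gaps.
        List<String> words = new ArrayList<>();
        for (int i = 1; params.containsKey("word" + i); i++) {
            words.add(params.get("word" + i));
        }
        // A missing or non-numeric value raises NumberFormatException.
        return new SpeechQuery(words.toArray(new String[0]),
                Integer.parseInt(params.get("maxHits")),
                Integer.parseInt(params.get("minConfigPercent")));
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

Inside a real servlet the container already decodes the parameters, so the loop over word1, word2, … would call HttpServletRequest.getParameter instead of splitting the query string by hand.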
SpeechPreprocess
to be defined
SpeechTrain
to be defined
9.2. RDF Storage WS
to be defined
9.3. Web related media WS
to be defined
….