Use Cases and Interaction Model

Pushing the Quality Level in Networked News Business
semantic-based content retrieval and composition
in international news publishing
Markus Schranz
schranz@infosys.tuwien.ac.at
Agenda
• Problem and Project Description
– Goals and Objectives
• Approaches and Results
– Architectural Design & Communication
– Multinational and Multilingual Services
– Semantic Content Relations
• Future Steps and Exploitation
problem description
Environmental Situation
• Internet gains in importance in the news distribution area
• Large amount of distributed business information is available
• European business today is highly segmented and widely
unrecognised beyond national borders
• Business news mostly bear national relevance but hold the
potential to spread cooperation opportunities and business chances
towards an economically and socially integrated Europe.
• Business is global → news need to be
Support for the old and new economy throughout Europe is required.
An appropriate solution benefits business in the EU, with special focus on
the support and integration of new member states.
Existing approaches
• National solutions available
– Business news distribution service in the German-speaking area
– Increasing interest in multinational solutions within the existing services, from both
• Subscribers
• Press distributors
• Limitations
– Single language limitation
– Not attractive for European companies to join
project description
Objectives
• NEDINE was EC-funded (Apr 2004 to Apr 2006). The objective of
the project is to establish a distributed news network, aimed at
European journalists and opinion leaders.
• NEDINE provides participants with a network for news exchange and
distribution. It supports mutual awareness of relevant topics and
information content within all European countries.
• NEDINE focuses on the availability and affordability for all partners to
transport national information to the addressed target group,
regardless of the origin, nationality and financial capability of the
information provider.
The Challenge
[Figure: the Austrian, Czech and Slovakian news agencies (NA), each with its own national readers, and a small company with a good product.]
The Solution
News agency offers to its customers:
- Single access point for international press releases
- Distribution
- Payment
- Editing / Translation
- Price advantage compared to collection of single press releases
News agency offers to its readers:
- Multilingual news
- International news
- From various sources
- (Semantic) relationships independent from source
- Relevance ranking for search
News agency benefits from the NEDINE network:
- Common business model
- Additional customers
- More revenues
- New contacts
- International presence
[Figure: the Austrian, Czech and Slovakian news agencies (NA) with their readers, and a small company with a good product, within the NEDINE network.]
approaches and results
Architecture Reasoning
First Approach – Centralized Architecture
Pros:
• Single maintenance point
• Clear infrastructure
• One traffic channel (News agency ↔ NEDINE)
• No additional infrastructure required for partners
Cons:
• Single point of failure (whole network down)
• Huge amount of network traffic
• Storage of complete articles
• Which organization maintains the central server?
Centralized configuration
[Figure: the agencies ČIA, SITA and PTE connect through a Web Service Interface to the NEDINE central server.]
Architecture Reasoning
Alternative Approach – Hybrid P2P Architecture
Why Peer-to-Peer?
• Better scalability
• No single point of failure
• No downtime if central services are down
• Less network traffic
• Network remains transparent for the peers (they only see Nedine)
Final Approach – Hybrid P2P Architecture
Properties of this Architecture:
• Democratic System
• Identical software components are installed at each
partner
• Nedine becomes a logically centralized platform
• Nedine is technically distributed to the view of all
participating peers
• Semantic relations and necessary steps for news
distribution are done in a local context
P2P configuration
[Figure: ČIA, SITA and PTE each connect through a Web Service Interface to their own NEDINE peer; the peers together provide the virtually central services.]
Communication: Peer ↔ Agency
Web Services as the communication protocol
• Standard Interfaces for default peers (SOAP,
NewsML Data transfer, Queries, Network Data)
• Customized interfaces for each partner, if necessary
(database access based on document ID)
• Location and functionality of the NEDINE-peer is
defined in the corresponding WSDL-file
• Functionality is visible only to the local peer, which
increases network security
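The document-ID based access over SOAP described above could look roughly like the following minimal sketch. The namespace `urn:example:nedine-peer`, the operation name `GetDocument` and the element `DocumentId` are hypothetical placeholders chosen for illustration; the real names would be those published in the peer's WSDL file.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
PEER_NS = "urn:example:nedine-peer"  # hypothetical namespace, not taken from the project


def build_get_document_request(document_id: str) -> bytes:
    """Build a SOAP 1.1 request asking the local peer for one document by ID."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # "GetDocument" and "DocumentId" are illustrative names only.
    operation = ET.SubElement(body, f"{{{PEER_NS}}}GetDocument")
    ET.SubElement(operation, f"{{{PEER_NS}}}DocumentId").text = document_id
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # The document ID below is made up for the example.
    print(build_get_document_request("example-doc-001").decode("utf-8"))
```

The agency side only ever addresses its local peer in this sketch, which matches the point that peer functionality is not exposed to the rest of the network.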
Inter-Peer Communication
Also implemented with XML Web Services
• Inter-peer communication is invisible to the agencies
• High flexibility, easy to upgrade or change without influencing the rest of the network
• Network traffic is encrypted via PKI (Public Key Infrastructure)
Multinational and Multilingual Services
– Multinational Service Integration
• Standardized news exchange formats → NewsML
• Local service to peer communication → SOAP
– local service providers hold business-critical information
– installation of a local peer with well-known (open) source
increases trust of the participating organizations and
underlines the local character of the relevant business data
• Peer-to-peer communication → SOAP
Multilingual News Publishing and Distribution
– Automatic Translation ?
– Multilingual content presentation ?
– Multilingual information distribution & retrieval
– Semantic relations between the (multilingual)
business news contents
Semantic News Enrichment
Pushing the Quality Level by Semantics
– International news describe local business and
lack relevant interrelations
– “Linking” between sensible business news has
been manual work and thus costly
– Semantic relationships increase business value
of news items, but how to create with
reasonable effort?
The Vector Space Engine
– Vectors are assigned to every news article
representing keyword occurrences (weights)
– Vectors are technically small portions of data,
feasible to integrate in peer component
– Semantic relationships increase business value
of news items
• Automatically recognize similarities by creating a
vector space on relevant keywords
• What is a keyword?
– all words (except stopwords)
– relevant words
  • from frequencies
  • with weights (vector space model)
  • from the domain
• What does a keyword look like?
– A word: bodies
– A stem: bodi
– A lemma: body
– A phrase: public bodies
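The four keyword forms above can be illustrated with a small sketch. Everything here is a toy assumption: the stopword list, the suffix rule in `naive_stem` and the one-entry `LEMMAS` table stand in for the real stemmers, lemmatizers and domain resources a production system would use.

```python
import re

STOPWORDS = {"the", "of", "and", "a", "in", "to", "are", "by"}  # tiny illustrative list
LEMMAS = {"bodies": "body"}  # stand-in for a real lemma dictionary


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word: str) -> str:
    """Toy suffix stripping; a real system would use a proper stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "i"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def keywords(text: str) -> dict[str, list[str]]:
    """Return words, stems, lemmas and two-word phrases, skipping stopwords."""
    tokens = tokenize(text)
    content = [t for t in tokens if t not in STOPWORDS]
    phrases = [
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    ]
    return {
        "words": content,
        "stems": [naive_stem(t) for t in content],
        "lemmas": [LEMMAS.get(t, t) for t in content],
        "phrases": phrases,
    }


if __name__ == "__main__":
    print(keywords("Public bodies are funded by the state"))
```

Phrases are built only from adjacent non-stopword tokens, so "public bodies" is kept while pairs spanning a stopword are dropped.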
[Figure: query and document processing. A query and a document each pass through processing (stemming and/or PN detection and/or n-gram detection …), yielding Q = (wq1,…,wqn) and D = (wd1,…,wdn), which are then matched.]
• Vector Space Model combined with statistical and linguistic processing.
• Statistical metrics included are:
– $tf_{ij}$ = term frequency of word $i$ in document $j$
– $IDF_i$ = inverse document frequency of word $i$ in the whole document collection:
$$IDF_i = 1 + \log_2\left(\frac{N}{df_i}\right)$$
where $N$ = total number of documents and $df_i$ = document frequency of term $i$
– $w_{ij} = tf_{ij} \cdot IDF_i$
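As a minimal sketch of these metrics, the function below computes tf, IDF and the resulting weights for a tiny tokenized collection. The function name and the sample documents are invented for illustration, and the logarithm is taken to base 2 as read from the formula.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one weight vector per document.

    Weights follow w_ij = tf_ij * IDF_i with IDF_i = 1 + log2(N / df_i).
    """
    n_docs = len(docs)
    # df_i: number of documents that contain term i at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vocab = sorted(df)
    idf = {term: 1.0 + math.log2(n_docs / df[term]) for term in vocab}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # tf_ij: raw count of term i in document j
        vectors.append([tf[term] * idf[term] for term in vocab])
    return vocab, vectors


if __name__ == "__main__":
    sample = [["bank", "merger", "bank"], ["merger", "steel"]]
    vocab, vectors = tf_idf_vectors(sample)
    print(vocab)
    print(vectors)
```

With N = 2, "merger" occurs in both documents and gets IDF 1, while "bank" and "steel" occur in one document each and get IDF 2, so rarer terms weigh more.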
Vector Space Model
• Documents are indexed by vectors
• Documents are retrieved by similarity
– Query and documents are compared using the cosine formula:
$$Sim(Q,D) = \frac{\sum_{i=1}^{n} wq_i \cdot wd_i}{\sqrt{\sum_{i=1}^{n} wd_i^2} \cdot \sqrt{\sum_{i=1}^{n} wq_i^2}}$$
– Local archives must provide term frequency data
(internal and document)
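The cosine comparison of a query vector Q and a document vector D can be sketched as follows. The function name and the zero-vector convention (returning 0.0) are assumptions made for this example.

```python
import math


def cosine_similarity(q: list[float], d: list[float]) -> float:
    """Cosine of the angle between query vector q and document vector d."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same dimension")
    dot = sum(wq * wd for wq, wd in zip(q, d))
    norm_q = math.sqrt(sum(wq * wq for wq in q))
    norm_d = math.sqrt(sum(wd * wd for wd in d))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # assumed convention: an empty vector matches nothing
    return dot / (norm_q * norm_d)


if __name__ == "__main__":
    print(cosine_similarity([4.0, 1.0, 0.0], [0.0, 1.0, 2.0]))
```

Because the result depends only on the direction of the vectors, a long article and a short query can still score highly when they share the same weighted keywords.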
The model used
[Figure: processing pipeline from NEWS to document vectors. Components: preprocessing of texts, metadata information, statistical process, linguistic processing (taggers and stemmers, proper names heuristics, syntactic patterns), semantic resources (EWN).]
Use case: distributing news in the Czech Republic and in Austria
[Figure: three NEDINE peers, attached to ČIA (CZ, DE, EN), SITA (SK, EN) and PTE (DE, EN), plus two subscribers. A ČIA release in CZ and DE enters the network. Numbered steps: 1. Distribution & Enrichment, 2. Enrichment (DE), 3., 4., 5. CZ, DE, 6., 7. DE.]
Future Exploitation
Recent developments and open issues
• Nedine has been extended with translation
services (additional service on P2P
architecture)
• Secure communication infrastructure has
been implemented
• Performance and scalability tests
• Market & business orientation
→ Nedine Association was founded at the end of 2005
Have a look at NEDINE, we are
open to recommendations, news providers
and partners from all over Europe.
Website http://www.nedine.org/
E-Mail info@nedine.org
Nedine Contact Person: Dr. Markus Schranz
Tel. ++43-1-81140-444, schranz@pressetext.at
Good News from Europe