Subproject 4: HTML-WML Transcoding System
Jia-Shung Wang
Computer Science Department
National Tsing Hua University
March 27, 2001
Outline
• Motivation and Issues
• Examples of Transcoding
• System Overview and Translation Flow
• Some HTML to WML Conversion Strategies
Information Appliances
• Different design constraints, based on intended use, enhance ease of use
– Desktop PC
– Mobile PC
– Desktop “Smart” Phone
– Mobile Telephone
– Personal Digital Assistant
– Set-top Box
– Digital VCR
– …
• Implications:
– Shift from computer design to consumer design
Motivation
• Rapidly growing diversity of wireless communication devices
• Rapid growth in the number of HTML web pages available on the Internet
• Solutions that let mobile devices with WML browsers access the existing HTML or WML pages on the Internet
Issues
• Device-enabled service for WML mobile devices with different screen types
• Bandwidth-driven transmission for rapid response and fast delivery
• Use of browsing behavior
• Resizing of images and icons
• Compression of the resulting WML data
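The compression issue can be illustrated with a minimal sketch. It assumes generic DEFLATE compression from Python's standard `zlib` module as a stand-in; the slides do not say which codec the system uses, and the function names and sample deck are hypothetical.

```python
import zlib


def compress_deck(wml_text: str) -> bytes:
    """Compress a WML deck with DEFLATE (a stand-in codec, not the system's own)."""
    return zlib.compress(wml_text.encode("utf-8"), 9)


def decompress_deck(payload: bytes) -> str:
    """Restore the original WML text on the receiving side."""
    return zlib.decompress(payload).decode("utf-8")


# Hypothetical deck with repetitive markup, as translated pages tend to have.
sample_deck = "<wml>" + "".join(
    f'<card id="c{i}"><p>Headline number {i}</p></card>' for i in range(20)
) + "</wml>"

packed = compress_deck(sample_deck)
print(len(sample_deck.encode("utf-8")), "bytes before,", len(packed), "bytes after")
```

Markup-heavy decks repeat the same tags many times, which is why a dictionary-based codec tends to shrink them noticeably.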
Demos of Transcoding
• Contents from
– enYES 鉅亨網
– USAtoday
– CS, NTHU
– NTHU
– VOD
Discussions
• enYES provides two versions, regular HTML and WAP, to serve PC users and mobile device users separately.
• USAtoday also provides a simplified version of its content for Palm users.
• NTHU and CS-NTHU homepages: if the original figures are kept to preserve the link information, the page layout becomes odd (viewed with the Browse-It HTML browser).
• VOD homepage, one-column text: no significant difference after transcoding.
Usage of Browsing Behavior
• Automatic translation is complicated by the diversity of content posted on HTML pages.
• A universal conversion strategy that translates every HTML page into sequences of WML decks effectively is unlikely to exist.
• A more promising approach is to categorize browsing behavior and classify each HTML page before translating it.
Usage of Browsing Behavior (cont’d)
• Once a page is classified, we know what the client requires, and a matching conversion can extract the required content step by step and translate it into small WML documents of predictable size.
• We believe that, after classification, adequate conversions exist for some kinds of web pages.
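A minimal sketch of such a classification step follows. The three categories, the thresholds, and the names `TagCounter` and `classify_page` are hypothetical; the slides do not list the actual classes, so this only shows the general shape of "classify first, then pick a conversion".

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each start tag occurs in a page."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def classify_page(html: str) -> str:
    """Assign a page to one of three assumed categories by simple tag counts."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    if counter.tags["table"] >= 2:
        return "table-layout"   # candidate for a table-based strategy
    if counter.tags["img"] > counter.tags["p"]:
        return "image-heavy"    # candidate for image filtering or resizing
    return "plain-text"         # candidate for fully automatic conversion


print(classify_page("<p>News item one</p><p>News item two</p>"))
```

A real classifier would need richer features, but even a coarse label like this lets the translator choose between conversion strategies instead of applying one strategy to every page.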
Related Works
Transcoding Proxy of IBM alphaWorks
• Its goal is to manage different versions of content, with different fidelities and modalities, in order to adapt delivery to different client devices.
Related Works
Intel Quick Web Technology
• New software capability that helps Internet providers and digital distribution companies increase the delivery speed of Web pages containing photos, drawings and other graphics.
• It uses two key techniques: compression and caching.
Related Works
Spyglass Prism
• Spyglass Prism dynamically adapts Web content to match various non-PC devices.
• It functions as a proxy server, caches the converted content, and dynamically converts standard HTML to WML.
Related Works
Proxy Architecture for Efficient Web Browsing over Cellular Networks
• Decreases the access time of WWW browsing in narrow-band wireless environments.
• It adopts persistent connections and pipelining in a proxy architecture to improve HTTP processing between the client and the proxy server.
Comparisons between HTML and WML
• Both make use of tags and attributes.
• Similar character set, syntax and data types.
• Two special elements of WML structure
– Deck and Card
• Different design goals
– HTML: to publish hypertext on the World Wide Web
– WML: for devices with narrow network bandwidth, small displays, limited memory and fewer computational resources.
Examples of HTML and WML
WML
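<?xml version="1.0"?>
<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" "http://www.wapforum.org/DTD/wml_1.1.xml">
<!-- The <wml> element itself is the deck; WML has no <deck> element. -->
<wml>
  <card>
    <p>
      <do type="accept">
        <go href="#card2"/>
      </do>
      This is the first card...
    </p>
  </card>
  <card id="card2">
    <p>
      This is the second card.
    </p>
  </card>
</wml>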
HTML
<html>
  <head>
    <title>
      Example page.
    </title>
  </head>
  <body>
    <h1>
      This is a headline.
    </h1>
    <p>
      This is a paragraph.
    </p>
  </body>
</html>
System Overview
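[Figure: system architecture]
• Client: WML browser, etc.
• Translation Server: HTML Parser, HTML-WML Translator, WML Generator
• Web Server: documents, multimedia content, CGI scripts, etc.
• Client and Translation Server communicate over WAP or HTTP, exchanging WML
• Translation Server and Web Server communicate over HTTP, exchanging HTML and WML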
Features
• An HTML-WML Translator on the Translation Server
• Both HTTP and WAP requests are accepted.
• Java Servlet API compatible
• Server- and platform-independent
Translation Server: Components and Flow
[Figure: requests and responses pass through the Network Protocol layer and the Proxy; the conversion path comprises the HTML Parser, Document Analyzer, Filter, Decks & Cards, Link Builder, and WML Generator]
Components
• Gateway
– Accept requests from clients
– Return appropriate responses
• Proxy Servlet
– Get the requested remote documents
– Decide whether to pass a document through or convert it
– Cache the converted results
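The Proxy Servlet's three duties can be sketched as follows. This is an illustration in Python, not the project's Java servlet: the class name `ProxySketch`, the injected `fetch` and `convert` callables, and the in-memory dictionary cache are all assumptions. Only the MIME type `text/vnd.wap.wml` is the standard one for WML.

```python
WML_TYPE = "text/vnd.wap.wml"  # standard MIME type for WML decks


class ProxySketch:
    """Fetch a remote document, pass it through or convert it, cache conversions."""

    def __init__(self, fetch, convert):
        self.fetch = fetch      # hypothetical: url -> (content_type, body)
        self.convert = convert  # hypothetical: html text -> wml text
        self.cache = {}         # url -> converted WML

    def handle(self, url):
        if url in self.cache:                      # cached conversion available
            return self.cache[url]
        content_type, body = self.fetch(url)       # get the remote document
        if content_type.startswith("text/html"):   # HTML needs conversion
            converted = self.convert(body)
            self.cache[url] = converted
            return converted
        return body                                # WML and anything else pass through


def demo_fetch(url):
    """Stand-in for a real HTTP fetch; returns canned responses."""
    if url.endswith(".wml"):
        return WML_TYPE, "<wml/>"
    return "text/html", "<html><body><p>Hello</p></body></html>"


proxy = ProxySketch(demo_fetch, lambda html: "<wml><card><p>Hello</p></card></wml>")
print(proxy.handle("http://example.test/index.html"))
```

Passing the fetch and convert steps in as callables keeps the pass-or-convert decision testable without a network or a full translator.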
Components (cont’d)
• HTML Parser
– Parse the HTML document into a parse tree
• Document Analyzer
– Analyze the parse tree
• Filter
– Filter out objects that are unnecessary or not supported by the client device
– Image/icon resizing
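The parse and filter steps might look like the sketch below. It uses Python's standard `html.parser` rather than the Java tools the project names, and the `Node` class, the `UNSUPPORTED` tag set, and the helper names are assumptions made for illustration. Image and icon resizing is not covered here.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}          # never closed
UNSUPPORTED = {"script", "style", "applet", "object", "embed"}    # assumed filter list


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes or text strings


class TreeBuilder(HTMLParser):
    """Build a simple parse tree from HTML events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def filter_tree(node):
    """Drop unsupported elements, together with everything inside them."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            if child.tag in UNSUPPORTED:
                continue
            filter_tree(child)
        kept.append(child)
    node.children = kept
    return node


def text_of(node):
    """Collect the remaining text in document order."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


page = "<html><body><h1>Hi</h1><script>var x = 1;</script><p>Text<br>more</p></body></html>"
print(text_of(filter_tree(parse(page))))
```

Filtering on the tree rather than on the raw text means a dropped element takes its whole subtree with it, so script bodies never leak into the WML output.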
Components (cont’d)
• Content Divider
– Split a document into multiple small documents
• Link Maker
– Insert extra links so that the small documents can reach one another
• WML Generator
– Produce well-formed WML documents and return them to the Proxy Servlet
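The last three components can be sketched together. The function names, the fixed cards-per-deck limit, and the `deck{}.wml` naming pattern are hypothetical; the sketch builds the markup with Python's `xml.etree.ElementTree` so that the output is well-formed by construction.

```python
import xml.etree.ElementTree as ET


def split_into_decks(paragraphs, cards_per_deck):
    """Content Divider: cut the content into chunks of at most cards_per_deck items."""
    return [paragraphs[i:i + cards_per_deck]
            for i in range(0, len(paragraphs), cards_per_deck)]


def build_deck(paragraphs, next_href=None):
    """WML Generator: one card per paragraph, plus an optional link to the next deck."""
    wml = ET.Element("wml")
    card = None
    for index, text in enumerate(paragraphs, start=1):
        card = ET.SubElement(wml, "card", id=f"card{index}")
        ET.SubElement(card, "p").text = text
    if card is not None and next_href is not None:
        # Link Maker: the last card of a deck points at the following deck.
        link_paragraph = ET.SubElement(card, "p")
        ET.SubElement(link_paragraph, "a", href=next_href).text = "Next"
    return ET.tostring(wml, encoding="unicode")


def generate(paragraphs, cards_per_deck=3, name_pattern="deck{}.wml"):
    """Return a mapping from deck name to WML text."""
    chunks = split_into_decks(paragraphs, cards_per_deck)
    decks = {}
    for n, chunk in enumerate(chunks, start=1):
        next_href = name_pattern.format(n + 1) if n < len(chunks) else None
        decks[name_pattern.format(n)] = build_deck(chunk, next_href)
    return decks


for name, text in generate(["a", "b", "c", "d"]).items():
    print(name, text)
```

Only forward links between decks are generated here; a fuller Link Maker would presumably also add back links and links between cards.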
HTML to WML Conversion Tools
• Semi-automatic:
– Used for rich HTML documents
– The conversion form is specified manually with the help of analysis and editing tools.
– The resulting forms are distributed to the gateway servers.
• Automatic:
– Used for simple documents, such as News and BBS, …
HTML to WML Conversion Strategies
• Strategy I: Tables to Lists
– Simply remove all layout elements such as tables
– Arrange all the content into a single fixed-width column
• Strategy II: One Table One Deck
– Extract each table to form a deck
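Strategies I and II can be contrasted on a toy document model. The model, a list of `("text", string)` and `("table", list of cells)` blocks, is an assumption made for this sketch, as is the choice to give content outside any table a deck of its own under Strategy II; the slides do not specify how such content is handled.

```python
def tables_to_lists(blocks):
    """Strategy I: drop the table layout and keep all content in one column."""
    column = []
    for kind, payload in blocks:
        if kind == "table":
            column.extend(payload)   # cells are flattened in document order
        else:
            column.append(payload)
    return column


def one_table_one_deck(blocks):
    """Strategy II: every table becomes a deck whose cards are its cells."""
    decks = []
    for kind, payload in blocks:
        # Assumption: a text block outside any table gets a one-card deck.
        decks.append(list(payload) if kind == "table" else [payload])
    return decks


document = [
    ("text", "intro"),
    ("table", ["a", "b"]),
    ("table", ["c"]),
]
print(tables_to_lists(document))
print(one_table_one_deck(document))
```

Strategy I loses the grouping that the tables expressed, while Strategy II keeps it as deck boundaries, which is the property the later evaluation relies on.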
HTML to WML Conversion Strategies (cont’d)
• Strategy III: Preview First
a. Apply One Table One Deck
b. Collect the first card of every deck as a preview card
c. Arrange these preview cards into a preview deck, which is transmitted first; every preview card has a link to its corresponding deck
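Steps a to c can be sketched as a function over decks that have already been produced. The function name, the `deck{}.wml` naming pattern, and the decision to remove each preview card from its own deck are assumptions of this sketch rather than details stated in the steps.

```python
def preview_first(decks, name_pattern="deck{}.wml"):
    """Build a preview deck from the first card of every deck.

    Returns (preview, rest): preview is a list of {"card", "href"} entries,
    rest maps deck names to the cards that remain after the preview is taken.
    """
    preview, rest = [], {}
    for n, deck in enumerate(decks, start=1):
        if not deck:
            continue
        name = name_pattern.format(n)
        remainder = deck[1:]
        # A preview card links to its deck only if that deck still has content.
        preview.append({"card": deck[0], "href": name if remainder else None})
        if remainder:
            rest[name] = remainder
    return preview, rest


decks = [["1_1", "1_2"], ["2_1", "2_2", "2_3"], ["4_1"]]
preview, rest = preview_first(decks)
print(preview)
print(rest)
```

Because the preview deck is sent first, the user can pick the relevant section from one small deck and fetch only the deck behind that link.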
Original Document
[Figure: tree of the original <document>, in which <table> elements hold <section 1> to <section 4>: <section 1> holds <content 1_1> and <content 1_2>, <section 2> holds <content 2_1> to <content 2_5>, <section 3> holds <content 3_1> to <content 3_7>, and <section 4> holds <content 4_1>]
Tables to Lists
<document>
  <deck>
    <content 1_1>
    <content 1_2>
    <content 2_1>
    <content 2_2>
    <content 2_3>
  <deck>
    <content 2_4>
    <content 2_5>
    <content 3_1>
    <content 3_2>
    <content 3_3>
  <deck>
    <content 3_4>
    <content 3_5>
    <content 3_6>
    <content 3_7>
    <content 4_1>
One Table One Deck
[Figure: tree of a <document> with five <deck> nodes holding <content 1_1> through <content 4_1>]
Preview First
[Figure: tree of a <document> whose preview <deck> holds <content 1_1>, <content 2_1>, <content 3_1>, and <content 4_1>, followed by four <deck> nodes holding the remaining contents, <content 1_2> through <content 3_7>]
Strategy Evaluation
• Assume a document has S sections and is translated into N WML cards.
• Every deck contains at most C cards.
• Assume that the contents of the same table are similar.
Evaluation of Searching After Translation
                           Tables to Lists   One Table One Deck   Preview First
User Friendly              Worst             Best                 Good
Average Deck Access Time   N/2               S/2                  S/2C
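The three access-time expressions can be compared numerically. The sketch below reads "S/2C" as S divided by 2C, which is an interpretation, and the sample values N = 40, S = 8, C = 4 are made up for illustration.

```python
def average_deck_access_time(strategy, n_cards, n_sections, cards_per_deck):
    """Evaluate the access-time expression listed for each strategy."""
    if strategy == "tables_to_lists":
        return n_cards / 2                          # N/2
    if strategy == "one_table_one_deck":
        return n_sections / 2                       # S/2
    if strategy == "preview_first":
        return n_sections / (2 * cards_per_deck)    # S/2C, read as S/(2C)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical document: N = 40 cards, S = 8 sections, C = 4 cards per deck.
for name in ("tables_to_lists", "one_table_one_deck", "preview_first"):
    print(name, average_deck_access_time(name, 40, 8, 4))
```

Under these sample values the expressions give 20, 4, and 1 respectively, which matches the ordering the evaluation suggests: the preview deck cuts the search cost by a further factor of C.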
Performance Evaluation
                HTML Pages, Source (bytes)       WML Decks   Reduction
                Headers    Text      Images      (bytes)     Without Images   With Images
Experiment #1   24,359     9,471     176,361     7,440       22.0%            3.5%
Experiment #2   17,937     6,137     126,740     11,232      46.7%            7.4%
Experiment #3   21,203     8,325     280,727     16,891      57.2%            5.4%
Experiment #4   9,568      20,363    17,966      12,062      40.3%            25.2%
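The percentages can be reproduced from the byte counts. The check below assumes that, in each experiment, the first two figures are the non-image parts of the HTML source, the third is the image size, and the remaining byte count is the WML deck size; this assignment is inferred from the arithmetic rather than stated in the slides. Under that reading, each percentage is the WML size divided by the HTML size.

```python
# (first HTML figure, second HTML figure, images, WML decks), all in bytes.
experiments = {
    "Experiment #1": (24359, 9471, 176361, 7440),
    "Experiment #2": (17937, 6137, 126740, 11232),
    "Experiment #3": (21203, 8325, 280727, 16891),
    "Experiment #4": (9568, 20363, 17966, 12062),
}


def size_ratios(part_a, part_b, images, wml):
    """Return WML size as a percentage of HTML size, without and with images."""
    without_images = 100 * wml / (part_a + part_b)
    with_images = 100 * wml / (part_a + part_b + images)
    return round(without_images, 1), round(with_images, 1)


for name, figures in experiments.items():
    print(name, size_ratios(*figures))
```

The results agree with the reported figures in all four experiments, so the values labelled as reductions appear to be the fraction of the original size that remains after transcoding.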
Performance Evaluation (Experiment #1: What’s WAP)
[Figure: deck navigation for the WAP Forum and What’s WAP pages, each entered through a preview deck; deck labels shown: Deck 1, Deck 2, Deck 3, Deck 3.1, Deck 3.2]
Performance Evaluation (Experiment #2: NTHU Web Page)
[Figure: deck navigation for the NTHU, History, Current Status, and About NTHU pages, each entered through a preview deck; deck labels shown: Deck 1, Deck 2.1, Deck 2.2, Deck 3.1, Deck 3.2]
Performance Evaluation (Experiment #3: NTHU CS Web Page)
[Figure: deck navigation for the NTHU CS and Faculty pages, each entered through a preview deck; deck labels shown: Deck 1, Deck 3.1 to Deck 3.6]
Performance Evaluation (Experiment #4: IETF Web Page)
[Figure: deck navigation for the IETF, Internet-Drafts, Internet-Drafts Index, and DNSOP pages, each entered through a preview deck; deck labels shown: Deck 1, Deck 2.1 to Deck 2.5]
Implementation
• Goals: portability, reusability, and crash protection.
• Translation server: runs in a Java environment with Java Servlet, Java HTML Tidy, and XML Parser for Java.
• Servlet-enabled servers: Avenida Web Server and Nokia WAP Server
• Platform: Microsoft Windows NT Workstation 4.0 with Service Pack 5
Summary
• Design of an HTML to WML transcoding system with
1. Analysis and filtering of HTML content
2. Image/icon resizing
3. WML browsing mode design and a WML conversion tool
4. Compression and decompression modules for the WML data
5. WML transmission control