COMP420: Foundations of Chinese Computing

COMP341: Multilingual Computing
Programming Assignment 2:
Chinese Webpage Analyzer
Size of group: 2 students per group.
Due date: on or before 4:30pm on 24 April 2006. No postponement will be
given because the demos start on 25 April. If you cannot hand in your project on
time, you must arrange a demo with the TA individually.
Grading policy: The assignment will be graded on (1) correctness, (2) clarity
of documentation, (3) interface design, and (4) programming logic.
Objective:
In this programming assignment, you will learn how to develop a multilingual
character analyzer.
Brief Description
You are required to develop a webpage analyzer: the user inputs a URL, and the
analyzer retrieves the web content and processes it immediately.
Your system should be able to carry out:
1) Auto-detect the page's character encoding (Big5, GB, UTF-8, etc.).
2) To analyze the page content, the program needs to convert it into Unicode
(UTF-8). Based on the code ranges, the system should analyze the different types of
characters and report their statistics, which include the frequency of occurrence of
each character and the character details for each character subset (the internal code
and the form in another coding).
3) Users can also filter the analysis results using query options:
i. Number of characters to be displayed (most frequent or least frequent
characters, e.g. the 10 most frequent characters or the 100 least frequent
characters).
ii. Specific type: English, Chinese, or characters within specified code ranges.
The code ranges can be identified according to the subset definitions (which
you can find in the Microsoft Word Symbol tool under the “Arial Unicode MS”
font set).
4) Programming Language: Any
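As an illustration of requirements 1) to 3), here is a minimal Python sketch. It is
one possible approach, not the required design. All names in it (fetch,
detect_and_decode, classify, analyze, describe, most_frequent, least_frequent) are
hypothetical. The candidate encoding list, the small set of Unicode blocks, and the
mapping of "English" to Basic Latin are assumptions made for this example. Trial
decoding is only a rough heuristic: Big5 and GB byte sequences overlap, so a real
analyzer should also consult the charset declared in the HTTP header or the page's
meta tag.

```python
from collections import Counter
from urllib.request import urlopen

# Encodings tried in order (an assumption of this sketch, not a full detector).
CANDIDATE_ENCODINGS = ["utf-8", "big5", "gb2312"]

# A small illustrative subset of Unicode blocks, as (first, last) code points.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "CJK Symbols and Punctuation": (0x3000, 0x303F),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Halfwidth and Fullwidth Forms": (0xFF00, 0xFFEF),
}


def fetch(url):
    """Download the raw bytes of a page."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def detect_and_decode(raw):
    """Return (encoding, text) for the first candidate that decodes without error."""
    for name in CANDIDATE_ENCODINGS:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings decoded the page")


def classify(ch):
    """Map a character to the name of its block, or "Other" if not listed."""
    cp = ord(ch)
    for name, (first, last) in BLOCKS.items():
        if first <= cp <= last:
            return name
    return "Other"


def analyze(text, block=None):
    """Count non-whitespace characters, optionally restricted to one block."""
    return Counter(
        ch for ch in text
        if not ch.isspace() and (block is None or classify(ch) == block)
    )


def most_frequent(counts, n):
    """Return the n characters with the highest counts."""
    return counts.most_common(n)


def least_frequent(counts, n):
    """Return the n characters with the lowest counts."""
    return sorted(counts.items(), key=lambda item: item[1])[:n]


def describe(ch, other="big5"):
    """Give the Unicode code point, UTF-8 bytes, and bytes in another coding."""
    try:
        other_hex = ch.encode(other).hex().upper()
    except UnicodeEncodeError:
        other_hex = None  # the character has no form in that coding
    return {
        "char": ch,
        "unicode": "U+%04X" % ord(ch),
        "utf-8": ch.encode("utf-8").hex().upper(),
        other: other_hex,
    }
```

A typical flow would be: call fetch on the URL, pass the bytes to
detect_and_decode, then call analyze(text, "CJK Unified Ideographs") and
most_frequent(counts, 10) to answer a query such as "the 10 most frequent Chinese
characters".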
What to hand in:
• A 5-minute demo.
• The source code of the program.
• A short written report (typically less than 15 pages) summarizing the following:
  - Methodology and algorithm used for the design;
  - Technical documentation/specification for the localized data file (such as
    standard, format).
† Note 1: Grouping policy: Students are free to find their own partners, and they
need to inform the TA by 13 April (the 12th week, when students are doing the
Assignment 1 demo). Both students in a group must be present together at the
assignment demonstration because we need to confirm that both have actually
participated in the project and understand the details. Therefore, it is strongly
advised that the two students be in the same lab group.
‡ Note 2: Late hand-in policy: As April 17 is a holiday and there is no lecture, you
can hand in the assignment anytime in the previous week to either Prof. Lu Qin or the
TA. Alternatively, you can attend the tutorial on April 18 and hand in the project to
Prof. Lu Qin (on or before 1:30pm). If you hand it in after 1:30pm, once the tutorial
has started, it will be considered one day late. The penalty for late hand-in is 15%
per day (with 1:30pm as the cut-off time), and no grade will be given after 7 days.
Students who cannot complete the project on time are still encouraged to hand in
their work so that the TA can check whether they have a good understanding of the
assignment.