
COMPUTER SYSTEMS RESEARCH
Code Writeup, 3rd quarter 2007-2008
1. Your name: Felix Zhang, Period: 5
2. Date of this version of your program: 4/2/08
3. Project title: Development of a German-English Translator
4. Describe how your program runs. List test input(s) that may be used. Are there incorrect user
input(s) that your program handles?
The program runs a series of methods, each one building on the information established by the
previous one: part-of-speech tagging, morphological analysis, noun-verb agreement,
lemmatization, translation, chunking, priority assignment, and inflection.
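The chaining of stages described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the names `make_stage`, `STAGE_NAMES`, and `run_pipeline` are hypothetical, and each stage is a placeholder that merely records that it ran instead of doing real linguistic work.

```python
from typing import Any, Callable, Dict, List

# Hypothetical record type: each token is a dict that gathers annotations.
Token = Dict[str, Any]
Stage = Callable[[List[Token]], List[Token]]

# Stage order as listed in the writeup.
STAGE_NAMES = [
    "part-of-speech tagging",
    "morphological analysis",
    "noun-verb agreement",
    "lemmatization",
    "translation",
    "chunking",
    "priority assignment",
    "inflection",
]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its name on each token."""
    def stage(tokens: List[Token]) -> List[Token]:
        for token in tokens:
            token.setdefault("history", []).append(name)
        return tokens
    return stage


PIPELINE = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(sentence: str) -> List[Token]:
    """Split on whitespace, then pass the tokens through every stage in order."""
    tokens: List[Token] = [{"surface": word} for word in sentence.split()]
    for stage in PIPELINE:
        tokens = stage(tokens)
    return tokens


tokens = run_pipeline("den Mann machen die kleinen Kinder")
```

Because every stage receives the tokens annotated by the stages before it, a later step such as inflection can rely on tags set earlier, which is the dependency the list above describes.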
As of right now, the program handles incorrect user input up to the translation stage: if a word X
cannot be found in the dictionary, the program outputs "No translation found for X".
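A minimal sketch of this fallback, assuming the dictionary is a simple mapping from German lemmas to English words. The `LEXICON` contents and the function name `translate_lemma` are made up for illustration and are not taken from the program.

```python
# Hypothetical toy dictionary: German lemma -> English word.
LEXICON = {
    "der": "the",
    "mann": "man",
    "machen": "make",
    "klein": "small",
    "kind": "child",
}


def translate_lemma(lemma: str) -> str:
    """Return the English word, or the fallback message for unknown lemmas."""
    return LEXICON.get(lemma.lower(), f"No translation found for {lemma}")


known = translate_lemma("Mann")
unknown = translate_lemma("Quark")
```

Returning the message in place of the missing word keeps the later stages running on the rest of the sentence instead of stopping at the first unknown word.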
input = "den Mann machen die kleinen Kinder"
This German sentence means "the small children make the man", but the direct object comes before
the subject. Word-for-word it reads "the man make the small children", an order that is acceptable in
German but not in English. It lets me test whether my sentence structure rearrangement correctly
puts the subject in the right place in English.
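The rearrangement for this test sentence can be illustrated with a small sketch. It assumes chunking and translation have already produced English chunks labeled with their grammatical case; the labels (`NP-nom`, `V`, `NP-acc`), the `PRIORITY` table, and `reorder_svo` are hypothetical and are assigned by hand here, not computed by the program.

```python
from typing import List, Tuple

# (label, English text) pairs; labels are hypothetical and hand-assigned.
Chunk = Tuple[str, str]

# Lower number = earlier in the English sentence (subject, verb, object).
PRIORITY = {"NP-nom": 0, "V": 1, "NP-acc": 2}


def reorder_svo(chunks: List[Chunk]) -> List[Chunk]:
    """Sort chunks into subject-verb-object order by their priority."""
    return sorted(chunks, key=lambda chunk: PRIORITY[chunk[0]])


# German order of the test sentence: object, verb, subject.
german_order = [
    ("NP-acc", "the man"),             # den Mann
    ("V", "make"),                     # machen
    ("NP-nom", "the small children"),  # die kleinen Kinder
]

english = " ".join(text for _, text in reorder_svo(german_order))
print(english)  # prints: the small children make the man
```

The sketch covers only a single subject-verb-object clause; a sentence with several chunks of the same case would need additional rules.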
5. What is the program analyzing, as far as being used for your senior research project this year?
Functionally, the program analyzes groupings of words, generating all the information outlined above.
Overall, the point of the program is to analyze how reliable rudimentary translation methods are, and
how much accuracy they sacrifice for the sake of simple techniques.
6. How has your program evolved during third quarter? By the end of this school year, what do you
hope to have as a final version of your program in relation to this current version? What will you
demonstrate during your final presentation?
The program can now resolve almost all word ambiguities, and it devises a solution for words whose
meaning is still ambiguous. More importantly, the program now implements a very rudimentary
grammar structure system, with which I can translate a German sentence into something that is
actually meaningful (grammatically correct) in English.
At the end of the year, I hope to have a system that can output a grammatically correct English
sentence given a simple German sentence as input, while keeping word ambiguities to a minimum.
7. What will be the major research points you’ll write about for the final version of your research
paper?
One point I will make in my paper is the difference between the two well-established methods of
machine translation, statistical and rule-based, comparing their accuracy and efficiency. Since I am
mainly implementing a rule-based system, I will highlight the step-by-step process needed for
translation and the various errors encountered. I will also test the reliability of rudimentary statistical
methods, such as part-of-speech tagging, and include the results in my paper.