CS598 CXZ Assignment 1
Hong Cheng, hcheng3

Web

Faculty homepage classification is an important problem in the web domain. The task is to classify faculty homepages from different universities by research field. For example, a student working on data mining who wants to apply to graduate programs at U.S. universities could enter “data mining, U.S., university”, and the search result would be the homepages of data mining faculty at U.S. universities. This would help a lot, since people currently have to navigate to each university's website, open the faculty list, and click on every faculty name to find out whether that person's interest is data mining. The challenge is how to summarize the homepages and classify them correctly.

Email

Automatic email replying is an important problem in the email domain. If it is solved well, information service and technical support staff will benefit a lot. The data involved are emails, and the task is to automatically generate replies to incoming emails. The major challenge is how to classify or summarize incoming emails correctly: emails from different people have different writing styles, even when they are about the same problem. Another interesting issue is that a user may send a series of emails about different stages of a problem. Taking the same user's previous emails into account may help when generating an automatic reply to the current one.

Literature

Paper classification and organization is an important problem. The task is to classify published papers in the CS domain into sub-areas and organize them in chronological order. Currently, a researcher who wants to know what has or has not been done in a field has to search the web in an ad hoc way, and it is easy to miss important publications by searching this way.
If this task is done well, researchers who want to survey the literature in a specific area will benefit a lot. For example, a data mining researcher who wants to know what has been done on frequent pattern mining can enter “frequent pattern mining” and get all the relevant papers in chronological order, which makes the literature survey much easier. The major challenge is how to summarize and classify the papers correctly. If a paper is interdisciplinary, it should be assigned to every related field.
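
To make the last two requirements concrete (a paper may belong to several fields, and results come back in time order), here is a minimal sketch in Python. It is only an illustration, not a proposed solution: the keyword lists in FIELD_KEYWORDS, the min_hits threshold, and the names Paper, assign_fields, and papers_for_field are all hypothetical, and simple keyword matching stands in for whatever classifier would actually be learned from labeled papers.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword sets per sub-area (an assumption for illustration only).
FIELD_KEYWORDS = {
    "data mining": {"frequent", "pattern", "mining", "clustering"},
    "information retrieval": {"retrieval", "query", "ranking", "search"},
    "databases": {"query", "index", "transaction", "sql"},
}


@dataclass
class Paper:
    title: str
    year: int
    abstract: str


def assign_fields(paper, min_hits=2):
    """Return every field whose keyword set overlaps the paper's words
    in at least min_hits distinct keywords (multi-label assignment)."""
    words = set(re.findall(r"[a-z]+", (paper.title + " " + paper.abstract).lower()))
    return sorted(
        field
        for field, keywords in FIELD_KEYWORDS.items()
        if len(words & keywords) >= min_hits
    )


def papers_for_field(papers, field):
    """Return the papers assigned to the given field, oldest first."""
    matches = [p for p in papers if field in assign_fields(p)]
    return sorted(matches, key=lambda p: p.year)


# Made-up example records, not real publications.
papers = [
    Paper("Mining frequent pattern sets", 2000,
          "We study frequent pattern mining without candidate generation"),
    Paper("Index support for pattern mining query processing", 1998,
          "An index and query engine for mining"),
]

for p in papers_for_field(papers, "data mining"):
    print(p.year, p.title, assign_fields(p))
```

In this sketch the second record matches both the "data mining" and "databases" keyword sets, so it is listed under both fields, which is the behavior wanted for interdisciplinary papers. The hard part named above, classifying correctly, is exactly what the keyword matching glosses over.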