Hong Cheng

CS598 CXZ Assignment 1
Hong Cheng, hcheng3
Web
Faculty homepage classification is an important problem in the web domain.
The problem is to classify faculty homepages from different universities
according to research field. If a student working on data mining wants to apply to
graduate programs at U.S. universities, he can enter “data mining, U.S., university”. The
search result is the set of homepages of data mining faculty at different U.S. universities.
This would help a lot, since people currently have to navigate to each university
website, go to the “faculty” list, and click on every faculty name to find out whether
that person’s interest is data mining or not.
The challenge is how to summarize the homepages and classify them correctly.
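As a rough illustration of the classification step, the following minimal sketch scores each homepage against a keyword list per research field and returns the pages assigned to the requested field. The keyword lists, the function names, and the page identifiers are all hypothetical; a real system would more likely learn a classifier from labeled homepages.

```python
import re
from collections import Counter

# Hypothetical keyword lists per research field (an assumption for this sketch).
FIELD_KEYWORDS = {
    "data mining": {"mining", "pattern", "patterns", "clustering", "classification"},
    "databases": {"query", "transaction", "transactions", "indexing", "storage"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify_homepage(text):
    """Return the field whose keywords occur most often, or None if none occur."""
    counts = Counter(tokenize(text))
    scores = {
        field: sum(counts[word] for word in words)
        for field, words in FIELD_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def search(pages, field):
    """Return the identifiers of all pages classified into the given field."""
    return [url for url, text in pages.items() if classify_homepage(text) == field]


# Made-up page texts, used only to exercise the sketch.
pages = {
    "page_a": "My research covers frequent pattern mining and clustering.",
    "page_b": "I work on query optimization and transaction processing.",
}
print(search(pages, "data mining"))
```

Counting keyword hits ignores the summarization problem entirely; it only shows where a summary or a learned model would plug in.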
Email
Automatic email replying is an important problem in the email domain.
If this problem is solved well, information service and technical support staff will
benefit a lot. The data involved is email messages. The task is to automatically generate replies
to incoming emails.
The major challenge is how to classify or summarize the incoming emails
correctly. Emails from different people will have different writing styles, even if they
are about the same problem. Another interesting issue is that a user may
send a series of emails about different stages of a problem. It may help to take
the same user’s previous emails into account when generating an automatic
reply to the current email.
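The idea of using a sender’s earlier emails can be sketched as follows. This is only an assumed design: the canned replies, the topic keywords, and the 0.5 weight given to earlier messages are illustrative choices, not part of the problem statement.

```python
import re
from collections import defaultdict

# Hypothetical canned replies, keyed by a single topic keyword.
REPLIES = {
    "password": "To reset your password, use the reset link on the login page.",
    "install": "Please see the installation guide for setup steps.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def auto_reply(sender, body, history):
    """Pick a canned reply using the current email plus the sender's earlier emails.

    Words in the current email count 1.0; words in earlier emails from the
    same sender count 0.5 (an arbitrary weight chosen for this sketch).
    """
    scores = dict.fromkeys(REPLIES, 0.0)
    for word in tokenize(body):
        if word in scores:
            scores[word] += 1.0
    for earlier in history[sender]:
        for word in tokenize(earlier):
            if word in scores:
                scores[word] += 0.5
    history[sender].append(body)  # remember this email for later messages
    best = max(scores, key=scores.get)
    return REPLIES[best] if scores[best] > 0 else None


history = defaultdict(list)
first = auto_reply("user@example.com", "I cannot install the tool.", history)
followup = auto_reply("user@example.com", "It still fails with the same error.", history)
print(first == followup)
```

The follow-up email mentions no topic keyword by itself, so it receives a reply only because the same sender’s earlier message is taken into account.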
Literature
Paper classification and organization is an important problem.
The problem is to classify published papers in the CS domain into sub-areas and
organize them in chronological order.
Currently, if a researcher wants to know what has or has not been done
in a field, he has to search the web in an ad hoc way. It is easy
to miss important publications when searching this way.
If this task is done, researchers who want to do a literature survey in a specific area will
benefit a lot. For example, if a data mining researcher wants to know what has been done
on frequent pattern mining, he can enter “frequent pattern mining” and all the
relevant papers are returned in chronological order. Then he can do the literature survey very
easily.
The major challenge is how to summarize and classify the papers correctly. In addition, if a
paper is interdisciplinary, we should assign it to every related field.
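The two requirements above, assigning an interdisciplinary paper to every related field and listing a field’s papers in time order, can be sketched as below. The keyword lists and the placeholder papers are invented for illustration, and keyword matching stands in for whatever classifier would actually be used.

```python
import re

# Hypothetical keyword lists per sub-area (an assumption for this sketch).
FIELD_KEYWORDS = {
    "frequent pattern mining": {"frequent", "itemset", "itemsets"},
    "bioinformatics": {"gene", "protein", "genome"},
}


def assign_fields(abstract):
    """Return every field with at least one keyword in the abstract.

    A paper can therefore belong to several fields at once.
    """
    words = set(re.findall(r"[a-z]+", abstract.lower()))
    return {field for field, keywords in FIELD_KEYWORDS.items() if words & keywords}


def timeline(papers, field):
    """Return the titles of papers in the given field, oldest first.

    `papers` is a list of (title, year, abstract) tuples.
    """
    hits = [
        (year, title)
        for title, year, abstract in papers
        if field in assign_fields(abstract)
    ]
    return [title for year, title in sorted(hits)]


# Placeholder entries, not real publications.
papers = [
    ("Paper B", 2003, "Mining frequent itemsets in gene expression data"),
    ("Paper A", 1999, "A fast algorithm for frequent itemsets"),
    ("Paper C", 2001, "Protein structure prediction"),
]
print(timeline(papers, "frequent pattern mining"))
print(timeline(papers, "bioinformatics"))
```

Here “Paper B” matches keywords from both fields, so it appears in both timelines.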