CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion
The completed work implements the centroid and k-nearest-neighbour (KNN)
classifiers using four kinds of document representation. The system classifies the
documents in the corpus into different classes based on the query given by the
user. Of the document representations tested, the term frequency representation
gives the best results when combined with either of the implemented classifiers,
whereas the probabilistic representation gives the poorest classification
results.
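To make the comparison concrete, the following is a minimal sketch of the two classifiers operating on a term frequency representation. It is an illustration only and not the project's implementation: the whitespace tokenizer, the use of cosine similarity, the function names and the toy corpus are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Term frequency representation: raw count of each lowercase token.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(docs, labels):
    """Average the term frequency vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for doc, label in zip(docs, labels):
        sums[label].update(tf_vector(doc))
    return {c: {t: w / counts[c] for t, w in vec.items()}
            for c, vec in sums.items()}


def classify_centroid(centroids, text):
    """Assign the class whose centroid is most similar to the query."""
    query = tf_vector(text)
    return max(centroids, key=lambda c: cosine(query, centroids[c]))


def classify_knn(docs, labels, text, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = tf_vector(text)
    ranked = sorted(zip(docs, labels),
                    key=lambda pair: cosine(query, tf_vector(pair[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
docs = [
    "the team won the football match",
    "the striker scored a goal in the match",
    "the bank raised interest rates",
    "the market fell as interest rates rose",
]
labels = ["sports", "sports", "finance", "finance"]
centroids = train_centroids(docs, labels)

print(classify_centroid(centroids, "goal in the football match"))
print(classify_knn(docs, labels, "interest rates at the bank", k=3))
```

The centroid classifier compares the query against one averaged vector per class, while KNN compares it against every training document, so the former is cheaper at query time and the latter is more sensitive to individual training examples.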
Future enhancements

To increase the number of classes.

To build a suitable front end.

To integrate the classifiers into a search engine to provide classification of
websites.

To enhance the centroid classifier by implementing a weighted centroid classifier.

To incorporate a stemming algorithm, e.g. the Porter stemmer.

To upgrade the implementation to support standard data collections such as
Reuters-21578, TREC-5, TREC-6, the OHSUMED collection and the 20 Newsgroups
data set.