CONCLUSION AND FUTURE ENHANCEMENTS

Conclusion

This work implemented the centroid and KNN classifiers using four kinds of document representation. The project classifies the documents in the corpus into classes based on the query given by the user. Across the document representations tested, the term frequency representation gave the best results when combined with either of the implemented classifiers, whereas the probabilistic representation gave the poorest classification.

Future enhancements

To increase the number of classes.
To build a suitable front end.
To integrate the classifiers with a search engine to provide classification of websites.
To enhance the centroid classifier by implementing a weighted centroid classifier.
To incorporate a stemming algorithm, e.g. the Porter stemmer.
To upgrade the implementation to support standard data collections such as Reuters-21578, TREC-5, TREC-6, OHSUMED and the 20 Newsgroups data set.
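As an illustration of the centroid classifier combined with the term frequency representation discussed in the conclusion, the following is a minimal sketch and not the project's actual implementation. It assumes whitespace tokenization, raw term counts, cosine similarity, and a toy training set. The names tf_vector, cosine, train_centroids, classify and TRAINING are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

# Toy labeled corpus (hypothetical data, for illustration only).
TRAINING = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores late goal", "sports"),
]


def tf_vector(text):
    """Raw term-frequency vector: lowercase whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train_centroids(labeled_docs):
    """Average the term-frequency vectors of each class into one centroid."""
    sums, counts = {}, Counter()
    for text, label in labeled_docs:
        sums.setdefault(label, Counter()).update(tf_vector(text))
        counts[label] += 1
    return {
        label: {term: w / counts[label] for term, w in vec.items()}
        for label, vec in sums.items()
    }


def classify(query, centroids):
    """Assign the query to the class whose centroid is most similar to it."""
    q = tf_vector(query)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


centroids = train_centroids(TRAINING)
print(classify("interest rates and the stock market", centroids))
print(classify("football player scores", centroids))
```

A weighted centroid classifier, listed among the future enhancements, could be approximated in this sketch by scaling each term weight before averaging; the weighting scheme itself is not specified in this work.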