Text mining and applications

Nour Khalid Khalil, Hajar Faisal Al-Katheery
Computer Information Science Dept., Prince Sultan College for Women, Riyadh

Abstract: A wide range of technologies focus on human natural language, speech recognition being one example. Text mining is another technology that takes human language as its input. This paper explains what text mining does, what its applications are, and which software uses it.

I. Introduction

The main activity of text mining is extracting meaningful information from texts, but it serves many other purposes as well. The StatSoft website [reference 5] states that the purpose of text mining is “to process unstructured (textual) information, extract meaningful numeric indices from the text”, for example by counting how often a word is mentioned or by measuring how a document relates to other sources. The following explanation clarifies these purposes further.

Analysis of natural-language text is needed nowadays, given the number of documents and research papers and the richness of the textual data in use. According to Gartner Group, “almost 90% of knowledge available at an organization today is dispersed throughout piles of documents buried within unstructured text. Books, magazine articles, research papers, product manuals, memorandums, e-mails, and of course the Web, all contain textual information in the natural language form”.

A text is a sequence of words forming sentences and paragraphs. Humans can easily process a text and find what they want in it when it is limited to a paragraph or two. With a document of two or three pages, or much more, even a simple question such as how many words it contains becomes tedious to answer by hand; this is where text mining comes in. Special algorithms have therefore been implemented in text mining software, and they differ from one program to another based on the purpose.
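The word counting mentioned above, one of the simplest “numeric indices” that can be extracted from a text, can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `word_counts` and the sample text are hypothetical, and real text mining software would normally add steps such as stop-word removal and stemming.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count how often each word occurs, ignoring case and punctuation."""
    # Lowercase the text, then treat runs of letters, digits and
    # apostrophes as words (a simplifying assumption).
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


# Hypothetical sample text, used only for illustration.
sample = (
    "Text mining extracts information from text. "
    "Mining large text collections by hand is slow."
)
counts = word_counts(sample)

print(sum(counts.values()))   # total number of words in the sample
print(counts["text"])         # how often the word "text" is mentioned
print(counts.most_common(3))  # the three most frequent words
```

The same counts can be produced for a document of any length, which is exactly the case where counting by hand stops being practical.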
Whatever the purpose, the main functionality of text mining is handling texts, clarifying their meaning, and creating summaries of documents, as shown in Figure 1.

II. What is text mining?

Text mining, also called text analytics, is, as Wikipedia [reference 4] puts it, “the process of deriving high quality information from text”. That means discovering previously unknown information in textual data.

Figure 1: Functionality of text mining.

1. Summarization reduces a text to the information that is needed. A related task is sentiment analysis: given a large number of book reviews, for example, text mining can extract the overall verdict, that is, whether the book was rated well or badly.
2. Distilling the meaning is the process of purifying the information.
3. Text-based navigation is tracking information.
4. Topic structure explication is clarifying the topic of the text.
5. Clustering is, as described in Wikipedia, the assignment of “a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense”.
6. Semantic information retrieval is retrieving information in a way that is meaningful to humans.

III. Text mining and searching

Marti Hearst [reference 1] explains that “In search, the user is typically looking for something that is already known and has been written by someone else”, whereas text mining, as stated in the definition above, looks for unknown information.

IV. Data mining and text mining

The major difference between the two is that data mining turns large quantities of data already stored in databases into useful analysis. Text mining, on the other hand, is a smaller division of data mining: it uses terms and unorganized text to build a database of useful information. This difference makes text mining an important step in data mining.

V. Applications

There are many applications of text mining, but all aim to extract useful information from textual data.
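What these applications have in common is the step described in Section IV: turning terms from unorganized text into a structured table. The sketch below shows one possible way to do this in Python, building a small term-document table in which each row is a document and each column is a word count. The sample documents and the names `tokenize` and `term_document_table` are hypothetical and chosen only for illustration; this is not the method of any particular tool.

```python
import re
from collections import Counter

# Hypothetical sample documents; any collection of free text would do.
documents = {
    "review_1": "Great book, great story.",
    "review_2": "The story was slow.",
}


def tokenize(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_document_table(docs):
    """Return the sorted vocabulary and one row of word counts per document."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    # The vocabulary is every distinct word seen in any document.
    vocabulary = sorted(set().union(*counts.values()))
    # Each row lists, in vocabulary order, how often each word occurs.
    rows = {
        name: [word_count[term] for term in vocabulary]
        for name, word_count in counts.items()
    }
    return vocabulary, rows


vocabulary, rows = term_document_table(documents)
print(vocabulary)
for name, row in rows.items():
    print(name, row)
```

Once the text is in this tabular form, ordinary data mining techniques such as clustering or classification can be applied to the rows.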
One interesting application is in social networks, where posts are processed to detect what is being talked about. Jacobus van Eeden claims on the Memeburn website [reference 11] that the graph in Figure 2 identifies the words that occur most frequently in popular posts.

Figure 2: Most frequently occurring words in social media.

Another fascinating use of text mining is email filtering. Text mining sorts email messages into categories such as spam by finding words that are unlikely to appear in a regular message.

The next application is the analysis of open-ended survey responses. The results of an open-ended survey are usually paragraphs of description or opinion, and a researcher may find it hard to analyze thousands of them. Text mining can instead find the particular terms or words that are used frequently, for example to identify the pros and cons of a product, as noted on the StatSoft website [reference 5].

One of the most widely used applications in academia, in high schools and colleges for instance, is plagiarism detection. A special algorithm is used for it: the process starts by scanning the paper, then looks online for similar combinations of words, and finally highlights the similarities it finds.

Finally, one of the best-known applications is biomedical text mining. As defined in Wikipedia [reference 10], it “refers to text mining applied to texts and literature of the biomedical and molecular biology domain”. Biomedical text mining is used because of the large quantity of articles, research papers, and other literature in the medical field, from which only the needed topics must be extracted.

VI. Challenges to text mining

Text mining has some difficulties and open issues. Kuan C. Chen of Purdue University Calumet, US [reference 8] mentions several of them, such as result interpretation, about which he says: “It is a difficult aspect because result interpretation is dependent on the skill of the software technician.
The greater the skill of the technician, the more effective the data or text mine. Even if a skilled technician is very successful with the data mine, the data mine still may not reach its potential as the user may not have the analytical skills to interpret the results of the text mine”.

Another challenge is that most text mining tools work with English, so they are of limited use for texts in other languages. Other challenges include the ambiguity of language at the semantic and lexical levels, and the fact that context or spelling may affect the results.

VII. Text mining tools

A great deal of software uses text mining techniques. There are two kinds of tools. The first kind is online text mining tools such as Ranks, Vivisimo/Clusty, and Wordle, which makes an interesting use of text mining. The other kind is regular text mining software such as Basis Technology, Clarabridge, and Compare Suite. Links to all of these tools are listed in the references [reference 12].

VIII. Conclusion

In conclusion, technology always evolves to ease tasks for mankind. Text mining is one of those technologies that assist people with textual data. Text mining helps institutions and individuals organize information into an understandable format while saving time, effort, and manpower.

IX. References

[1] Marti Hearst, October 17, 2003, http://people.ischool.berkeley.edu/~hearst/textmining.html
[2] Miloš Radovanovic, Mirjana Ivanovic, 2008, http://www.emis.de/journals/NSJOM/Papers/38_3/NSJOM_38_3_227_234.pdf
[3] Erin Scroggins, May 15, 2008, http://erinscroggins.blogspot.com/2008/05/applications-of-text-mining.html
[4] http://en.wikipedia.org/wiki/Text_mining#Academic_applications
[5] http://www.statsoft.com/textbook/text-mining/
[6] Mark Sharp, December 11, 2001, http://comminfo.rutgers.edu/~msharp/text_mining.htm
[7] Anne Kao, Steve Poteet, http://www.sigkdd.org/explorations/issues/7-1-200506/1-Intro.pdf
[8] Kuan C. Chen, May 2009, http://www.cluteinstituteonlinejournals.com/PDFs/1483.pdf
[9] http://www.kdnuggets.com/software/text.html
[10] http://en.wikipedia.org/wiki/Biomedical_text_mining
[11] Jacobus van Eeden, September 30, 2010, http://memeburn.com/2010/09/text-mining-revealsbest-and-worst-words-used-in-social-media/
[12] Text mining tools:
   a. http://ranks.nl/
   b. http://search.yippy.com/
   c. http://www.wordle.net/
   d. http://www.basistech.com/
   e. http://www.clarabridge.com/
   f. http://comparesuite.com/
[13] Michael W. Berry, Jacob Kogan, “Text Mining: applications and theory”, 2010.