International Journal of Engineering Trends and Technology (IJETT) – Volume 19 Number 1 – Jan 2015

An Efficient String Transformation Technique for User Input Query

Javvaji Vamshi Krishna 1, P. Prabhakar 2, C.P.Y.N.J. Mohanarao 3
1 Final M.Tech Student, 2 Assistant Professor, 3 Principal
1 Department of Information Technology, 2 Department of CSE
Avanthi Institute of Engineering and Technology, Makavarapalem, Visakhapatnam.

Abstract: Probabilistic string transformation is an important research topic in natural language processing and search engine optimization. Although various traditional approaches exist for string transformation and candidate set generation, they are not efficient. Accuracy and efficiency are the two basic parameters to optimize when generating candidate sets or output strings. We first select the set of likely keywords that meet the minimum Levenshtein distance. In this paper we propose an efficient model of string transformation for candidate set generation and query reformulation: an evolutionary approach that corrects misspelled words using edit distance and index-based comparison, either by single character or by substring.

I. INTRODUCTION

In both formal and informal language, word transformation, correction, and term generation can be formalized as string transformation. It is also used in query generation and reformulation. It is deployed in many applications, such as online applications, where transformation techniques must work accurately and efficiently.

String transformation can be described as follows. Given an input string and a set of operators, it generates the most likely output strings by applying the operators. A string can be a sequence of words, characters, or other tokens. Each operator is a transformation rule that defines the replacement of a part of the string with another string.
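As a minimal sketch of the operator idea described above, assume that each operator can be represented as an (old, new) substring pair. This representation is chosen here for illustration and is not specified in this paper; the names Rule, apply_rule, and transform are hypothetical. The code applies every rule once at every matching position and collects the resulting strings.

```python
from typing import List, Set, Tuple

# Assumption for illustration: an operator is a pair (old, new) meaning
# "replace one occurrence of the substring `old` with `new`".
Rule = Tuple[str, str]


def apply_rule(source: str, rule: Rule) -> Set[str]:
    """Return every string obtained by applying `rule` exactly once to `source`."""
    old, new = rule
    results: Set[str] = set()
    start = source.find(old)
    while start != -1:
        # Replace only the occurrence that begins at `start`.
        results.add(source[:start] + new + source[start + len(old):])
        start = source.find(old, start + 1)
    return results


def transform(source: str, rules: List[Rule]) -> Set[str]:
    """Apply each rule once, at each position where it matches, and pool the outputs."""
    outputs: Set[str] = set()
    for rule in rules:
        outputs |= apply_rule(source, rule)
    return outputs
```

With the hypothetical rule set [("ie", "ei"), ("nn", "n")], calling transform("recieve", ...) yields the single output "receive", because only the first rule matches the input.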
String transformation falls into two settings, depending on whether a dictionary is used. If a dictionary is used, the output strings must be present in the given dictionary. These settings are mainly used in web applications. In many applications the first step is tokenization, which splits the string into its terms.

ISSN: 2231-5381

In spelling correction of queries, the process usually consists of two operations: candidate generation and candidate selection. Both are used to find words similar to the misspelled word given by the user. The operators represent manipulations such as insertion, deletion, and substitution of neighboring characters. This is one example of string transformation.

Previous work on string transformation falls into two groups. Some work mainly considered efficient generation of strings, assuming that the model is given. Other work tried to learn the model with different approaches, for example a generative model, a logistic regression model, and a discriminative model. However, efficiency is not an important factor taken into consideration in these methods. In contrast, our work in this paper aims to learn a model for string transformation that achieves both high accuracy and efficiency. There are three major issues with string transformation: (1) how to define a model that achieves both high accuracy and efficiency; (2) how to accurately and efficiently train the model from training instances; (3) how to efficiently generate the top k output strings given the input string, with or without a dictionary.

String transformation has many applications in data mining, natural language processing, information retrieval, and bioinformatics.
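To make the dictionary-based setting and issue (3) concrete, the sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence and returns the k dictionary words closest to a query. It is an illustrative baseline under simple assumptions (unit cost for each edit, a small in-memory dictionary), not the model proposed in this paper; the function names and sample words are hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions (unit cost)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def top_k_candidates(query: str, dictionary: Iterable[str], k: int) -> List[Tuple[int, str]]:
    """Candidate generation and selection: the k dictionary words nearest to `query`.

    Returns (distance, word) pairs ordered by distance, then alphabetically.
    """
    scored = ((levenshtein(query, word), word) for word in dictionary)
    return heapq.nsmallest(k, scored)
```

This brute-force scan compares the query with every dictionary word, so its cost grows with the size of the dictionary.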
String transformation has been studied in different specific tasks, such as database record matching, spelling error correction, query reformulation, and synonym mining. The major difference between our work and the existing work is that we focus on improving both the accuracy and the efficiency of string transformation.

http://www.ijettjournal.org

Although various traditional approaches are available for candidate set generation, they are not optimal because they lack index-based comparison. Occurrence-based comparison does not give accurate results because it does not compare characters according to their index, and it does not work well for short keywords because of the edit operations. We therefore need an efficient query reformulation technique.

II. RELATED WORK

Although various approaches are available for string transformation, they are not optimal in terms of accuracy. Traditional approaches such as the Levenshtein distance measure suffer from high time complexity, because the distance is computed against every keyword in the dictionary. The log-linear model compares the occurrences of characters in the source string and the target string, but not the indexes of the characters.

In this paper a new approach to string transformation is proposed that achieves high efficiency and high accuracy and is suited to large amounts of string data. To learn the approach, a large number of input string and output string pairs is provided as training data, and a set of operators for string transformation is made available along with it.
A probabilistic model is then learned from the paired input and output strings in the training data; it assigns a score to each candidate output string for a given input string. The best candidate is the one with the highest probabilistic score with respect to the training data.

The approach has two processes, learning and generation. In the learning process, a set of rules is extracted from the training data, and the string transformation model is built from these rules and the training data. In the generation process, a new input string is given, and the top n candidates are selected from among all candidates by referring to the rule set obtained in the learning process. The string transformation model represents the rules and their weights as a log-linear model; learning is performed by maximum likelihood estimation on the training data, and generation is conducted efficiently. Without loss of generality, the number of rules applicable in the learning process is assumed to be predefined; as a consequence, the number of rules used in a single transformation is also bounded, which means that the difference between the input and the output must not be large.

III. PROPOSED WORK

In this paper we propose an efficient string transformation technique that identifies possible correct keywords, extracts the likely keywords from the dictionary, and then computes the edit distance over the likely keywords. Query reformulation is enhanced with an elimination word bag, and for optimal results we propose index-based comparison to retrieve the top candidate sets for the input query.
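The index-based comparison and the elimination word bag mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the contents of ELIMINATE_WORDS are hypothetical, the function names are invented for the example, and matching an abbreviation against the initial letters of the remaining tokens is one plausible reading of the query reformulation step, not a detail confirmed by the paper.

```python
from typing import Iterable, List, Set

# Hypothetical word bag of terms to ignore; the paper does not list its contents.
ELIMINATE_WORDS: Set[str] = {"a", "an", "the", "of", "and", "for", "in", "on", "to"}


def index_match_count(source: str, target: str) -> int:
    """Count positions i where source[i] == target[i] (index-based comparison)."""
    return sum(1 for s_char, t_char in zip(source, target) if s_char == t_char)


def index_based_candidates(source: str, dictionary: Iterable[str], threshold: int) -> List[str]:
    """Keep words whose positional match count exceeds `threshold`, best match first."""
    scored = [(index_match_count(source, word), word) for word in dictionary]
    kept = [(count, word) for count, word in scored if count > threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [word for _, word in kept]


def matches_reformulation(source: str, phrase: str,
                          eliminate: Set[str] = ELIMINATE_WORDS) -> bool:
    """True if the initials of the phrase's kept tokens spell `source`, in order.

    Assumption: tokens found in the word bag are dropped before comparison.
    """
    tokens = [tok for tok in phrase.split() if tok.lower() not in eliminate]
    if len(tokens) != len(source):
        return False
    return all(tok[0].lower() == ch.lower() for tok, ch in zip(tokens, source))
```

For example, index_match_count("abc", "cab") is 0 even though both strings contain the same characters, which is the distinction between index-based and occurrence-based comparison. With the word bag above, matches_reformulation("IEEE", "Institute of Electrical and Electronics Engineers") holds because "of" and "and" are eliminated before the initials are compared.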
A number of candidate sets can be generated for the user query. Misspelled user queries or keywords are first corrected by the evolutionary approach, in which random characters or substrings are substituted with ones available in the dictionary dataset.

Index Based Comparison

In the traditional approach, the source string and the target string are compared by character occurrence rather than by index. In this work, every character in the source string is compared with the character at the same index in the target string, and initial priority is given to the target strings of highest order in the dictionary.

Evolutionary Approach

Input: Source string S
       Dict_words D (w1, w2, ..., wn)
       Likely_set (l1, l2, ..., ln)
Output: Candidateset_List C (c1, c2, ..., cn)

Step 1: Find the likely keywords for the source string and compute the edit distance.
Step 2: Get the minimum number of edit operations between the likely strings (l1, l2, ..., ln) and the input string.
Step 3: If (min_editdistance <= Threshold value), add li to C.
Step 4: Compute Query_Reformulation(S, Wi).
Step 5: For i = 0; i < D.length; i++
            Counter = 0
            If String_Diff(S, Wi) <= Threshold value, add Wi to C
        Next
Step 6: Store the candidate sets for S.

The following pseudo code shows the index-based comparison, which maintains the order with respect to the source and target strings.

Index Based Comparison:

String_Diff(S, W)
{
    Compare_counter := 0
    For i = 0; i < S.length; i++
        If S[i] == W[i] Then
            Compare_counter := Compare_counter + 1
        End if
    Next
    If Compare_counter > Threshold value Then
        Add Wi to C
}

To generate more accurate and efficient candidate sets, we perform an index-based comparison between the source string and the target string. It compares each character together with its index; if the two characters are equal, the match is counted as 1. This process continues until the end of the source string is reached.

Query Reformulation

In query reformulation, the output string for the input string "IEEE" can be generated as "Institute of Electrical and Electronics Engineers" by maintaining the semantics of the respective keywords in order to generate the optimal set of keywords. After the keywords are generated, their semantics are also integrated into the existing output strings.

The traditional query reformulation technique simply tokenizes the string and compares it with the respective keywords, but it fails on additional words such as articles and prepositions. To resolve this issue we maintain a word bag to eliminate the unnecessary keywords.

Void Query_Reformulation(S, Wi)
{
    Count := 0
    T := Wi.gettokens()
    For i = 0; i < S.length; i++
        If T[i] not available in Eliminate words Then
            If S[i] == T[i] Then
                Count := Count + 1
            End if
        End if
    Next
    If Count == T.count Then
        Add order wise Wi to C
}

Accuracy Computation

Accuracy is calculated in terms of the number of correct candidate sets generated for a user query or keyword, comparing the previous approach with the proposed approach through a graphical representation.

IV. CONCLUSION

We conclude our current research work with efficient candidate set generation, producing more candidate sets, and more accurate ones, for the input query through index-based comparison in our evolutionary approach.
Query reformulation is handled efficiently with string tokenization and a word bag that maintains the elimination keywords. Our experimental results show better results than traditional string transformation approaches. The current work on evolutionary candidate set generation can be enhanced with a cache for frequently accessed information, which is expected to ease the space and time complexity issues. The results can also be filtered based on the ranking of the possible matched keywords.

BIOGRAPHIES

Javvaji Vamshi Krishna is pursuing an M.Tech at Avanthi Institute of Engineering and Technology, Tamaram (Vill), Makavarapalem (Md), Visakhapatnam (Dist). His areas of interest are data mining, network security, and cloud computing.

P. Prabhakar is working as an assistant professor, with 5 years of experience, at Avanthi Institute of Engineering and Technology, Tamaram (Vill), Makavarapalem (Md), Visakhapatnam (Dist). His areas of interest are data mining, network security, and cloud computing.

Dr. C.P.Y.N.J. Mohanarao completed his M.Tech and Ph.D. He is working as principal at Avanthi Institute of Engineering and Technology, Tamaram (Vill), Makavarapalem (Md), Visakhapatnam (Dist). His areas of interest are data mining, network security, and cloud computing.