Overview | Clustering | Text analysis | Results | Profiling | Profile parts

Intelligent maintenance and use

WP Intelligent maintenance and use will provide:
- Information on: content
- Information on: activities
- Information on: status

Intelligent management of content
- Analyze the characteristics and explore algorithms
- Support the activities of different types of actors
- Develop effective tools for intelligent use and maintenance

Advanced user management
- Enhance user experience
- Provide personalized support for the use of sciX services
- Will enable customization to a reasonable level
- Help users (reduce time) to retrieve, exchange, publish, etc. information
- Provide a basis for easier scientific collaboration

Use of clustering

Assist different types of users
- Allow users to browse through the papers by topic
- Help editors to categorize scientific contributions
- Develop maps of the topics covered in different repositories

Canonical form
- Detect the most appropriate canonical forms for similar documents in the field of scientific and technical papers

Efficiency of clustering depends on the selected algorithms
Will perform local as well as global analysis

Results: Characteristics of the repositories

Efficiency of text mining algorithms strongly depends on the characteristics of the text

[Figure: Distribution of sorted word-stem frequencies. Vertical axis: number of occurrences (0 to 2500). Horizontal axis: word stems, including univers, design, investig, artificy, altern, environmen, directly, full-scale, typic, applicabl, competit, rapidly, live, elimin, fourth, conveny, length, habit, risk, notat, plant, sculptur, scan, interconnec, enlarg.]

Time-dependent word-stem frequencies do not affect the growth of the vocabulary, but are very important for an overview of the developments in a given scientific field

Results: More results from the analysis

Analysis based on multi-word frequencies

[Figure: relative frequency (0% to 100%) of multi-word stems by year. Series: process model, build process, product model, product data, data model, data exchang, design construct, design process, comput aid, object orient, inform system, knowledg base, project manag, construct project, life cycl, construct process. Years: y1988, y1991, y1992, y1993, y1994, y1995, y1996, y1998, y1999, y2000, y2001, y2002.]

From the analysis of word frequencies, the evolution of research topics can be determined

Evaluation of different clustering algorithms

[Figure: Crossbow clusters 1 to 8 compared with Clustifier clusters 1 to 8.]

Text Mining Tools

Custom-made analysis tools

Open Source Software
- Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
- Clustifier (K-means, semantic analysis)

Commercial
- Text Analyst
- Intelligent Miner for Text

Use of results of clustering

SCIX profiling

Profiling shouldn't
- Limit general use of the sciX services
- Interfere with privacy issues
- Force the user to customize each and every page
- Do things without users' knowledge

Profiling should
- Ease the use of the sciX services
- Help users (reduce time) to retrieve, exchange, publish, etc. information
- Provide a basis for easier scientific collaboration
- Enhance the overall user experience

User profile

User profile will provide personalized support for the use of sciX services

Personal Profile
- personal data category
- gathering data category
- document content category
- document structure category
- document source category
- delivering data category
- delivery means category
- delivery time category
- actions data category
- security data category

Intelligent maintenance and use will be based on advanced usage tracking as well as text analysis
Will provide different modalities, including push technologies
Customization to the extent that it helps information browsing and monitoring

Personal data about users

[Diagram: Personal Profile categories, as listed under "User profile".]

Data model
- username
- Given Name
- Family Name
- Email
- Title
- Affiliation
- Country
- Address

Security services for scientific publishing are not comparable to industry standards
Will provide a secure way to exchange personal information

Results: Actions: advanced usage tracking

[Figure: site navigation graph with transition counts on the edges. Node labels: Welcome Pages, index.htm, about.htm, Browse Pages, BrowseAZ, BrowseKeywords, Search Forms, SearchForm, AdvancedSearchForm, Search, Go, Search Results, Record Details, Show, Basket manipulation, BasketShow, BasketAdd, BasketAddOne, Delete, DisplayStructure, LoginForm, Login, Reg3, Misc Pages.]

Session-paper matrix (general form)

Session   Paper 1   Paper 2   ...   Paper i   ...   Paper n-1   Paper n
1         0         1               1               0           1
2         1         1               0               0           0
...
j         0         0               1               1           0
...
s-1       1         1               0               1           0
s         0         0               1               0           0

Example with 7 sessions and 5 papers

Session   1   2   3   4   5   6   7
Paper 1   0   1   0   1   0   1   1
Paper 2   1   0   1   1   0   0   0
Paper 3   1   0   0   0   1   0   0
Paper 4   0   1   1   0   0   1   0
Paper 5   0   1   0   1   0   1   0

Sessions 2, 4 and 6 only, with row sums

Session   2   4   6   Sum
Paper 1   1   1   1   3
Paper 2   0   1   0   1
Paper 3   0   0   0   0
Paper 4   1   0   1   2
Paper 5   1   1   1   3

Sums per paper

Paper   Paper 1   Paper 2   Paper 3   Paper 4   Paper 5
Sum     3         1         0         2         3

Papers ranked by sum

Paper   Paper 5   Paper 4   Paper 2
Sum     3         2         1
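
The session and paper tables on the usage-tracking slide can be read as a simple co-occurrence count: take the sessions in which one paper appears, add up how often every other paper appears in those same sessions, and rank the other papers by that sum. The sketch below follows that reading, which is an assumption about the slide rather than a documented sciX algorithm. The matrix values are copied from the 7-session, 5-paper example; the names `SESSIONS` and `related_papers` are hypothetical.

```python
# Rows are papers, columns are sessions 1..7; 1 means the paper appears in
# that session. Values are copied from the example matrix on the slide.
SESSIONS = {
    "Paper 1": [0, 1, 0, 1, 0, 1, 1],
    "Paper 2": [1, 0, 1, 1, 0, 0, 0],
    "Paper 3": [1, 0, 0, 0, 1, 0, 0],
    "Paper 4": [0, 1, 1, 0, 0, 1, 0],
    "Paper 5": [0, 1, 0, 1, 0, 1, 0],
}


def related_papers(matrix, target):
    """Rank the other papers by how many sessions they share with `target`.

    Hypothetical helper: the slide shows only the tables, not the procedure.
    """
    target_row = matrix[target]
    counts = {}
    for paper, row in matrix.items():
        if paper == target:
            continue
        # A session counts only if both papers appear in it (1 * 1 == 1).
        shared = sum(a * b for a, b in zip(target_row, row))
        if shared > 0:
            counts[paper] = shared
    # Highest count first; ties are broken by paper name.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for paper, shared in related_papers(SESSIONS, "Paper 1"):
        print(paper, shared)
```

For Paper 1 this gives Paper 5 (3), Paper 4 (2) and Paper 2 (1), the same ranking as the last table on the slide. Session 7 contains only Paper 1, so including it does not change any sum.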