ANALOGIST/EZPAARSE: ANALYSING LOCALLY GATHERED LOGFILES TO DETERMINE USERS’ ACCESSES TO SUBSCRIBED E-RESOURCES
http://analogist.couperin.org
http://ezpaarse.couperin.org
LIBER 2014 - RIGA - 3/07/2014

Presentation Outline
1- The Context: A Need for Evaluation
2- Gathering Local Data
3- Parsers and Analyses
4- AnalogIST and ezPAARSE
5- Results and Visualization
6- Project Organization

1. The Context: A Need for Evaluation

Some well-known facts about the Scientific and Technical Information market
- $25 billion global revenue in 2012, increasing by 4-5% per year
- 5,000 to 10,000 publishers / 23,000 e-journals
- The four biggest publishers account for half the market
- For 10 years, the price of most journals has increased by 3% to 5% per year
- 5,500,000 researchers, increasing by 3.5% per year
- 1.5 billion articles downloaded per year by 10 million users
→ We need to assess and evaluate the use of these e-resources

What we currently have
1st limitation: vendors are the only source. Publisher-provided statistics...
- ... are not available
- ... are available but not COUNTER-compliant
- ... are available and COUNTER-compliant
→ We need to assess these numbers
2nd limitation: only a partial view, no comparison possible
→ We need to complete the figures
3rd limitation: these numbers offer mere quantification
→ We need to qualify them
A possible solution:
→ locally gathered usage quantification

2. Gathering Usage Data Locally

The reverse proxy [diagram]

Where ezPAARSE comes into play [diagram]

3. Parsers and Analyses

Example of a URL structure
http://pdn.sciencedirect.com/science?
_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0S0001457512000747-main.pdf
→ The URL contains the ISSN and the type of the downloaded file

Example of a URL structure
http://www.sciencedirect.com/science/journal/00014575
→ The URL contains the ISSN. By manually trying the URL, we find an HTML table of contents

Example of a URL structure
http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
→ We know it is a PDF, but we only get a publisher-specific identifier: we need a correspondence table, the Publisher Knowledge Base (ideally a KBART file)

Parse the URL
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0S0001457512000747-main.pdf
/_pii=S([0-9]{0,7}[0-9X])/i

What do we count?
- Serials: Articles (ARTICLE); Abstract (ABS); Table of contents (TOC); Reference (REF); Article preview, e.g. the “Look inside” function of SpringerLink (PREVIEW); Article in basket/personal folder (BOOKMARK)
- E-books: Book by title (BOOK); Chapter, section (BOOK_SECTION); Book series (BOOKSERIE); Manuals, handbooks (HANDBOOK)
- Law databases: Law encyclopedia (ENCYCLOPEDIES); Law memento (FORMULES); Law manual (BROCHES); Law codes (CODES)
- Inst. repositories: PHD_THESIS; MD_THESIS; MASTER_THESIS
Notes:
- The availability of these items depends on the elements present in the URL
- The law databases currently covered are only French ones

Platforms covered
Each platform has its own URL structure, so we need one parser for each

Some limitations apply
- Opaque URLs (session ids, encryption…). Example: the former Springer platform, http://www.springerlink.com/content/j5q872410p510m63/fulltext.pdf
- Publisher IDs that need to be linked to a knowledge base or a reference file. Example: Cairn, http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
- Knowledge bases that have to be edited manually

4. AnalogIST and ezPAARSE

ezPAARSE: the software
● ez: easy / PAARSE: Progiciel d'Analyse des Accès aux RessourceS Electroniques = Software for Analysing the Accesses to Online Resources
● Available as a local installation
● Available as an online service (SaaS)
● Free (libre) software
● Multi-platform
● http://ezpaarse.couperin.org

AnalogIST: the wiki portal
● Analyse des Logs de l'IST = Analysing the logs of Scientific and Technical Information
● The place where we gather the platform analyses and synchronise the new parsers with the local installations
● http://analogist.couperin.org

[Diagram: local installations (Univ 1, Univ 2, ...) and AnalogIST (global installation + collaborative space)]

Two ways to use ezPAARSE
- Through a web form
- With the command line (cURL)
Use the web form to create the command line that suits your needs.

5.
ezPAARSE: Using the Results

Example of an ezPAARSE output
- Text file (CSV format)
- Deduplication of consultation events, following the COUNTER recommendation
- KBART fields
- geoip fields

(Libre/MS) Office rendering macros [screenshot]

Exploiting the results with [image not recovered]

Who (student, researcher, staff) consults what? (UL)
Distribution of consultations of paid content (books, journals, law references…) by user type at the Université de Lorraine

Consultations by research unit (UL)
Consultations of articles from Jan 2014 to May 2014 by research unit at the Université de Lorraine

Consultations by teaching unit (UL)
Consultations of articles by teaching unit or faculty at the Université de Lorraine

Geolocation of consultations (CNRS) [map]

Detection of an anomaly (CNRS)
The consultation peak corresponds to an abuse of an e-resource. Detecting it makes it possible to react promptly to the incident.

6. Project Organization

The method
SCRUM: an agile development method
[Diagram; surviving label: PRODUCT VISION]

The team [image]

In conclusion
● ezPAARSE is free and open source
● Simple to use and to test
● State-of-the-art technologies
● Feel free to test it, send us log samples, and give us feedback!

Any Questions?
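The deduplication of consultation events mentioned with the example output in section 5 can be sketched as follows. This is a minimal illustration, not ezPAARSE's implementation: the event fields (`datetime`, `user`, `resource`, `mime`) are hypothetical names, and the double-click windows (10 seconds for HTML, 30 seconds for PDF) are assumed from the COUNTER Code of Practice, release 4. The sketch keeps the first request of a burst, which is a simplification.

```python
from datetime import datetime, timedelta

# Double-click windows. Assumed values (COUNTER Code of Practice, release 4).
WINDOWS = {"HTML": timedelta(seconds=10), "PDF": timedelta(seconds=30)}
DEFAULT_WINDOW = timedelta(seconds=10)  # assumption for other formats


def deduplicate(events):
    """Drop repeated requests by the same user for the same resource and
    format when they follow the previous request within the window."""
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e["datetime"]):
        key = (event["user"], event["resource"], event["mime"])
        window = WINDOWS.get(event["mime"], DEFAULT_WINDOW)
        previous = last_seen.get(key)
        # The window slides: every request, kept or not, restarts it.
        last_seen[key] = event["datetime"]
        if previous is not None and event["datetime"] - previous <= window:
            continue
        kept.append(event)
    return kept


start = datetime(2014, 7, 3, 10, 0, 0)
events = [
    {"datetime": start, "user": "u1", "resource": "0001-4575", "mime": "PDF"},
    {"datetime": start + timedelta(seconds=5), "user": "u1",
     "resource": "0001-4575", "mime": "PDF"},   # double click, dropped
    {"datetime": start + timedelta(seconds=5), "user": "u2",
     "resource": "0001-4575", "mime": "PDF"},   # different user, kept
    {"datetime": start + timedelta(seconds=40), "user": "u1",
     "resource": "0001-4575", "mime": "PDF"},   # outside the window, kept
]
print(len(deduplicate(events)))
```

Keying on the user as well as the resource matters here: without it, two readers opening the same article within a few seconds would be counted as one consultation.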
http://ezpaarse.couperin.org
http://analogist.couperin.org
https://twitter.com/ezpaarse

Platform analysis helper
http://analogist.couperin.org/platforms/analyse-helper/start
- The rest is processed automatically
- The DokuWiki syntax is generated

More features: exploiting the results with geolocation
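As an illustration of the URL-parsing step from section 3, the following minimal Python sketch applies the regular expression shown on the "Parse the URL" slide to the ScienceDirect example URLs. The function names, the returned field names, the journal-page pattern and the rule "a URL ending in .pdf is a PDF article" are assumptions made for this example; they are not taken from ezPAARSE's actual parsers.

```python
import re

# Regular expression from the "Parse the URL" slide, transcribed to Python:
# "_pii=S" followed by the 8 characters of the ISSN (the last may be X).
PII_PATTERN = re.compile(r"_pii=S([0-9]{0,7}[0-9X])", re.IGNORECASE)
# Assumed pattern for journal pages such as /science/journal/00014575.
TOC_PATTERN = re.compile(r"/science/journal/([0-9]{7}[0-9X])", re.IGNORECASE)


def format_issn(raw):
    """Insert the ISSN hyphen: '00014575' becomes '0001-4575'."""
    raw = raw.upper()
    return raw[:4] + "-" + raw[4:]


def parse_sciencedirect_url(url):
    """Hypothetical parser: return what the URL alone reveals, or {}."""
    match = PII_PATTERN.search(url)
    # The slide's pattern can capture fewer than 8 characters, so check.
    if match and len(match.group(1)) == 8:
        is_pdf = url.lower().endswith(".pdf")
        return {
            "issn": format_issn(match.group(1)),
            "rtype": "ARTICLE",
            "mime": "PDF" if is_pdf else "UNKNOWN",
        }
    match = TOC_PATTERN.search(url)
    if match:
        return {"issn": format_issn(match.group(1)),
                "rtype": "TOC", "mime": "HTML"}
    return {}


article_url = ("http://pdn.sciencedirect.com/science?_ob=MiamiImageURL"
               "&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y"
               "&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31"
               "&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3"
               "/1-s2.0S0001457512000747-main.pdf")
toc_url = "http://www.sciencedirect.com/science/journal/00014575"

print(parse_sciencedirect_url(article_url))
print(parse_sciencedirect_url(toc_url))
```

A Cairn URL such as http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009 yields an empty result with this sketch, which matches the limitation described in section 3: the publisher-specific identifier has to be resolved through a knowledge base, and each platform needs its own parser.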