ANALOGIST/EZPAARSE : ANALYSING LOCALLY
GATHERED LOGFILES TO DETERMINE USERS’
ACCESSES TO SUBSCRIBED E-RESOURCES
http://analogist.couperin.org
LIBER 2014 - RIGA - 3/07/2014
http://ezpaarse.couperin.org
Presentation Outline
1- The Context : A Need for Evaluation
2- Gathering Local Data
3- Parsers and Analyses
4- AnalogIST and ezPAARSE
5- Results and Visualization
6- Project Organization
1. The Context : A need for evaluation
About some well-known facts
The Scientific and Technical Information Market:
- $25 billion global revenue in 2012, increasing 4-5% per year
- 5,000 to 10,000 publishers / 23,000 e-journals
- The 4 biggest publishers account for half the market
- For 10 years, the price of most journals has increased by 3% to 5% per year
- 5,500,000 researchers, increasing 3.5% per year
- 1.5 billion articles downloaded per year by 10 million users
We need to assess and evaluate the use of these e-resources
What we’ve currently got
Publisher-provided statistics...
- … are not available
- … are available but not COUNTER-compliant
- … are available and COUNTER-compliant
1st limitation: vendors are the only source
→ We need to assess these numbers
2nd limitation: only a partial view, no comparison possible
→ We need to complete the figures
3rd limitation: these numbers offer mere quantification
→ We need to qualify them
A possible solution:
→ locally gathered usage quantification
2. Gathering usage data locally
The reverse proxy
Where ezPAARSE comes into play
3. Parsers and analyses
Example of URL structure
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0S0001457512000747-main.pdf
The URL contains the ISSN (within the _pii parameter) and the type of the downloaded file (.pdf)
Example of URL structure
http://www.sciencedirect.com/science/journal/00014575
The last path segment (00014575) is the ISSN.
By manually trying the URL, we find that it returns an HTML table of contents.
Example of URL structure
http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
We know it’s a PDF, but we only get a publisher-specific identifier: we need a correspondence table, the Publisher Knowledge Base (ideally a KBART file).
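The lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the knowledge base row is a made-up placeholder (title and ISSN are not real Cairn data), the rule that the journal is identified by the part of ID_ARTICLE before the first underscore is a guess, and the function and field names (load_knowledge_base, resolve_cairn_url, rtype, mime, unitid) are hypothetical.

```python
import csv
import io
from urllib.parse import urlparse, parse_qs

# Hypothetical KBART-style extract (tab-separated). Real KBART files carry
# more columns; the title and ISSN below are placeholders, not Cairn data.
KBART_SAMPLE = (
    "publication_title\tprint_identifier\tonline_identifier\ttitle_id\n"
    "Hypothetical Journal\t1234-5679\t\tRFG\n"
)


def load_knowledge_base(text):
    """Index the knowledge base rows by their publisher title identifier."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["title_id"]: row for row in reader}


def resolve_cairn_url(url, kb):
    """Turn a load_pdf.php URL into a consultation event enriched from the KB."""
    query = parse_qs(urlparse(url).query)
    article_id = query.get("ID_ARTICLE", [None])[0]
    if article_id is None:
        return None
    # Assumption: the prefix before the first underscore identifies the journal.
    title_id = article_id.split("_")[0]
    row = kb.get(title_id)
    return {
        "rtype": "ARTICLE",
        "mime": "PDF",
        "unitid": article_id,
        "title_id": title_id,
        "print_identifier": row["print_identifier"] if row else None,
    }


kb = load_knowledge_base(KBART_SAMPLE)
event = resolve_cairn_url(
    "http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009", kb
)
print(event["unitid"], event["print_identifier"])
```

When the identifier is missing from the knowledge base, the event keeps its publisher ID and the ISSN stays empty, which is why the knowledge bases need regular updating.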
Parse the URL
http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3/1-s2.0S0001457512000747-main.pdf
/_pii=S([0-9]{0,7}[0-9X])/i
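A minimal sketch of how the regular expression above could be applied, written in Python rather than in the parser's own language. The function name and the output keys (issn, mime) are hypothetical, and the length check on the captured group is an added safeguard, not something shown on the slide.

```python
import re

# Python transcription of the slide's pattern /_pii=S([0-9]{0,7}[0-9X])/i
PII_ISSN = re.compile(r"_pii=S([0-9]{0,7}[0-9X])", re.IGNORECASE)


def extract_from_sciencedirect(url):
    """Return the ISSN and file type that can be read directly from the URL."""
    info = {}
    match = PII_ISSN.search(url)
    # Added safeguard: only accept a full 8-character ISSN.
    if match and len(match.group(1)) == 8:
        raw = match.group(1).upper()
        info["issn"] = raw[:4] + "-" + raw[4:]
    # The file type is read from the extension at the end of the URL.
    if url.lower().endswith(".pdf"):
        info["mime"] = "PDF"
    return info


URL = (
    "http://pdn.sciencedirect.com/science?"
    "_ob=MiamiImageURL&_cid=271664&_user=4046427&_pii=S0001457512000747"
    "&_check=y&_origin=browse&_zone=rslt_list_item&_coverDate=2012-07-31"
    "&wchp=dGLbVlt-zSkWb&md5=f5d8d157ccda6d597cb466af123dbff3"
    "/1-s2.0S0001457512000747-main.pdf"
)
print(extract_from_sciencedirect(URL))  # {'issn': '0001-4575', 'mime': 'PDF'}
```

The extracted ISSN matches the one visible in the journal URL http://www.sciencedirect.com/science/journal/00014575.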
What do we count?
Serials:
- Articles (ARTICLE)
- Abstract (ABS)
- Table of contents (TOC)
- Reference (REF)
- Article preview (e.g. the “Look inside” function of SpringerLink) (PREVIEW)
- Article in basket/personal folder (BOOKMARK)
E-books:
- Book by title (BOOK)
- Chapter, section (BOOK_SECTION)
- Book series (BOOKSERIE)
- Manuals, handbooks (HANDBOOK)
Law databases:
- Law encyclopedia (ENCYCLOPEDIES)
- Law memento (FORMULES)
- Law manual (BROCHES)
- Law codes (CODES)
Inst. repositories:
- PHD_THESIS
- MD_THESIS
- MASTER_THESIS
Notes:
- The availability of these items depends on the elements present in the URL
- The Law databases currently covered are only French ones
Platforms covered
Each platform has its own URL structure...
...so we need one parser for each
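The "one parser per platform" idea can be sketched as a dispatch on the host name. This is an illustration only, not ezPAARSE's actual code: the registry, the function names and the output keys (rtype, mime, print_identifier, unitid) are hypothetical, and the two rules simply restate the ScienceDirect and Cairn examples from the earlier slides.

```python
import re
from urllib.parse import urlparse, parse_qs

# Journal table-of-contents URLs such as /science/journal/00014575
TOC_PATH = re.compile(r"^/science/journal/([0-9]{7}[0-9X])$", re.IGNORECASE)


def parse_sciencedirect_url(parsed):
    """Recognise a journal table of contents and read its ISSN from the path."""
    match = TOC_PATH.match(parsed.path)
    if match:
        issn = match.group(1).upper()
        return {
            "rtype": "TOC",
            "mime": "HTML",
            "print_identifier": issn[:4] + "-" + issn[4:],
        }
    return {}


def parse_cairn_url(parsed):
    """Recognise a PDF download and keep the publisher-specific identifier."""
    if parsed.path == "/load_pdf.php":
        ids = parse_qs(parsed.query).get("ID_ARTICLE")
        if ids:
            return {"rtype": "ARTICLE", "mime": "PDF", "unitid": ids[0]}
    return {}


# Hypothetical registry: one parser function per platform domain.
PARSERS = {
    "sciencedirect.com": parse_sciencedirect_url,
    "cairn.info": parse_cairn_url,
}


def parse_url(url):
    """Pick the parser matching the URL's host; None if no platform matches."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for domain, parser in PARSERS.items():
        if host == domain or host.endswith("." + domain):
            return parser(parsed)
    return None


print(parse_url("http://www.sciencedirect.com/science/journal/00014575"))
print(parse_url("http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009"))
```

Adding a platform then means writing one more function and registering its domain, without touching the existing parsers.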
Some limitations apply
- Opaque URLs (session ids, encryption…). Example: the former Springer platform, http://www.springerlink.com/content/j5q872410p510m63/fulltext.pdf
- Publisher IDs that need to be linked to a knowledge base or a reference file, with knowledge bases having to be edited manually. Example: Cairn, http://www.cairn.info/load_pdf.php?ID_ARTICLE=RFG_218_0009
4. AnalogIST and ezPAARSE
ezPAARSE : the software
ez : easy / PAARSE : Progiciel d'Analyse des Accès aux RessourceS Electroniques = Software for Analysing the Accesses to Online Resources
● as a local installation
● as an online service (SaaS)
Free (libre) software
Multi-platform
http://ezpaarse.couperin.org
AnalogIST : the wiki portal
Analyse des Logs de l'IST = Analysing the logs of Scientific and Technical Information
→ The place where we gather the platform analyses, and synchronise the new parsers with the local installations
http://analogist.couperin.org
[Diagram: local installations (Univ 1, Univ 2, ...) and AnalogIST as the global installation + collaborative space]
Through a web form
With the command line (cURL)
(To be updated: new form, English version)
Use the web form to build the command line that suits your needs.
5. ezPAARSE : Using the Results
Example of an ezPAARSE output
- Text file (CSV format)
- Consultation events are deduplicated, following the COUNTER recommendation
- KBART fields
- geoip fields
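The deduplication step can be sketched as a double-click filter. This is not ezPAARSE's implementation: the 10-second (HTML) and 30-second (PDF) windows are assumptions based on the COUNTER double-click rule and should be checked against the Code of Practice release in use, the field names (datetime, user, unitid, mime) are hypothetical, and keeping the first of two close requests is a simplification.

```python
from datetime import datetime, timedelta

# Assumed double-click windows per file type (to be checked against COUNTER).
WINDOWS = {"HTML": timedelta(seconds=10), "PDF": timedelta(seconds=30)}


def deduplicate(events):
    """Drop a request when the same user asked for the same item, in the same
    format, within the window after the previous request (kept or not)."""
    last_seen = {}
    kept = []
    for event in sorted(events, key=lambda e: e["datetime"]):
        key = (event["user"], event["unitid"], event["mime"])
        window = WINDOWS.get(event["mime"], timedelta(0))
        previous = last_seen.get(key)
        last_seen[key] = event["datetime"]
        if previous is not None and event["datetime"] - previous <= window:
            continue  # treated as a double-click, not counted
        kept.append(event)
    return kept


# Made-up events for illustration only.
t0 = datetime(2014, 7, 3, 10, 0, 0)
sample = [
    {"datetime": t0, "user": "u1", "unitid": "A", "mime": "PDF"},
    {"datetime": t0 + timedelta(seconds=5), "user": "u1", "unitid": "A", "mime": "PDF"},
    {"datetime": t0 + timedelta(seconds=40), "user": "u1", "unitid": "A", "mime": "PDF"},
    {"datetime": t0 + timedelta(seconds=5), "user": "u2", "unitid": "A", "mime": "PDF"},
]
print(len(deduplicate(sample)))
```

Only the second request from user u1 is dropped here: it arrives 5 seconds after the first one, while the third arrives 35 seconds after the second.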
(Libre/MS) Office rendering macros
Exploiting the Results with
Who (student, researcher, staff) consults what? (UL)
Distribution of consultations of paid content (books, journals, law references…) by user type at the Université de Lorraine
Consultations by research unit (UL)
Consultations of articles from Jan 2014 to May 2014 by research unit at the Université de Lorraine
Consultations by teaching unit (UL)
Consultations of articles by teaching unit or faculty at the Université de
Lorraine
Geolocation of consultations (CNRS)
Detection of an anomaly (CNRS)
The consultation peak corresponds to an abuse of an e-resource.
Detecting it makes it possible to react promptly to the incident.
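One simple way to flag such a peak is to compare each day's count with a multiple of the median. This is a sketch under assumptions, not the method used at CNRS: the function name, the factor of 5 and the daily figures are all made up for illustration.

```python
from statistics import median


def find_peaks(daily_counts, factor=5.0):
    """Return the days whose consultation count exceeds factor x the median."""
    baseline = median(daily_counts.values())
    threshold = factor * max(baseline, 1)  # avoid a zero threshold
    return [day for day, count in sorted(daily_counts.items()) if count > threshold]


# Made-up daily consultation counts, for illustration only.
counts = {
    "2014-03-01": 120,
    "2014-03-02": 135,
    "2014-03-03": 110,
    "2014-03-04": 2400,
    "2014-03-05": 128,
}
print(find_peaks(counts))  # ['2014-03-04']
```

The median is used rather than the mean so that the peak itself does not inflate the baseline it is compared with.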
6. Project organization : the method
SCRUM : An agile development method
[Diagram: SCRUM process, including the product vision]
6. Project organization : the team
In conclusion
● ezPAARSE is free and open source
● Simple to use and to test
● State-of-the-art technologies
● Feel free to test it
● Send us log samples
● Give us feedback!
Any Questions?
(tag cloud with appropriate terms)
http://ezpaarse.couperin.org
http://analogist.couperin.org
https://twitter.com/ezpaarse
http://analogist.couperin.org/platforms/analyse-helper/start
The rest is automatically processed
DokuWiki syntax generated
More features : exploiting the results with
geolocalization