NeoRef
the knowledge management
system of the future
Bradley Hemminger
School of Information and Library Science,
University of North Carolina, Chapel Hill
http://ils.unc.edu/bioinfo
1/8/2003
© Bradley Hemminger UNC-CH
What if…
• Publishing material were as simple as
formatting it in a standard format like PDF
or JPEG and clicking a button to submit it
to an archive and/or journal? Your article
would keep all its links to color pictures
and graphs, even dynamic graphs or videos,
just as you originally produced them.
• Power tools for publishing
What if…
• You could name the subjects you were interested
in, and you would instantly be given a list of all
articles on that topic, and anytime a new one was
published, you would receive a link to it? It
would automatically be added to your Reference
Database so you could add a citation with a click.
• Universal archive (OAI), controlled vocabulary
indexing and retrieval
What if…
• You could search for any arguments in any
literature, public comment, review, or database
that related to your new research proposal or
paper?
• controlled vocabulary indexing and retrieval, full
support of Dublin Core and qualifiers.
• Examples:
– What genes are linked with causing schizophrenia?
– What articles disagree with the claims in my research
proposal?
What if…
• You could filter the 1000 articles you
received from the search “breast cancer” and
“smoking” in PubMed so that you got only the 31
articles specifically referring to clinical studies
establishing whether smoking was a causal factor
of breast cancer?
• controlled vocabulary indexing and retrieval, full
support of Dublin Core and qualifiers, domain-specific
extensions, and “concepts/claims”.
Design for Open Archives
[Diagram: USER, OAI harvester, and contributor, with archives at MIT DSpace, Cornell arXiv, Stanford Digital Library, Duke, UNC, University of Washington, and NC State]
Methodology
• DCI = Digital Content Items
• DOI = Digital Object Identifiers
• OAI = Open Archives Initiative
NeoRef Methodology
• Anyone can submit a DCI
• All submitted DCIs, regardless of type (journal
article or Joe Bob’s comments), receive DOIs and
are stored on one or more OAI archives.
• Authors provide initial metadata with submission.
• Articles reside on one or more physical archives.
All archives, operating together under OAI, form
one logical universal archive that is harvestable
and searchable through one interface.
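The "one logical universal archive" idea can be sketched as a harvester that sends the same OAI-PMH request to every participating archive. This is a minimal sketch, not the NeoRef implementation: the archive names and endpoint URLs are hypothetical placeholders, and only the standard OAI-PMH `ListRecords` verb with the `oai_dc` (unqualified Dublin Core) metadata prefix is assumed.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL for one archive."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since this date.
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints, for illustration only.
ARCHIVES = {
    "archive-a": "https://archive-a.example.org/oai",
    "archive-b": "https://archive-b.example.org/oai",
}


def harvest_plan(archives, from_date=None):
    """Return one request URL per archive; together they cover the federation."""
    return {
        name: list_records_url(url, from_date=from_date)
        for name, url in archives.items()
    }


if __name__ == "__main__":
    for name, url in harvest_plan(ARCHIVES, from_date="2003-01-08").items():
        print(name, url)
```

A harvester would fetch each URL, merge the returned Dublin Core records into one index, and expose that index through a single search interface.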
NeoRef Methodology cont’d
• Standardize domain extensions in science and
medicine by extending Dublin Core Subject
encoding schemes to include GO (Gene
Ontology), etc.
• Extend types in Dublin Core metadata to finer
granularity, specifically “concepts” and “claims”.
• Add more structure to what’s indexed: instead of
narrative descriptions only (journal articles), allow
more structured, logic-based statements.
DCI Example Types
– Journal articles
– Books
– Research notes
– Genetic sequence data
– Concepts
– Abstracts
– Indexing
– Reviews
– Claims
Methodology, Metadata
• Metadata (Dublin Core) is required for DCI
Submissions.
• Date: ISO 8601 (YYYY-MM-DD)
• Format: CV (controlled vocabulary) Internet media type, or an extension of it.
• Resource ID: URI (DOI, URL, ISBN)
• Language: ISO 639 (Internet RFC 1766: en, fr, ..)
• Creator: Name
• Publisher: Name
Methodology, Metadata
• Contributor: Name
• Rights Management: CV for intellectual property
rights, or IP text
• Title: Text
• Subject and Keywords: author provided CV from
LCSH, DDC, UDC, LCC, MeSH, GO, etc, and/or
free text.
• Description: author provided free text abstract
Methodology, Metadata
• Resource Type: CV from DCMI Type Vocabulary
or similar (we may need to extend).
• Source: CV reference to another resource this is
based on (URI).
• Coverage: CV from Thesaurus of Geographic
Names, or similar. (spatial or temporal mainly)
• Relation: CV from Dublin Core, or CV from
NeoRef extensions (ScholOnto, MeSH, GO, etc).
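The metadata elements listed on these slides could be captured in a small record type with a submission check. This is a minimal sketch under stated assumptions: the class and field names are illustrative, and the choice of which elements count as required is hypothetical, since the slides only say that Dublin Core metadata is required without listing mandatory elements.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime

ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

# Subject encoding schemes named on the slides, plus free text.
KNOWN_SCHEMES = {"LCSH", "DDC", "UDC", "LCC", "MeSH", "GO", "free-text"}

# Assumption: which elements a submission must fill in.
REQUIRED = ("identifier", "title", "creator", "date",
            "media_format", "language", "resource_type")


@dataclass
class Subject:
    term: str
    scheme: str = "free-text"


@dataclass
class DCIRecord:
    identifier: str      # URI: DOI, URL, or ISBN
    title: str
    creator: str
    date: str            # YYYY-MM-DD
    media_format: str    # Internet media type, e.g. "application/pdf"
    language: str        # e.g. "en", "fr"
    resource_type: str   # e.g. a DCMI Type Vocabulary term
    subjects: list = field(default_factory=list)
    description: str = ""
    publisher: str = ""
    contributor: str = ""
    rights: str = ""
    source: str = ""
    coverage: str = ""
    relations: list = field(default_factory=list)


def is_iso_date(text):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.match(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def problems(record):
    """Return a list of human-readable problems; empty means acceptable."""
    issues = [f"missing {name}" for name in REQUIRED
              if not getattr(record, name)]
    if record.date and not is_iso_date(record.date):
        issues.append("date is not YYYY-MM-DD")
    for subject in record.subjects:
        if subject.scheme not in KNOWN_SCHEMES:
            issues.append(f"unknown subject scheme: {subject.scheme}")
    return issues
```

A deposit tool could run `problems()` before accepting a submission and show the author the returned list.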
Methodology cont’d
• Reviews, abstracts, indexing information
can be stored as their own items (related to
the item they reference).
• Claims (concept A relates to concept B)
would be stored as Concept items.
• Concepts give finer granularity than a paper,
and support more structured logic than
simple keyword searching.
Representing Claims
• DOI #1 Concept “geneX” in article U
• DOI #2 Concept “lung cancer” in article U
• DOI #3 Claim U: DOI #1 (concept geneX) has
relation “causes” to DOI #2 (“lung cancer”).
• Claim U has the relationship “is inconsistent with”
to claim V.
• Claims are statements about concepts in an item,
and how they relate to other items.
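The scheme above, where concepts and claims each receive their own DOI and claims can be related to other claims, might be modeled as follows. This is a sketch only: the type names, the `doi:N` identifiers, and the content of claim V (which the slide does not specify) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    doi: str
    label: str
    article: str


@dataclass(frozen=True)
class Claim:
    doi: str
    subject_concept: str   # DOI of the concept making up the subject
    relation: str          # e.g. "causes"
    target_concept: str    # DOI of the concept being related to
    article: str


@dataclass(frozen=True)
class ClaimLink:
    source_claim: str      # DOI of a claim
    relation: str          # e.g. "is inconsistent with"
    target_claim: str      # DOI of another claim


concepts = {
    "doi:1": Concept("doi:1", "geneX", article="U"),
    "doi:2": Concept("doi:2", "lung cancer", article="U"),
}
claim_u = Claim("doi:3", "doi:1", "causes", "doi:2", article="U")
# Hypothetical claim V from another article, reusing the same concepts.
claim_v = Claim("doi:4", "doi:1", "does not cause", "doi:2", article="V")
links = [ClaimLink("doi:3", "is inconsistent with", "doi:4")]


def describe(claim, concept_index):
    """Render a claim as 'subject relation target' using concept labels."""
    subject = concept_index[claim.subject_concept].label
    target = concept_index[claim.target_concept].label
    return f"{subject} {claim.relation} {target}"


def inconsistent_with(claim_doi, claim_links):
    """DOIs of claims marked inconsistent with claim_doi, in either direction."""
    found = set()
    for link in claim_links:
        if link.relation != "is inconsistent with":
            continue
        if link.source_claim == claim_doi:
            found.add(link.target_claim)
        elif link.target_claim == claim_doi:
            found.add(link.source_claim)
    return found
```

Because each claim has its own identifier, a reader starting from either claim can find the other, which is what makes tracking disagreement across articles possible.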
Representing Claims via
Concepts
• DOI #1 Concept “lung cancer” in article U
• DOI #2 Concept “geneX” in article U has relation
“causes” to DOI #1 (concept lung cancer).
• PROBLEM: claims can’t be referenced directly,
only indirectly via concepts. For example, how do
you indicate that DOI #4 (claim B) is inconsistent
with DOI #2 (claim A)?
Example Retrievals
• Retrieve all articles on Smad4 published in
any refereed journal, or any article reviewed
by someone in my Respected_Reviewer list.
• Retrieve any article with (index term Fish
Oil OR concept Fish Oil) having any
relationship with (index term Raynaud’s
disease OR concept Raynaud’s disease).
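The second retrieval can be sketched as a filter over article records. This is an illustrative sketch, not a NeoRef query language: the `Article` type and sample records are hypothetical, and "any relationship" is read here as either a claim (with any relation label) connecting the two topics, or both topics appearing as index terms on the same article.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    doi: str
    index_terms: set = field(default_factory=set)
    # Claims as (concept, relation, concept) triples.
    claims: list = field(default_factory=list)


def related(article, topic_a, topic_b):
    """True if the article relates topic_a and topic_b in any way."""
    a, b = topic_a.lower(), topic_b.lower()
    # Case 1: a claim links the two concepts, in either direction.
    for subject, _relation, target in article.claims:
        if {subject.lower(), target.lower()} == {a, b}:
            return True
    # Case 2 (assumption): co-indexing counts as a relationship.
    terms = {term.lower() for term in article.index_terms}
    return a in terms and b in terms


def retrieve(articles, topic_a, topic_b):
    """DOIs of all articles relating the two topics, in input order."""
    return [art.doi for art in articles if related(art, topic_a, topic_b)]


# Hypothetical sample records.
ARTICLES = [
    Article("doi:a", index_terms={"Fish Oil", "Raynaud's disease"}),
    Article("doi:b", claims=[("Fish Oil", "relates to", "blood viscosity")]),
    Article("doi:c", claims=[("Raynaud's disease", "relates to", "fish oil")]),
]

if __name__ == "__main__":
    print(retrieve(ARTICLES, "Fish Oil", "Raynaud's disease"))
```

The same filter shape could narrow a large keyword result set to the few articles that make a specific kind of claim, by also matching on the relation label.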
What Changes with the NeoRef
Model?
• Anyone can submit
• Anyone can review/comment/index
• Anyone can retrieve any item in the universal
archive, based on Dublin Core metadata.
• Reviews, ratings, journal acceptance, citations,
and hits become measures of quality. The scale is not
binary (accepted in a journal or not) but more continuous.
• Significantly improves the ability to track arguments
about concepts throughout the literature.
Where does the Work go?
• Submission work is pushed onto the
self-contributing author, who must describe
and index their material properly.
• Search and retrieval work is pushed to the
retrieval side, which must provide
powerful filtering and good user interfaces
so that the searcher is not overwhelmed.
Tools/Services Needed (NeoRef)
• Automatic metadata extraction (authors,
date, title, keywords) to save the author
from manually repeating this. In the future,
Word style sheets or XML entry may make
this automatic.
• Support for putting your materials on the
open archive. (Librarian).
Tools/Services Needed (NeoRef)
• Choice of classification schema to code
keywords in, and easy selection and
addition of keywords (e.g. MeSH tree).
• Support for putting your materials on the
open archive. (Librarian).
• PubMed-type interface to search OAI
archive metadata.
• Google to search full text of articles?
Part 2--uncOpenArchive
• Open Archives
• Digital Libraries
• Publishers (will they disappear?)
• Copyright
• uncOpenArchive
OAI helps facilitate new
Publishing Models
• Now that all the parts of the publication
process are digital, the independent parts
can be separated. The separable parts are:
– Classification (Ed Staff: appropriateness)
– Review (Scholars: quality rating, acceptance
judgment, feedback to author)
– Copy Editing (publishing staff)
– Printing or Rendering into permanent form
(publishing staff).
Status Quo
[Diagram: Creator (Academic), Consumer (Academic), Reviewer (Academic), Purchaser Representative (Library), Publisher (Commercial), archive]
Modest Proposal (NeoRef)
[Diagram: Creator (Academic), Consumer (Academic), Reviewer (Academic), University Library, Professional Society, archive]
For-Profit Licensing Model
• Publisher: Commercial company
• Cost: $0 to $4000 for full review and copy edit,
plus operation costs, and profit.
• Cost paid for by purchasing library.
• Copyright: Author transfers copyright of final
(value-added) version to journal. Publisher
negotiates licensing with libraries to recoup cost.
Publisher requires that author give up rights to
final version, and may require that preliminary
versions not be available (e.g. Chemical
Abstracts).
Non-Profit Licensing Model
• Publisher: individual, professional society,
institution, government.
• Cost: $0 to $4000 for full review and copy edit
• Cost paid for by purchasing library, possibly with
cost offset by publisher or author as in Free model.
• Copyright: Author transfers copyright of final
version to journal so that they can license to
libraries, but retains rights to preliminary version
and possibly final version, which may be put on
web for free access.
Free Model
• Publisher: individual, professional society,
institution, government.
• Cost: $0 to $4000 for full review and copy edit
• Cost paid for by
– Subsidized by institution (e.g. University library,
Genbank by NCBI)
– Subsidized by professional society
– Paid by author (e.g. $250 for MRS Internet Journal of
Nitride Semiconductor Research, $500 BioMedCentral)
• Copyright: fully maintained by author
Other costs
• Submissions: essentially $0 to prepare a final,
reasonably high-quality PDF with available tools.
• Review: (covered on previous pages).
• Archive: paid by some combination of Review
participants: professional society, publisher,
institution (university), government (NLM, NSF)
• Retrieval/Searching: either the archive, or OAI
harvesters (free, e.g. Citebase, Arc; or commercial).
• For-profit models generally control all these
services, while other models allow separate
entities to provide archiving or retrieval services.
Opportunities
• Review of digital objects (free and
professional)
• Indexing of digital objects (author, free,
professional)
• Archiving of digital objects (universities,
commercial)
• Search and retrieval of digital objects (free
harvesters, commercial tools).
Related work
• Digital archives
• Archive standards
• Harvesting and searching
• Digital library software
• Publisher policies
• E-journals
• Peer Review
Digital Library/Archives
• arXiv: digital e-print archive, an example of
academic community support.
• MIT’s DSpace: an excellent example of
university support (with industry help).
• Arizona’s DLIST (Information Science and
Technology digital archive).
Digital Publisher/Libraries
• Public Library of Science (editorial board, peer
review, etc. in house; free access; $1500(?) author
submission cost).
• BioMed Central (individual e-journals participate
as part of this, utilizing their infrastructure; free
access; $500 author submission cost; reviews,
images, other additional materials cost extra).
• Stanford’s HighWire
Archive standards
• Open Archives Initiative (OAI): standards
for open federated archives and metadata
harvesting. Current registered OAI archives.
• Dublin Core: standard minimal set of
metadata common across domains. Dublin
Core Library Profile.
Harvesting & Search
• Citebase
• Arc
• Open Journal Systems (example)
Digital Library Commercial
Software
• Endeavor Systems (ENCompass)
• Ex Libris (DigiTool)
• Sirsi (Hyperion)
• Artesia (TEAMS)
Publisher Policies
• Listing of what publishers allow regarding
copyright and submission of articles to e-print servers: Publisher Survey.
E-only Journals
• Survey of E-only journals, discussion of tradeoffs
of E-only journals. Llewellyn 2002
– Subject: 100+ journals, 10+ subjects
– 85% are free access
– Indexing a problem (33% not indexed)
– Cataloging: (3% no OCLC holdings)
– Few citations (probably primarily because not indexed
or cataloged).
• Peer-reviewed E-only journals listing
Peer Review
• Faculty of 1000: (prior pages on
publication).
• ScholOnto: paid by some combination of
Review participants (professional society,
non-profit, for profit)
• dEbates in Science Magazine
Where to Next? (NeoRef)
• NeoRef (bioivlab prototype)
– Make possible the inclusion of reviews and claims with
DC metadata.
– Extend existing Open Archives efforts to include
metadata (keywords, indexing, concepts) from
bioinformatics domain.
• Develop convenient and accurate author deposit of
materials and metadata, and searching and
retrieval.
• Create Information and Library Science Research
in Bioinformatics E-journal/E-print archive.
Where to next? (UNC)
• Create a developmental digital archive resource at
SILS to support the submission and archival of
scholarly materials by anyone in UNC
(uncOpenArchive)
• Create a collection of digital libraries at SILS
• Help work towards the creation of a UNC-wide
production digital library hosted by the libraries.
SILS Center for Digital Libraries (CDL)
[Mock-up of the CDL home page]
Collections: Botnet, DocSouth, GovStat, Minds of Carolina, NeoRef, OpenVideo, UNC Courses, uncOpenArchive
Other elements: Submit Material; Search; Contact a CDL Librarian; UNC Libraries (Davis, HSL); School of Information and Library Science