File and Metadata Management
Breakout
• File & Metadata Management
– Mechanisms
– Policy
– Replication
• Breakout Considerations
– Who are the early adopter/active communities?
• To gather detailed requirements
– How uniform are the requirements within the
community?
• Are there gaps? Revise emphasis?
– Targets over the next 12 months
• Longer term?
NeSC Workshop - February 2007
Reporting back
• Changes to docs
– Cross-links that need to be identified
• Suggested actions
– Be specific where we can: OMII UK could do X, JCSR could do Y
Summary
Deliverables in document
• Automatic data annotation and provenance tools
to support domain specific schema
• Mechanisms to support controlled and
convenient sharing of files between groups
• Best practice document to support research
groups in developing their own data curation and
file management policy
• Development of common annotation schemes
for individual communities to enable consistent
metadata labeling within these communities
• A) Software evaluation & report: we need to evaluate commonly available network file systems (GPFS, PVFS, etc.) in comparison to distributed file management tools (SRB, SRM, dCache, etc.)
– What are the criteria? Include collaboration, deployment, ease of use, cost, etc.
– Communication with other communities, especially HPC and DCC
– Follow-on workshop on available tools
• Short term
• ETF/NGS – Dave Wallom
• Targeted community: general
• B) Best practice document: current metadata and annotation practices, and possible policies. The aim is that, to contribute data, you have to have a policy for how annotation and metadata will be done (within standards for interoperability)?
• Medium term – could start now
• JISC could do a call – Ann Borda
• Targeted community: general
• C) Information: what standards are available for successful data curation? Plus a DCC workshop on metadata standards
• Medium term
• DCC with JISC – Chris R.
• Targeted community: general
• D) Why aren’t the common tools that are available (e.g. from the DCC) being used by scientists? A workshop or outreach in this space might help
• Short term
• DCC? – Chris R.
• E) Report on already-completed evaluations of institutional repository systems
– Link to a common place
– Make sure criteria are comparable, etc.
– Goal: JISC/JCSR could come up with recommendations for universities getting involved in this space
• Short term
• JISC – Ann Borda
• F) Survey to understand why open source or commercial solutions for distributed filesystems aren’t in more common use
– Short term
– ETF? JISC? – Ann Borda
• G) Survey: are users using databases without our knowing it, because they didn’t mention it?
– Medium term
– NeSC? Neil CH
Notes
• Note: Action items in RED
• Document adaptations in GREEN
Access to Data Requirements
• Solutions exist for at least pieces of the problem, but many people didn’t know what was available
• Dave Wallom suggests that there are commercial systems, especially AFS
– Why aren’t the HEP folks using this now?
• Neil CH: do different groups have different requirements, or requirements that only seem different?
– Would a tool summary help, or a workshop to tell people about it? (DB)
– Hard to get (new) people to meetings. Do they not care? Do they not know what they need? Are the workshops being targeted in the wrong way? Do the applications folks know this will help them?
Federated repository solutions
• Not in the document
• What about tying data to publications?
– Didn’t really come up, except in tying data to grants and grant requirements
Data to share
• People also want to share software
– Files are everything from software to results
We didn’t hear
• “Files” meant moving whole files around
– No mention of parts of files or sub-pieces
• Many users assume that data IS stored in a file; they simply don’t draw the line that other (CS) folks would
• Are users using databases without our knowing it, because they didn’t mention it?
– Perhaps another survey for this (suggested by Dave Berry)
Section naming
• Collaborative file management?
Metadata is key to sharing
• This could be emphasized better
• A guide to metadata standards might help, along with a roadmap of where they are going
– Data curation people and digital library people have methods to address this. Could their formalism be made into best practice and transferred across?
– They need to tie standard extensions back to the standard. Perhaps some way to encourage this to happen more is needed?
• Need better tools to add metadata to the “files”
– Auto-generation (or even simple creation at source) has different requirements depending on the type of data
– Suggestion (DW): policy recommendation that, in order to contribute data, you have to have a policy for how this will be done (within standards for interoperability)?
Metadata 2
• Clarify: People are happy with the
metadata frameworks for general
knowledge (date stamps, etc)
Before lunch list
• Note: data curation means something very different to a domain scientist than to a curation person
• Why aren’t the common tools that are available (e.g. from the DCC) being used by scientists?
– Outreach in this space might help
– How do file management and curation interact?
• Different projects have different curation needs
– Make all data accessible
– Make only some of it accessible after time?
• 3) Dave Wallom suggests that there are commercial systems, especially AFS
• Survey to understand why this and other commercial solutions aren’t in more common use
• 3) Document: would a summary of available data tools help, or a workshop to tell people about them? (DB)
• 3) Are users using databases without our knowing it, because they didn’t mention it?
• Perhaps another survey for this (suggested by Dave Berry)
• 1) Best practice document: policy recommendation (based on current experimental work) that, in order to contribute data, you have to have a policy for how annotation and metadata will be done (within standards for interoperability)?
• 2) Information: what standards are available for successful data curation? Plus a DCC workshop on metadata standards
• 2) Why aren’t the common tools that are available (e.g. from the DCC) being used by scientists? A workshop or outreach in this space might help
After lunch topics
• 1) Follow-on policy
• 2) What is best practice for file management
• 3) Description of tools
• 4) Provenance
What is best practice for file management
• Software evaluation & report: we need to evaluate commonly available network file systems (GPFS, PVFS, etc.) in comparison to distributed file management tools (SRB, SRM, dCache, etc.)
– What are the criteria? Include collaboration, deployment, ease of use, cost, etc.
– Communication with other communities, especially HPC and DCC
Institutional repository systems
• Report on already-completed evaluations of institutional repository systems
– Link to a common place
– Make sure criteria are comparable, etc.
– Goal: JISC/JCSR could come up with recommendations for universities getting involved in this space
Provenance
• Define better in document: currently means the history of how data was created
• Interlinking of metadata across the experimental process
• Edit to document: add a phrase to the effect that, once provenance data is collected, there will also be a requirement to navigate and analyze the data
• See summary slides for to-dos in order