Scientific Data Management
Workshop Series, Part III
Report of the
Integrated Data Analysis and Visualization
Breakout Group
May 25, 2004
Outline
• Where are we now?
• Where do we want to be?
• Sound bite vignettes and a shameless plug.
• Recommendations.
• Nagiza’s slides.
Where Are We Now?
• Single-point examples of integrated data
analysis and visualization environments in
certain communities.
• E.g., ROOT in HENP and Jim Gray’s SkyServer portal for the Sloan
Digital Sky Survey.
• “Capability wall”
• Familiar tools do not scale to present and imminent
requirements.
• New capabilities needed to aid in data understanding (e.g.,
query-based data analysis and visualization).
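As a rough illustration of query-based data analysis, the sketch below selects a region of interest with a predicate and analyzes only that subset. The record layout, the field names (temperature, pressure), and the query() helper are hypothetical and do not come from the workshop.

```python
# Hypothetical example: a tiny in-memory "dataset" of simulation cells.
# Field names and values are illustrative assumptions only.
cells = [
    {"id": 0, "temperature": 300.0, "pressure": 1.0},
    {"id": 1, "temperature": 1500.0, "pressure": 2.5},
    {"id": 2, "temperature": 1800.0, "pressure": 0.8},
    {"id": 3, "temperature": 900.0, "pressure": 3.1},
]


def query(records, predicate):
    """Return only the records for which predicate(record) is true."""
    return [record for record in records if predicate(record)]


# Select the region of interest first, then analyze just that subset
# instead of pushing the whole dataset through the analysis step.
hot = query(cells, lambda r: r["temperature"] > 1000.0 and r["pressure"] > 1.0)
hot_ids = [r["id"] for r in hot]
mean_temperature = sum(r["temperature"] for r in hot) / len(hot) if hot else None

print(hot_ids, mean_temperature)
```

At scale, the same idea would presumably rely on indexes or a database rather than a linear scan; the point here is only that the query narrows the data before analysis or visualization sees it.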
Where Are We Now? (ctd.)
• Communities go through “scatter-gather”
phases.
• Scatter means divergence, gather means convergence.
• Scatter happens when individual PIs and their code are in a
hurry.
• Gather happens when there needs to be some form of cross-project interaction, like code/data validation or data sharing.
• Newer fields (endeavors) tend to have data
standards.
• E.g., GenBank genome sequence data format.
Where Do We Want to Be?
• Unrestricted exchange of data, software tools, workflows:
• Within communities, and
• Between communities when feasible and practical.
• Community-based data model standards.
• Easier for some, more difficult for others.
• Scientists want the ability to use an integrated data
analysis and visualization pipeline w/o the need for an
expert.
• Should protect existing investment in known procedures.
• Integrated data analysis and visualization workflows that
support:
• Diverse and distributed resources.
Sound Bite
• “The real power came when we agreed upon a
standard data API; it is hopeless to standardize
on a common visualization tool.”
• A common data model provided stability in “the
foundation” and promoted “freedom of choice”
in data analysis and visualization.
• Flexibility and adaptability are the hallmarks of
useful systems (e.g., the “duct tape test”).
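The idea of standardizing on a data API rather than on a tool can be sketched as follows. This is a minimal illustration, not the API the quoted community adopted: the FieldSource interface, the InMemorySource reader, and both consumer functions are hypothetical names.

```python
from typing import Dict, List, Protocol


class FieldSource(Protocol):
    """Hypothetical shared data API: every reader exposes fields the same way."""

    def field_names(self) -> List[str]: ...

    def read_field(self, name: str) -> List[float]: ...


class InMemorySource:
    """One possible reader; a file- or network-backed one could replace it."""

    def __init__(self, fields: Dict[str, List[float]]):
        self._fields = fields

    def field_names(self) -> List[str]:
        return sorted(self._fields)

    def read_field(self, name: str) -> List[float]:
        return list(self._fields[name])


def summarize(source: FieldSource, name: str) -> Dict[str, float]:
    """An analysis tool written only against the shared API."""
    values = source.read_field(name)
    return {"min": min(values), "max": max(values), "mean": sum(values) / len(values)}


def text_histogram(source: FieldSource, name: str, bins: int = 2) -> List[str]:
    """A stand-in 'visualization' tool written against the same API."""
    values = source.read_field(name)
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant fields
    counts = [0] * bins
    for value in values:
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    return ["#" * count for count in counts]


source = InMemorySource({"density": [1.0, 2.0, 3.0, 4.0]})
print(summarize(source, "density"))
print(text_histogram(source, "density"))
```

Neither tool knows how the data is stored, so either can be swapped for another without touching the reader, which is the “freedom of choice” the sound bite describes.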
Sound Bite(s)
• “There is no force to motivate common output
formats.”
• “People won’t consciously decide to share
data.”
• Common data format/model for chemistry input
to many combustion codes (a necessity), but the
same codes output data in different formats.
• Output format is chosen as a function of
expediency, custom needs, etc.
Sound Bite
• “3D visualization is good for navigating through
complex datasets to set up 2D quantitative data
analysis.”
• The human cognitive processing system is adept
at multiscale information processing.
• This example shows use of “best of breed”
technology (or, in other words, “the right tool for
the right job”).
Shameless Plug
• The “Visualization Center” idea is strongly endorsed in a
limited sampling of applications.
• The Visualization Center consists of:
• Core visualization technology focus,
• Matrixed staff from applications, data management, networking, etc.
• The Product is:
• Integrated visualization and data analysis technologies focused on
specific application areas.
• Community-based data models and requirements are highly
desirable to maximize impact.
Recommendations
• A “carrot” or “stick” will be required to
lure/coerce communities towards standards.
• Is funding a “carrot” or a “stick?”
• Promote efforts to unify communities around
data models, which form the basis for integrated
data analysis and visualization resources.
• Promote efforts to build community-based
integrated data analysis and visualization
software around these data models.
Part Two….
Nagiza’s slides.