Scientific Data Management Workshop Series, Part III
Report of the Integrated Data Analysis and Visualization Breakout Group
May 25, 2004

Outline
• Where are we now?
• Where do we want to be?
• Sound bite vignettes and a shameless plug.
• Recommendations.
• Nagiza’s slides.

Where Are We Now?
• Single-point examples of integrated data analysis and visualization environments in certain communities.
  • “root” in HENP, Jim Gray’s SkyServer portal for the Sloan Digital Sky Survey.
• “Capability wall”
  • Familiar tools do not scale to present and imminent requirements.
  • New capabilities needed to aid in data understanding (e.g., query-based data analysis and visualization).

Where Are We Now? (ctd.)
• Communities go through “scatter-gather” phases.
  • Scatter means divergence, gather means convergence.
  • Scatter happens when individual PIs and their code are in a hurry.
  • Gather happens when there needs to be some form of cross-project interaction, like code/data validation or data sharing.
• Newer fields (endeavors) tend to have data standards.
  • E.g., GenBank genome sequence data format.

Where Do We Want to Be?
• Unrestricted exchange of data, software tools, workflows:
  • Within communities, and
  • Between communities when feasible and practical.
• Community-based data model standards.
  • Easier for some, more difficult for others.
• Scientists want the ability to use an integrated data analysis and visualization pipeline w/o the need for an expert.
  • Should protect existing investment in known procedures.
• Integrated data analysis and visualization workflows that support:
  • Diverse and distributed resources.

Sound Bite
• “The real power came when we agreed upon a standard data API; it is hopeless to standardize on a common visualization tool.”
• A common data model provided stability in “the foundation” and promoted “freedom of choice” in data analysis and visualization.
• Flexibility and adaptability are the hallmarks of useful systems (e.g., the “duct tape test”).

Sound Bite(s)
• “There is no force to motivate common output formats.”
• “People won’t consciously decide to share data.”
• Many combustion codes share a common data format/model for chemistry input (a necessity), but the same codes output data in different formats.
• Output format is chosen as a function of expediency, custom needs, etc.

Sound Bite
• “3D visualization is good for navigating through complex datasets to set up 2D quantitative data analysis.”
• The human cognitive processing system is adept at multiscale information processing.
• This example shows use of “best of breed” technology (or, in other words, “the right tool for the right job”).

Shameless Plug
• The “Visualization Center” idea is strongly endorsed in a limited sampling of applications.
• The Visualization Center consists of:
  • Core visualization technology focus,
  • Matrixed staff from applications, data management, networking, etc.
• The Product is:
  • Integrated visualization and data analysis technologies focused on specific application areas.
• Community-based data models and requirements are highly desirable to maximize impact.

Recommendations
• A “carrot” or “stick” will be required to lure/coerce communities towards standards.
  • Is funding a “carrot” or a “stick”?
• Promote efforts to unify communities around data models, which form the basis for integrated data analysis and visualization resources.
• Promote efforts to build community-based integrated data analysis and visualization software around these data models.

Part Two: Nagiza’s slides.