Service-Oriented Local And Global Visualization with Sorting On-demand for Climate Data Zhe Zhang Ye Jin Xusheng Xiao zzhang13@ncsu.edu yjin6@ncsu.edu xxiao2@ncsu.edu Solution 1: Service-Oriented Histogram [2] Motivation 15 10 5 0 10 12 14 16 18 20 x-value y-counts 0 50 10 20 30 40 100 y-counts 150 y-counts 20 25 200 Huge amount of climate simulation data are collected from different Locally And Global Visualization Locally compute min, max, and count areas (e.g., cities, countries). Transmitting the local min, max and count to compute global min, Climate scientists keep trying to predict the trends of the variation of max and count climate both locally and globally. Each data sources compute the histogram based on the global min, Exploring visualization of data mining (e.g., histogram) has been used max and count more and more frequently to get a general view ahead of predicting. Only transferring the computed histogram data, which is much Climate experts would like to analyze data by navigating among levels smaller compared to all the climate data of data ranging from the most summarized (drill-up) to the most detailed Merge the transmitted histograms to show the global histograms (drill-down) (e.g., drill-down shown in Figure 1). -40 -35 -30 -25 -20 -15 -10 6 5 35 40 45 80 30 50 15 -30 -20 -10 0 10 20 30 40 10 0 y-counts 20 25 0 30 20 40 y-counts 60 x-value 20 x-value 3 4 3 y-counts 2 2 1 1 0 0 -1 40 -2 y-counts -3 60 0 80 x-value 0 5 x-value -10 -8 -6 -4 -2 0 x-value -1.0 -0.5 0.0 0.5 1.0 15 x-value 0 5 y-counts 10 Figure 1: Drill-down to interval [-1,1] -20 -18 -16 -14 -12 -10 10 x-value 6 4 0 2 y-counts 8 Challenge -35 -30 -25 -20 -15 -10 x-value Globally transferring caused problems: Time-consuming (see Table 1) Package Lost during data transfer (see Table 1) Frequently drill-up and drill-down navigation of data consumes computation resources. (e.g., scanning same data set multiple times see Table 2) Table 1 [1] Here are the raw data in multiple domains have already collected, we can see the latest data sets are all for year 2008. Data Domain Single data set size Number of data sets Total Size Collecting Time In Best Case VOCALS 2008 ~70000 KB 56 ~3920 MB ~10 Hrs ASCOS 2008 ~140000 KB 25 ~3500 MB ~10 Hrs AEROSE 2008 ~80000 KB 36 ~2880 MB ~7 Hrs STRATUS 2007 ~70000KB 21 ~1470 MB ~5 Hrs Figure 2: System Framework Solution 2: On-demand Sorting [3] Cache data and parameters (min, max, count) locally Index data with break number (e.g., 0.5 is in the break [0, 1] ) Check whether the data in the requested breaks are sorted or not If sorted, transfer data directly If data is not sorted, sort only the data in the corresponding break and mark the break as sorted Transfer local histogram data (min, max, count) for global computation Merge data from different sources Result Table 2 Total time needed to discovery meaningful or user specified parameters visualization results, we need to speed up those visualization algorithms. Data Size Run Once Histogram Discovery Histogram Run log(n) Times User specified 30 Times ~1500 MB 2 Mins ~17 * 2 = 34 Mins 60 Mins ~3000 MB 4 Mins ~18 * 2 = 36 Mins 120 Mins ~4500 MB 6 Mins ~19 * 2 =38 Mins 180 Mins http://csc.ncsu.edu/ NCSU Computer Science References 1. http://www.esrl.noaa.gov/psd/psd3/cruises/ 2. Felix Halim, Panagiotis Karras, and Roland H.C. Yap. 2009. Fast and effective histogram construction. ACM, New York, NY, USA, 1167-1176. 3. C. A. R. Hoare. Quicksort. The Computer Journal, 5(1):10–16, January 1962.