Cubes in the Cloud - Microsoft Research

Enabling Eco-Science Analysis with MatLab and DataCubes in the Cloud
Jayant Gupchup† and Catharine van Ingen*
Computer Science Department, The Johns Hopkins University†
Microsoft Research*
Abstract: The ecological sciences are rapidly
becoming data intensive sciences. Several groups
have been pioneering the use of databases,
datacubes, and web-services to address some of the
data handling challenges caused by the
avalanche/tsunami/flood of data. Science happens
only when the data are actually analyzed and today
that very often happens with one of the common
scientific desktop analysis tools such as Excel,
MatLab, ArcGIS, or SPlus. The challenge then is
how to connect the data in the cloud to the analysis
tool on the desktop without requiring full data
download. This article describes our prototype
connection between one such service and one such
tool. We describe how this approach can be
generalized across a number of different science
questions and tools. We also explain why this is a
good solution from a scientist’s perspective.
1 Introduction
The combination of rapid advances in sensor
technology, remote sensing, and internet data
availability is causing a dramatic increase in the
amount and diversity of ecological science data. At
the same time, scientists are collaborating to attempt
regional or even global scale studies. These analyses
require mixing time-series data from different
sources with site property or other ancillary data
from still different sources.
Using a database to assemble and curate such data
collections has been documented in depth
elsewhere [OZER], [SDSS], [CUAHSI-ODM]. At
the Berkeley Water Center [BWC], we have been
building a number of related environmental
datasets. We describe one of these datasets and the
kinds of analyses commonly performed. We then
describe the important aspects of our databases and
datacubes.
Our focus here is on how to enable common data
analyses using tools already in use by scientists. We
chose to use MatLab [MATLAB] for two reasons.
First, a number of our scientists use it. Second, we
also wanted to use it for simple visualizations which
are tedious to do in Excel. We describe how we
have connected MatLab on the desktop over the
internet to one of our datacubes. The approach can
be generalized to connect other tools to our family
of datacubes to give our scientists a choice in
analysis tools. We believe this has implications for
other researchers exploring how to enable
ecological scientists.
1.1 The Ameriflux carbon-climate data set
The Ameriflux network [AMERIFLUX] is a
scientific collaboration of over 50 institutions across
America and operates approximately 120
measurement sites. Each site provides continuous
observations of ecosystem level exchanges of CO2,
water, energy and momentum spanning diurnal,
synoptic, seasonal, and inter-annual time scales.
Ameriflux is one of several regional networks that
together form the FLUXNET global collaboration.
Each Ameriflux tower site contributes 22 common
measurements to the Ameriflux archive at ORNL.
The ORNL archive works with researchers in the
sister CarboEuropeIP network to produce science-ready data products which are gap-filled, quality
assessed and contain additional computed variables.
The data are used to understand how climate
interacts with plants at a systems level to influence
carbon flux and global warming. In the past, such
studies have been primarily individual site
investigations. Today, regional and global analyses
are being attempted. At the same time, the data are
also used by other non-field scientists to provide
ground truth for climate modeling efforts and
satellite-based remote sensing data.
Carbon-climate data is similar to many other
environmental data sets in the following ways.
• The data has strong temporal characteristics: understanding diurnal, seasonal, long term changes and other time variations is important to the science.
• The data has important spatial characteristics. For example, micro-climate is affected by latitude, longitude, and proximity to the coast.
• There are strong and weak correlations between the observed and computed variables. Understanding those relationships, such as the change in leaf production as the result of temperature and precipitation correlations, is at the center of the science.
• Analysis of the time series data often requires knowledge of other site parameters such as vegetative cover or soil composition, or of site disturbances such as fires, floods, or harvests.
Likewise, the kinds of data analyses performed resemble
other environmental analyses. Examples include:
• Look for trends or changes in variables outside of the common diurnal and seasonal fluctuations.
• Look for changes in variables after a relatively rare event or disturbance such as a flood or fire.
• Look for similarities and differences in variables across sites of similar characteristics (such as tropical rainforests).
• Integrate with maps.
These characteristics are very common to other ecological sciences such as hydrology, oceanography, and meteorology.
1.2 Scientific Datacubes
The what-where-when nature of time series data
drives much of our database schema and datacube
dimensions. A datacube is a database specifically
optimized for data mining or OLAP [GRAY1996],
[SSAS]. Datacube abstractions include:
• Simple aggregations such as sum, minimum, or maximum can be pre-computed for speed
• Hierarchies such as year to day of year to hour of day can be defined for simple filtering with drilldown capability
• Additional calculations such as median or standard deviation can be computed dynamically or pre-computed
• All operate along dimensions such as time, site, or measurement variable
Datacubes can be constructed from relational
databases using commercial tools.
Figure 1. Example environmental datacube
dimensions. Common dimensions include what
(datumtype and exdatumtype), where (site and
offset), when (timeline), which (WorkingGroup),
and how (quality).
As shown in Figure 1, our cubes have five common
dimensions:
• What or datumtype and exdatumtype: measurement variable such as precipitation or latent heat flux. Because of the large number of variables we handle, we sometimes parse the variable as a primary variable (datumtype) and one or more extensions (exdatumtype). Most analyses need only the primary variable.
• Where or site and offset: (x, y, z) location where (x, y) is the site location and (z) is the vertical elevation at the site. The site dimension also surfaces important site attributes such as climate classification or vegetative cover to allow locations to be grouped or filtered along those characteristics. The site dimension includes hierarchies such as latitude band which enable drilldown as well as grouping.
• When or time: timeline. This dimension allows aggregation across different time granularities such as day of year or hour of day. We also build a number of hierarchies to enable drilldown in time from decade to year through to minute. Some of our cubes also include science-specific time attributes and hierarchies such as water-year or MODIS-week.
• Which or working group or dataset: data versioning or other collections such as “all Boreal forest sites” or “real-time data” useful for analyses. As shown in Figure 1, this is a many-to-many dimension: a given site can be a member of multiple datasets.
• How or quality: This dimension varies the most across our cubes, although all have some notion of data quality. This may include spike detection, gap-filling, or other data “goodness” metric.
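As a rough illustration of how a measure aggregates along these dimensions, the following Python sketch rolls a handful of made-up observations up from day to year, and then across sites and time. The records, the function name `roll_up`, and the choice of average as the measure are assumptions for illustration only; an actual datacube is built with a tool such as SSAS rather than hand-written code.

```python
from collections import defaultdict

# Hypothetical leaf-level facts: (datumtype, site, year, day_of_year, value).
observations = [
    ("LE", "US-Ton", 2003, 1, 40.0),
    ("LE", "US-Ton", 2003, 2, 44.0),
    ("LE", "US-Var", 2003, 1, 30.0),
    ("Precip", "US-Ton", 2003, 1, 2.0),
    ("Precip", "US-Ton", 2004, 1, 4.0),
]


def roll_up(facts, keep):
    """Average values over every dimension not listed in `keep`.

    `keep` holds the positions of the dimensions to group by
    (0 = what, 1 = where, 2 = year, 3 = day of year).
    """
    groups = defaultdict(list)
    for fact in facts:
        key = tuple(fact[i] for i in keep)
        groups[key].append(fact[4])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Roll up from day to year while keeping what and where.
yearly = roll_up(observations, keep=(0, 1, 2))
# Aggregate across all sites and times, keeping only the variable.
by_variable = roll_up(observations, keep=(0,))
```

A datacube would typically pre-compute such aggregates so that a client asking for yearly values never touches the leaf-level rows.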
We have been including a few computed members in
addition to the usual count, sum, minimum and
maximum:
• hasDataRatio: fraction of data present across time and/or variables. This measure includes both original data and any gap-filled data.
• DailyCalc: average, sum or maximum depending on variable; includes units conversion.
• YearlyCalc: similar to DailyCalc.
• RMS or sigma: standard deviation or variance for fast error or spread viewing.
• gapPercent: percentage of contributing data that is either missing or has been gap-filled.
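To make the two data-coverage members concrete, here is a minimal Python sketch of how hasDataRatio and gapPercent could be computed for one cell of the cube. The function names, the use of NaN for missing values, and the per-sample gap-filled flag are assumptions; the text does not describe how the cube stores or computes these measures.

```python
import math


def has_data_ratio(values):
    """Fraction of slots holding a value (original or gap-filled)."""
    present = sum(1 for v in values if not math.isnan(v))
    return present / len(values)


def gap_percent(values, gap_filled):
    """Percentage of slots that are missing or were gap-filled."""
    flagged = sum(
        1 for v, filled in zip(values, gap_filled)
        if math.isnan(v) or filled
    )
    return 100.0 * flagged / len(values)


# Hypothetical samples; NaN marks a missing observation.
values = [1.2, float("nan"), 0.9, 1.1]
gap_filled = [False, False, True, False]

ratio = has_data_ratio(values)          # 3 of 4 slots hold a value
gaps = gap_percent(values, gap_filled)  # 1 missing + 1 gap-filled of 4
```

Both functions assume a non-empty list of samples.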
Datacubes are queried with the multidimensional
query language MDX [MDX], [MDXTutorial].
MDX is similar to the SQL query language but has
some prominent differences. A SQL query returns an
array; each column relates to one query element
(e.g. time, datumtype, site). An MDX query returns
a matrix with a notion of a column axis and a row
axis; each cell relates to two or more elements. Each
axis can contain one or more dimensions or
attributes. Thus each axis can be viewed as a join of
all the dimensions on that axis.
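The difference between the two result shapes can be sketched in a few lines of Python. The flat rows below stand in for a SQL-style result, and `to_matrix` pivots them into an MDX-style matrix whose column axis is a join of two dimensions (datumtype and site). The values and the function name are made up for illustration.

```python
# Flat, SQL-style rows: one column per query element
# (year, datumtype, site, value). Values are hypothetical.
rows = [
    (2005, "LE", "US-Ton", 38.6),
    (2005, "Precip", "US-Ton", 267.8),
    (2006, "LE", "US-Ton", 33.8),
    (2006, "Precip", "US-Ton", 295.2),
]


def to_matrix(rows):
    """Pivot flat rows into row-axis labels, column-axis tuples, and cells."""
    row_axis = sorted({r[0] for r in rows})
    col_axis = sorted({(r[1], r[2]) for r in rows})
    # Cells with no matching row stay NaN, like empty cells in a cube result.
    cells = [[float("nan")] * len(col_axis) for _ in row_axis]
    for year, datum, site, value in rows:
        cells[row_axis.index(year)][col_axis.index((datum, site))] = value
    return row_axis, col_axis, cells


row_axis, col_axis, cells = to_matrix(rows)
```

Each cell is addressed by one row-axis member and one column-axis tuple, which is the association a client has to preserve.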
1.3 Datacube Clients
Recent years have seen considerable growth in
the attention given to simple access to datacubes.
Most of these tools are GUI-based and intended for
business applications. Tableau [TAB], Proclarity
[PRO], and Cognos [COG] are three such business
software applications which provide a
GUI and additional analysis features.
At present, the most common way of accessing a
datacube is the Excel PivotTable [EXCEL]. Excel
PivotTables allow you to set up a connection with
the datacube and then browse and select the data
using a drag and drop type mechanism. The MDX
query is generated by Excel and passed over an
OLE DB connection.
Figure 2 shows how MDX queries are rendered in
Excel. The PivotTable columns correspond to the
MDX query columns; the PivotTable rows
correspond to the MDX query rows. The returned
measures populate the PivotTable array.
-- Column axis (data dimensions: variables and sites)
SELECT
NON EMPTY CROSSJOIN
(
{[Datumtype].[Datumtype].[Datumtype]},
{[Site].[IGBPClass].[IGBPClass]}
)
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME ON COLUMNS,
-- Row axis (time dimensions: year, day)
NON EMPTY CROSSJOIN
(
{[Timeline].[Year].&[2003]},
{[Timeline].[day].[day]}
)
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME ON ROWS
FROM [LatestAmfluxL3L4Daily]
-- Aggregate measures
WHERE ([Measures].[Average])
Figure 2: Rendering of an MDX query in Excel. The various fragments of the query and the rendering
are marked in the same box style (background color and font) to make it easier to identify the mapping.
Despite its ease of use, Excel has a number of
restrictions from a scientist’s viewpoint.
• Excel PivotTables have limited plotting capabilities. To make a scatter plot, you must cut-paste the data from the PivotTable, thereby losing the ability to update the data via query.
• Excel does not have a scripting feature. Scientists often make collections of very similar graphs, for example to look at different variables across sites. Graphing each column in a returned PivotTable requires a lot of tedious select-cut-paste.
• While Excel includes some scientific libraries such as histograms or Fourier transforms, the selection is not as wide as in tools intended for scientists such as MatLab. The libraries are also not well integrated with PivotTables, again likely leading to fragile cut-paste.
These same limitations apply to the above
commercial tools as well. These tools also suffer
from the difference between scientific graphics and
business graphics – the colors, shapes and axes
labeling are foreign to scientists. Familiarity is
important to scientists. At a minimum, the
difference means that the plot must be repeated with
another tool prior to publication.
Our preliminary survey suggests that pairing a
datacube with a rich scientific client application
should offer the best of both. The datacube provides
simple slice and dice to aggregates; the rich client
provides scripting, familiar graphics and powerful
analysis libraries.
2 System Overview
The components of our solution are shown in Figure
3. This section explains each component and identifies
those which can be reused with other clients or
datacube structures.
Figure 3: System Architecture. Diagram components: the MatLab client (GUI Builder, MDX Field Picker, Query Builder, Deserializer, Handle Manager, with Menu Config and Cube Config files) and the Web Server (Serializer, ADO MD). Labeled flows: command; GUI selections; fields and filters; credentials and MDX query; authenticated credentials and MDX query; input MDX; output results; serialized results over HTTP; results object; column index; handles and column names.
2.1 GUI Builder
The GUI Builder allows the scientist to select the
dimension attributes, hierarchy levels, and measures
to be retrieved for inclusion in the analysis.
As shown in Figure 4, the GUI is divided into two
major panes. The Field Axis acts as the column axis
whereas the Time Axis acts as the row axis. The
Field Axis supplies the what-where-which; the Time
Axis supplies the when.
• “What” is determined by the “Select Datum” box. In Figure 4, “LE” (latent heat flux) is selected. Multiple datums can be selected by control-clicking.
• “Where” is chosen with the “Select Groupings” box and the associated “Drill down sites” check box. Latitude and longitude bands are common selection criteria. If the drill down sites check box is selected, data are returned for each site within the latitude bands; if the check box is not selected, data are aggregated across the band.
• “Which” is chosen with the “Dataset” box and the associated “Use dataset filter” check box. By default, all data are included in the returned aggregate. If the check box is selected, only the highlighted datasets are included in the query.
• “When” is selected in the “Time Axis” pane. The time range is selected by the start and stop years. The hierarchy to be used and the depth of the hierarchy to be traversed are selected in the Select Time box.
The data aggregate is chosen in the “Select
Measure” box.
Note that the interface does not support setting a
filter on a date-time window. This is a limitation of
MDX. There is no construct that allows
specification of months 04-12 for 1999, and 08-12
for 2006 and full months for the years in between.
We chose to set a filter at the year granularity.
The contents of each menu are determined by
configuration files. For example, each entry in the
“datum.txt” file is entered on a new line, and each
entry is of the format <alias, MDX representation>.
The aliases are shown in the GUI box and the MDX
representations are used by the MDX Field Picker to
create the lists that are passed to the Query Builder
module. As an illustration, the entry for LE
shown in Figure 4 looks like:
LE,[Datumtype].[Datumtype].&[11]
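A menu configuration file in this format is straightforward to read. The Python sketch below is one way to do it, not the prototype's MatLab code: it splits each line on the first comma and maps the alias to its MDX representation. The function name `parse_menu_config` is hypothetical, and the second sample line (the Precip alias and its member key) is a made-up placeholder.

```python
def parse_menu_config(text):
    """Map each alias to its MDX representation.

    Each non-empty line is expected to look like `alias,MDX representation`;
    only the first comma separates the two parts.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        alias, mdx = line.split(",", 1)
        entries[alias.strip()] = mdx.strip()
    return entries


# The LE line is from the text; the Precip line is a made-up placeholder.
sample = (
    "LE,[Datumtype].[Datumtype].&[11]\n"
    "Precip,[Datumtype].[Datumtype].&[19]\n"
)
menu = parse_menu_config(sample)
```

The aliases (`menu.keys()`) would populate the GUI list, and the selected aliases would be translated to their MDX members before query building.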
Figure 4. GUI Builder. The GUI exposed by the GUI builder is used to select the what-where-which-when.
Note also that the prototype does not include the
“how” or quality dimension.
2.2 MDX Field Picker
The MDX field picker module provides the primary
MatLab programming interface. The field picker
invokes the GUI Builder, passes the obtained user
selections to the Query Builder, and returns the
results object. To bring up the GUI and make
selections, the MDX field picker is invoked by:
[v1 v2 v3 res] = MdxFieldPicker();
where v1 v2 v3 are GUI variables and res is the
returned results object.
After the query parameters are selected, the user hits
submit to exit the GUI and the above call returns.
To retrieve the results, the field picker is invoked a
second time:
[v1 v2 v3 res] = MdxFieldPicker('MdxFieldPickerOutputFcn', v1, v2, v3);
2.3 Query Builder
The Query Builder module builds the MDX query
based on the GUI Builder selections. The Query
Builder module accepts as input:
• List of groups (sites) and datums
• Time hierarchies (year, day, etc.)
• Filters: time range filter and dataset filter
• Variable measure(s)
The selected datum(s) and group(s) form the column
axis. The Query Builder looks at the number of
dimensions needed and then cross-joins dimensions
as necessary. Similarly, the row axis is generated
from the time range and hierarchies. The SELECT
clause is then constructed by combining the row
axis MDX and the column axis MDX. The FROM
clause is specified in the cube configuration file.
The measures and dataset filters are used to generate
the WHERE clause. Finally, the clauses are
combined to complete the MDX query.
Our prototype Query Builder can generate queries
where each axis can have up to 3 dimensions.
This was chosen for coding simplicity and
accommodates our family of related eco-datacubes.
2.4 Serializer (Cube Access)
The Query Builder invokes the ASP-based web
service Serializer by HTTP post. The Serializer
unpacks the post, passes the query to the datacube
and then produces a results stream. An example
post is below.
http://<xxxx>/mdxconnect/Default.aspx?db=LatestAmfluxL3L4Daily&mdx=SELECT%20%20NON%20EMPTY%20CROSSJOIN%20({[Datumtype].[Datumtype].%26[1],[Datumtype].[Datumtype].%26[19]},{[Site].[Site].&[477]})%20%20DIMENSION%20PROPERTIES%20PARENT_UNIQUE_NAME%20ON%20COLUMNS,%20%20NON%20EMPTY%20{[Timeline].[Year].%26[1990]:[Timeline].[Year].%26[2006]}%20%20DIMENSION%20PROPERTIES%20PARENT_UNIQUE_NAME%20ON%20ROWS%20%20FROM%20%20LatestAmfluxL3L4Daily%20%20WHERE%20%20([Measures].[YearlyCalc])
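The query assembly described in Section 2.3 and the request format above can be sketched together in Python. This is an illustrative reconstruction, not the prototype's MatLab code: the function names, the placeholder host `example.invalid`, and the decision to percent-encode every reserved character are assumptions, and dataset filters in the WHERE clause are left out for brevity.

```python
from urllib.parse import quote


def axis_set(member_lists):
    """Cross-join one or more member lists into a single axis expression."""
    sets = ["{" + ",".join(members) + "}" for members in member_lists]
    expr = sets[0]
    for nxt in sets[1:]:
        expr = f"CROSSJOIN({expr},{nxt})"
    return expr


def build_mdx(columns, rows, cube, measure):
    """Assemble SELECT, FROM and WHERE clauses into one MDX query string."""
    props = "DIMENSION PROPERTIES PARENT_UNIQUE_NAME"
    return (
        f"SELECT NON EMPTY {axis_set(columns)} {props} ON COLUMNS, "
        f"NON EMPTY {axis_set(rows)} {props} ON ROWS "
        f"FROM {cube} WHERE ({measure})"
    )


def build_url(base, cube, mdx):
    """Percent-encode the query and attach it to the service URL."""
    return f"{base}?db={quote(cube, safe='')}&mdx={quote(mdx, safe='')}"


mdx = build_mdx(
    columns=[
        ["[Datumtype].[Datumtype].&[1]", "[Datumtype].[Datumtype].&[19]"],
        ["[Site].[Site].&[477]"],
    ],
    rows=[["[Timeline].[Year].&[1990]:[Timeline].[Year].&[2006]"]],
    cube="LatestAmfluxL3L4Daily",
    measure="[Measures].[YearlyCalc]",
)
# "example.invalid" is a placeholder host, not the real service address.
url = build_url(
    "http://example.invalid/mdxconnect/Default.aspx",
    "LatestAmfluxL3L4Daily",
    mdx,
)
```

Encoding every `&` inside the query as `%26` matters here, because an unencoded ampersand would otherwise be read as the start of a new URL parameter.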
The natural question is why the results need to be
produced as a stream. Excel and other ADO
[ADO] compatible applications can talk to the cube
using the OLE DB (ADO MD) drivers. The OLE
DB driver maintains the relationship of each
returned data cell with the associated two or more
dimensions. After much investigation, we found that
no such driver exists for environments that cannot
handle ADO objects; MatLab is one such
environment. To solve this problem, we
made use of the underlying structure of an MDX
result: we serialize the results in a manner that can
be reconstructed at the client end. We:
• convert the query results into a stream using the ADO MD driver
• convert that stream to a text stream
• pass that text stream over the internet
• reconstruct the stream into an object that maintains the cell-dimension(s) association on the client.
The organization of the stream is as follows. The
first two numbers are the number of rows and the
number of columns. Next comes the number of
dimensions on the column axis, followed by the
column-dimension attributes themselves; the number
of columns and the number of dimensions on the
column axis together determine how many attributes
are written. Then we write the number of dimensions
on the row axis followed by the row-dimension
attributes, whose count is determined in the same way
from the number of rows and the number of
dimensions on the row axis. After the dimensions on
the row and column axes, we write the data matrix
[row x col].
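The layout just described can be sketched as a small Python function. The real Serializer is an ASP-based web service working from ADO MD results, so this is only an illustration of the stream format; the function name, the input structures, and in particular the row-by-row order of the data cells are assumptions, since the text does not state the cell order.

```python
import math


def serialize(col_headers, row_headers, data):
    """Flatten an MDX-style result into a comma-separated text stream.

    col_headers: one tuple of attribute names per column.
    row_headers: one tuple of attribute names per row.
    data: list of rows, each a list of floats (NaN for empty cells).
    """
    fields = [len(row_headers), len(col_headers)]
    # Column axis: dimension count, then every column's attributes in turn.
    fields.append(len(col_headers[0]))
    for header in col_headers:
        fields.extend(header)
    # Row axis: same layout.
    fields.append(len(row_headers[0]))
    for header in row_headers:
        fields.extend(header)
    # Cell values, row by row (this ordering is an assumption).
    for row in data:
        fields.extend("NaN" if math.isnan(v) else repr(v) for v in row)
    return ",".join(str(f) for f in fields)


stream = serialize(
    col_headers=[("LE", "US-Ton"), ("LE", "US-Var")],
    row_headers=[("2005",), ("2006",)],
    data=[[38.5, float("nan")], [33.25, 81.5]],
)
```

Because the counts come first, a reader of the stream always knows how many attribute names and data values to expect before it encounters them.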
As an illustration, consider the results of the following MDX
query.
SELECT
NON EMPTY CROSSJOIN
(
{[Datumtype].[Datumtype].&[11],[Datumtype].[Datumtype].&[19]},
{[Site].[SiteID].&[477],[Site].[SiteID].&[480]}
)
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME ON COLUMNS,
NON EMPTY
{[Timeline].[Year].&[2000]:[Timeline].[Year].&[2006]}
DIMENSION PROPERTIES
PARENT_UNIQUE_NAME ON ROWS
FROM
LatestAmfluxL3L4Daily
WHERE
([Measures].[YearlyCalc])
The result of this query is as follows:
6,4,2,LE,US-Ton,LE,US-Var,Precip,US-Ton,Precip,US-Var,
1,2001,2002,2003,2004,2005,2006,
46.1471300802596,NaN,NaN,14.022134347994,NaN,38.6128757119495,NaN,33.8220144215576,NaN,NaN,NaN,81.4135755203902,87.5986066925887,44.6077928524156,267.782040508823,116.888838782413,267.167004732928,295.245106825869 ...
In this stream, the leading numbers describe the
dimensions, the names that follow them are the
dimension attributes, and the remaining values are
the data. The first two numbers tell us the number of
rows and columns in the result. Thus the number of
rows is 6 and the number of columns is 4. The next
(third) number tells us the number of dimensions on
the column axis. In this example, we have 2
dimensions on the column axis and 4 columns,
therefore we must have 2*4 = 8 attributes on the
column axis. The row axis follows; with
only one dimension and 6 rows, there are 6
attributes. Lastly the data [row x col] are written.
Access to the Serializer is secured with
HTTP Basic Authentication [HTTP] and a dedicated
machine-local no-login account. The Serializer then
accesses the datacube using the NT
AUTHORITY\NETWORK SERVICE account. We
realize that basic authentication is not a long term
solution, as the credentials are only Base64-encoded,
which is effectively clear text and can be decoded
quite easily [BASIC].
This does, however, demonstrate that some level of
security can be achieved. Basic authentication also
prevents web-crawlers and robots from accessing
the data and over-loading the system.
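The weakness of Basic Authentication is easy to demonstrate. The following Python sketch builds the value of an `Authorization` header as the HTTP specification describes and then recovers the credentials from it without any key. The user name and password are made up, and the helper names are hypothetical.

```python
import base64


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value):
    """Recover the user name and password from a Basic header value."""
    token = header_value.split(" ", 1)[1]
    decoded = base64.b64decode(token).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


# Made-up credentials, for illustration only.
header = basic_auth_header("scientist", "s3cret")
recovered = decode_basic_auth(header)
```

Anyone who can observe the request can run the second function, which is why transport-level encryption is needed for real protection.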
2.6 Deserializer
The results stream is deserialized by reversing
the serializing mechanism. We construct a MatLab
object that associates the cells with the dimensions.
The pseudo-class is represented as follows:
Struct MdxResults
{
Integer: Number of Rows
Integer: Number of Cols
Integer: Number of dimensions on Col axis
Integer: Number of dimensions on Row axis
Struct Axis[2] : Axis structure
Double[,] : Data
}
Struct Axis
{
String [Number of Dimensions in axis][Number of attributes in each dimension] : Header
}
The MatLab object, res, that implements the above
results structure is shown below:
res =
rows: 27
cols: 37
dim1axes: 2
dim2axes: 1
axis: [1x2 struct]
data: [27x37 double]
The MatLab user has access to this structure and the
query results, without having to construct the MDX
query.
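For readers who want to see the reconstruction step spelled out, here is a minimal Python sketch that parses a stream in the layout described in Section 2.4 into a results object. It is not the prototype's MatLab code: the class and field names are hypothetical, the sample stream is made up, and the cells are assumed to arrive row by row.

```python
from dataclasses import dataclass


@dataclass
class MdxResults:
    rows: int
    cols: int
    col_headers: list   # one tuple of attribute names per column
    row_headers: list   # one tuple of attribute names per row
    data: list          # rows x cols matrix of floats


def deserialize(stream):
    """Rebuild the results object from the comma-separated stream."""
    fields = [f.strip() for f in stream.split(",") if f.strip()]
    pos = 0

    def take(n):
        # Consume the next n fields from the stream.
        nonlocal pos
        chunk = fields[pos:pos + n]
        pos += n
        return chunk

    rows, cols = (int(x) for x in take(2))
    col_dims = int(take(1)[0])
    col_headers = [tuple(take(col_dims)) for _ in range(cols)]
    row_dims = int(take(1)[0])
    row_headers = [tuple(take(row_dims)) for _ in range(rows)]
    # float("NaN") parses to NaN; cells are assumed to arrive row by row.
    data = [[float(x) for x in take(cols)] for _ in range(rows)]
    return MdxResults(rows, cols, col_headers, row_headers, data)


res = deserialize("2,2,2,LE,US-Ton,LE,US-Var,1,2005,2006,38.5,NaN,33.25,81.5")
```

The header tuples keep every cell tied to its dimension attributes, which is the association the OLE DB driver would otherwise have maintained.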
A typical MDX result contains many dimensions
and attributes associated with those dimensions. As
such, we need a mechanism that enables the MatLab
user to make the column-attribute association using
some form of a search. The Handle Manager is that
mechanism.
2.7 Handle Manager
The Handle Manager associates the datacube
dimensions and attributes with the returned results
columns. The Handle Manager is invoked by:
hm = handle_manager(res.axis(1).dim)
Consider an MDX query with 3 dimensions on the
column axis, each of which has 10 associated
attributes. The total number of columns in the
result set will be 10 x 10 x 10 = 1000 columns. The
Handle Manager provides a MatLab user friendly
way to find the right two columns for a scatter plot.
The prototype Handle Manager provides two
mechanisms to make the association. The user can:
• provide the column number (index) and get back the fully qualified name of the column, formed by concatenating the attributes along the different dimensions.
• provide the attribute names and obtain the indices (or handles) at which those attributes are found. The name can be either partially or fully qualified.
To further illustrate this point, consider Figure 5 to
be the output of a small, simple MDX query. There
are two sites (US-Ton and US-Var) and two
datumtypes (LE and Precip).
Figure 5: Sample result set of an MDX query.
Yearly values of two datumtypes (LE and Precip) are
returned for two sites (US-Ton and US-Var) for the
years 2001 through 2006.
To discover the contents of column 3, the user can
retrieve the fully qualified column name “US-Var_LE”:
Header = get_header(hm,3)
The user can also retrieve the columns with names
containing “US-Var” (columns 3 and 4), “LE”
(columns 1 and 3) or both “US-Var” and “LE” (column
3):
index = find_dims(hm,'US-Var','LE')
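The two lookups can be mimicked in a few lines of Python. This is a sketch of the behaviour described above, not the prototype's MatLab implementation: the class is hypothetical, the column layout is inferred from the column numbers quoted in the text, and indices are kept 1-based to match MatLab.

```python
class HandleManager:
    """Map result columns to attribute names and back."""

    def __init__(self, col_headers):
        # col_headers: one tuple of attribute names per column.
        self.col_headers = list(col_headers)

    def get_header(self, index):
        """Return the fully qualified name of a 1-based column index."""
        return "_".join(self.col_headers[index - 1])

    def find_dims(self, *names):
        """Return 1-based indices of columns containing every given name."""
        return [
            i for i, header in enumerate(self.col_headers, start=1)
            if all(name in header for name in names)
        ]


# Column layout inferred from the column numbers quoted in the text.
hm = HandleManager([
    ("US-Ton", "LE"), ("US-Ton", "Precip"),
    ("US-Var", "LE"), ("US-Var", "Precip"),
])
```

With this layout, `hm.get_header(3)` yields the fully qualified name of column 3 and `hm.find_dims("US-Var", "LE")` narrows the search to the single column carrying both attributes.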
3 Conclusions
We have had a great deal of interest in our prototype
from our colleague scientists. We are still very early
in getting experience using the connection. One
unexpected benefit is that many of our scientists
have non-Windows desktops. Macintosh Excel
PivotTables do not support datacube access, so
MatLab is the most accessible route to the datacubes.
The scripting facility and improved rendering
facility are already helping us. A collection of plots
from one of our Russian River hydrology cubes is
shown in Figure 6. The upper
right pane is a simple time plot of two variables
(discharge and turbidity). The upper left pane shows
the results of an FFT (Fast Fourier Transform). This
can be done with Excel, but requires careful cut-paste
which is not updated across PivotTable
changes. The lower pane shows a color-coded plot
of discharge as a function of site (aggregated by the
drainage area property) in 2003. This sort of plot is
often used by our scientists and is not possible with
Excel.
Our solution is also faster than Excel over
potentially slow lines to a scientist’s desktop. Excel
uses a SOAP-based approach; the XML headers
make the result bulkier than our text-based
approach.
As the amount of data returned by the query gets
large, the performance can become sluggish. This is
a combination of the time necessary to retrieve the
data, the network transport time, and the scaling of
MatLab when handling large amounts of data. The
good news is that the datacube approach can
postpone that slowdown when the analysis is not at
the leaf nodes of the hierarchies. The datacube can
precompute the aggregates, and only those aggregates
need to be passed to the desktop application and
handled by that application.
Of course, we are describing only a prototype. Our
query generator cannot handle more than 3
dimensions on an axis. Thus, the maximum number
of dimensions that the query generator can accept is
6 (3 on column axis and 3 on row axis). This is not a
limitation for our cubes, but could be in the future.
We have also not attempted to include the (very
widely varying) quality dimension. Lastly, we are
using only basic authentication.
Figure 6: Example MatLab generated plots from our Russian River cube. The lower color coded plot of
discharge in 2003 is not possible to create with Excel.
4 Future Work
Near term, we want to convert the prototype to an
easy to deploy technology artifact. We need to add
support for selecting a datacube, including
specifying credentials and menu configuration files, and
would like to move to HTTPS [HTTPS].
Our scientists have asked for a command line
interface in addition to the GUI. They have also
suggested returning an n-dimensional array rather
than using the Handle Manager; that would be more
intuitive to them.
We need to consider how to abstract the differing
quality dimensions across our data sets; this is much
more of a user model than GUI or query generation
question. Lastly, we have some performance testing
to do on our generated queries given the CROSS
JOINs; we have demonstrated feasibility and
correctness, but not optimal coding.
5 Acknowledgements
We would like to acknowledge the valuable
contributions made by Deb Agarwal, Monte Goode,
Matt Rodriguez, and Robin Weber of the Berkeley
Water Center, in getting the data ready and testing
various modules during our development and
deployment. We would also like to thank Rebecca
Leonardson, our first user, for many terrific
suggestions. As always, we rely on Stuart Ozer for
his continued datacube wisdom.
6 References
[ADO]: ActiveX Data Objects (ADO), a language-neutral object model that exposes data raised by an underlying OLE DB Provider, http://support.microsoft.com/kb/183606
[AMERIFLUX]: AmeriFlux Network, http://public.ornl.gov/ameriflux/
[BASIC]: SSL Man-in-the-Middle Attacks, Peter Burkholder, February 1, 2002, http://www.sans.org/reading_room/whitepapers/threats/480.php
[BWC]: Berkeley Water Center, http://esd.lbl.gov/BWC/
[COG]: Cognos, http://www.cognos.com/solutions/index.html
[CUAHSI-ODM]: Consortium of Universities for the Advancement of Hydrologic Science, Observations Data Model, http://www.cuahsi.org/his/odm.html
[EXCEL]: Excel Pivot tables, http://www.microsoft.com/dynamics/using/excel_pivot_tables_collins.mspx
[GRAY1996]: J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, “Data cube: A relational operator generalizing group-by, crosstab and sub-totals,” ICDE 1996, pages 152–159, 1996.
[HTTP]: J. Franks et al., HTTP Authentication: Basic and Digest Access Authentication, June 1999, IETF RFC.
[HTTPS]: HTTPS, http://technet2.microsoft.com/windowsserver/en/library/052d2ea9-586c-4e33-9c56-ecc0c2b203be1033.mspx?mfr=true
[MATLAB]: The language of technical computing, http://www.mathworks.com/products/MatLab/
[MDX]: Multi Dimensional eXpressions (MDX), a query language to query the SQL Server Analysis Services (SSAS), http://msdn2.microsoft.com/en-us/library/ms345116.aspx
[MDXTutorial]: MDX Tutorial, http://msdn2.microsoft.com/en-us/library/ms144884.aspx
[OZER]: Stuart Ozer, Alex Szalay, Katalin Szlavecz, Andreas Terzis, Razvan Musǎloiu-E., Joshua Cogan, Using Data-Cubes in Science: an Example from Environmental Monitoring of the Soil Ecosystem, MSR-TR-2006-134, 2006.
[PRO]: Proclarity, http://www.proclarity.com
[SSAS]: SQL Server Analysis Server, An integrated view of business data for reporting, OLAP analysis, Key Performance Indicator (KPI) scorecards, and data mining, http://www.microsoft.com/sql/technologies/analysis/default.mspx
[SDSS]: The Sloan Digital Sky Survey SkyServer, http://skyserver.sdss.org/
[TAB]: Tableau, A tool for querying and analyzing OLAP databases without any knowledge of MDX, http://www.tableausoftware.com/info/OLAP_Front_End/OLAP_Front_End_fw.php