GVN330 Climate Data Analysis Assignment 2.2: Boxplots

GVN330 Climate Data Analysis
Assignment 2.2: Boxplots ("schematic plots")
In Assignment 2.1 you created a graph showing monthly values for the summary
statistics:
• Median
• Upper and lower quartiles
• Minimum and maximum
These values are exactly the values used in Tukey's original boxplot (from your
reading)! So, you have manually created a traditional boxplot, just plotted using
lines instead of boxes.
These plots summarize the maximum and minimum temperature well, but you
can see with precipitation that the maximum values are much higher than the
upper quartile. This is exactly the problem described by Wilks:
"…information about the tails of the distribution is highly generalized. The
whiskers extend to the highest and lowest values, but there is no information about
the distribution of data points within the upper and lower quartiles of the data"
One solution to this is to use a "schematic plot". Here, we use the Matlab function
boxplot to create a schematic plot.
Note that the term "boxplot" is commonly used in statistics to refer to what
Wilks calls a "schematic plot"; that is, it is assumed that unusual points will be
shown individually. This is the case in Matlab, where the boxplot function makes
a plot similar to what Wilks calls a "schematic plot", but using only the "inner
fences" (that is, all points considered outliers from the distribution are shown
using the same symbol).
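To make the "inner fences" idea concrete, here is a minimal sketch of how the fences, outliers and whisker ends can be worked out by hand. The data vector is made up for illustration, and the factor 1.5 is the default whisker length of Matlab's boxplot. The quartile definition used by quantile may differ slightly from the one in Wilks, so small differences from a hand calculation are possible.

```matlab
% Hypothetical sample: precipitation totals (mm) for one calendar month,
% one value per year. These numbers are invented for illustration.
x = [12 18 20 22 25 27 30 31 35 90];

% Quartiles. quantile is part of the Statistics and Machine Learning
% Toolbox, the same toolbox that provides boxplot.
q = quantile(x, [0.25 0.50 0.75]);
iqrX = q(3) - q(1);                 % interquartile range

% Inner fences: 1.5 times the interquartile range beyond each quartile.
lowerFence = q(1) - 1.5*iqrX;
upperFence = q(3) + 1.5*iqrX;

% Points outside the inner fences are the ones drawn individually.
outside  = x < lowerFence | x > upperFence;
outliers = x(outside);

% The whiskers end at the most extreme values still inside the fences.
whiskerLow  = min(x(~outside));
whiskerHigh = max(x(~outside));
```

With this sample, only the value 90 lies outside the fences, so the upper whisker stops at 35 rather than stretching all the way to the maximum.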
Reading
The following sections from Wilks Chap 3 Graphical Summary Devices.
3.3.2 Boxplots
3.3.3 Schematic Plots
Exercises:
Assignment 2.2 – Create schematic plots ("boxplots") for the monthly maximum
and minimum temperatures and precipitation data you created in Ass1.7/2.1.
a) Loading data into Matlab
It is useful to know that the Matlab function boxplot can take a matrix of inputs
and create a boxplot for each column, in a similar way to how median calculates
the median of each column.
So, if you arranged your data into a matrix such as:
P =
    P for Jan 1961    P for Feb 1961    P for Mar 1961    …
    P for Jan 1962    P for Feb 1962    P for Mar 1962    …
    P for Jan 1963    P for Feb 1963    P for Mar 1963    …
    …                 …                 …                 …
in Ass2.1, then you will probably want to re-use your work from Ass2.1 here!
You can either re-use the code, or save the Nx12 matrices you created for
precipitation, max temp and min temp. See the discussion on loading data in
Ass2.1; the same principles apply here.
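If you no longer have the Nx12 matrix, the sketch below shows one way to rebuild it. It assumes (this may not match your own work from Ass2.1) that you have a single column vector of monthly values in time order, starting in January and covering whole years. The variable names p and P and the placeholder values are for illustration only.

```matlab
% Hypothetical input: monthly values in time order (Jan 1961, Feb 1961,
% ..., Dec 1963). The numbers 1..36 are placeholders so that the layout
% of the result is easy to check by eye.
p = (1:36)';

% reshape fills column by column, so first build a 12xN matrix (one
% column per year) and then transpose it to get N rows (years) by
% 12 columns (months).
P = reshape(p, 12, [])';

% median works down the columns, so this gives one value per month.
% boxplot treats the columns in the same way.
monthlyMedian = median(P);   % 1x12 row vector
```

Here the first column of P holds the three January values (1, 13 and 25), which is the arrangement shown in the matrix above.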
b) Making the boxplots.
Use the function boxplot to create your schematic plots.
The only complication is that you need to specify the x-labels using a cell-array,
not using set(gca,'xticklabel'). I do not understand why it does not work with
xticklabel, but we just have to live with it. A cell-array is an advanced data-type,
like the structure we encountered earlier. I am not going to ask you any
questions about cell-arrays in the exam, so you do not need to know anything
more about them!
Use the following syntax to create a box-plot with month labels (note the use of
curly-braces { } ):
>> C={'J','F','M','A','M','J','J','A','S','O','N','D'};
>> boxplot(P,'labels',C)
and you can then use xlabel, ylabel etc as normal.
You may find that the function grid is useful for answering the questions below.
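Putting the pieces of this section together, a complete plotting script might look like the sketch below. The matrix P here is synthetic (random, positively skewed numbers), used only so the script runs on its own; replace it with your own Nx12 matrix. The axis label text and title are also just examples.

```matlab
% Synthetic stand-in for the real data: 30 "years" by 12 months of
% positively skewed values. Replace P with your own Nx12 matrix.
rng(1);                          % fixed seed so the figure is repeatable
P = -20 * log(rand(30, 12));     % exponentially distributed, mean near 20

% Month labels as a cell-array (note the curly braces).
C = {'J','F','M','A','M','J','J','A','S','O','N','D'};

figure
boxplot(P, 'labels', C)          % one box per column, i.e. per month
xlabel('Month')
ylabel('Monthly precipitation (mm)')
title('Synthetic data, for layout only')
grid on                          % grid lines make values easier to read off
```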
Discussion:
This style of boxplot is commonly used, across all scientific research fields, to
show and compare distributions of data.
For the exam, you will need to know how to interpret a box-plot. You should be
able to answer questions such as the following. (I am not going to mark your
answers! Some of the months are too similar to be exactly sure which has the
highest or lowest value in the following questions. Just be sure you know how to
answer such questions from a boxplot!)
• Which month/months has the highest median precipitation? Which have
  the lowest?
• Which month/s have the highest inter-quartile range?
• Which month out of January or February has recorded the highest
  precipitation?
• Which month out of Jan or Feb has the largest range of precipitation
  (range is max value minus min value)?
• Are the monthly precipitation distributions generally positively skewed
  or negatively skewed?
• Does the boxplot indicate that there are errors in the recorded
  precipitation?
Compare your answers with others!
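If you want to check your reading of the plot against the numbers, the quantities behind these questions can be computed directly. The sketch below uses a small made-up matrix with only three months so that the results are easy to verify by hand; the variable names are illustrative, and comparing the mean with the median is only a rough indicator of skewness, not a formal measure.

```matlab
% Hypothetical 5-year by 3-month matrix (columns: Jan, Feb, Mar), in mm.
months = {'Jan','Feb','Mar'};
P = [10 30  5;
     20 35 10;
     30 40 15;
     40 45 20;
     90 50 80];

med       = median(P);                              % median of each column
iqrP      = quantile(P, 0.75) - quantile(P, 0.25);  % inter-quartile range
dataRange = max(P) - min(P);                        % max value minus min value

% Index of the month with the largest value of each statistic.
[~, iMed]   = max(med);
[~, iIqr]   = max(iqrP);
[~, iRange] = max(dataRange);

% Rough skewness check: a mean above the median suggests positive skew.
meanAboveMedian = mean(P) > med;

fprintf('Highest median: %s\n', months{iMed});
fprintf('Largest IQR:    %s\n', months{iIqr});
fprintf('Largest range:  %s\n', months{iRange});
```

For your own data, use the full Nx12 matrix and a 12-element cell-array of month names in place of the three-month example.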