GVN330 Climate Data Analysis
Assignment 2.1: Numerical Summary statistics
In Assignment 1.5/1.6 you created graphs of average temperature and monthly
precipitation. These graphs clearly show the average seasonal cycle, but they do
not indicate the degree of year-to-year variability.
In Assignment 1.7 you created graphs showing the monthly time series. These
graphs show the year-to-year variability, but they were difficult to interpret.
In the temperature graphs the seasonal cycle dominates, so the year-to-year
variability is hard to see (especially in spring and fall; it is a bit easier in
summer and winter). In the precipitation graph there is so much scatter that
it is difficult even to pick out the annual cycle!
To visualize both the annual cycle and the year-to-year variation, we can
calculate and plot numerical summary statistics for each month. The mean is one
such summary statistic; you plotted it in Assignment 1.5/1.6. The advantage of
plotting mean monthly values is that (as you discovered in Assignment 1.5) you do
not need to first calculate averages (or sums) for each month of each year from
the daily values.
But now you know how to calculate the mean temperature and the precipitation sum
for each month of each year (Assignment 1.7), so we can create a more
sophisticated graph using other summary statistics.
Reading
Read Chapters 3.1 – 3.3 in Wilks
Background Theory Topics:
For the exam, you will need to be prepared to discuss the concepts of:
- Robustness and Resistance
- Quantiles and Percentiles
- Location (Central Tendency), Spread, and Symmetry, and common
  measures of them: mean, trimmed mean, percentiles, median, standard deviation,
  range, inter-quartile range, skewness.
Exercises:
Assignment 2.1: calculate and then plot numerical summary statistics for each
month of the year, for max temp, min temp, and precipitation, using the monthly
data you calculated from Saeve_daily.csv for 1961-2002.
The summary statistics are:
- the median
- upper-and-lower quartiles
- maximum and minimum values
For example, in Assignment 1.7 you calculated the monthly average max temp for
Jan 1961, Jan 1962, Jan 1963, …. You now need to calculate the median of those
January values, as well as the quartiles and the max and min values. Then do the
same for the other months, and then for minimum temperature and precipitation!
You should make one plot for each variable (i.e. one plot for max temp, one for
min temp, one for precipitation). All the statistical measures you calculate should
be on the plot.
I SUGGEST THAT YOU READ THROUGH THIS DOCUMENT ENTIRELY BEFORE
STARTING!
Break the problem up into smaller steps.
Suggested steps are:
a) Loading data into Matlab
In Assignment 1.7, you calculated monthly averages and sums from daily values.
Here you need to calculate statistics on those monthly values, so you will want
to reuse the work you did there.
One way to do this is to reuse the code: copy the lines that load the data and
create the averages/sums (not the lines that make the plot) into a new m-file.
This is fine.
Another way is to run the code from Assignment 1.7 and then save the data you
calculated as a mat-file (a MATLAB data file). For example, if during Assignment 1.7
you calculated a matrix M with columns:
M = [year, month, precip_sum, tmax_mean, tmin_mean];   % hypothetical column vectors, one row per month
then you can save this matrix to the mat-file monthly_values.mat using
>> save monthly_values.mat M
Then in your code for Assignment2.1, you can re-load that data using
>> load monthly_values.mat
(you may need to copy the file monthly_values.mat to the same directory where
your assignment2_1.m code lives, or change the Current Folder)
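The save/load round trip can be tried out on a tiny matrix first. This is a minimal sketch: M follows the five-column layout above, but its numbers are dummy values, not Saeve data. It shows that `load` restores the variable under its original name, M.

```matlab
% Round-trip sketch: save a matrix to a mat-file, clear it, load it back.
% Column layout: [year, month, precip sum, mean max temp, mean min temp].
% The numbers are dummy values for illustration only.
M = [1961 1 10 1 -1;
     1961 2 20 2 -2];

save monthly_values.mat M   % write M to monthly_values.mat in the Current Folder
clear M                     % remove M from the workspace
load monthly_values.mat     % M is restored under its original name

disp(size(M))               % 2 rows, 5 columns
```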
Checkpoint – compare your approach with other groups! Why did you choose
your method?
b) Choose one variable to begin with, and calculate the statistics
The functions you need to use are median, prctile, max and min.
In this assignment, you need to make many calculations. In particular, you need
to repeat the calculations for each month of the year.
There are two basic ways to do this.
First, you could write a for loop which selects all the values for one month
(e.g. all the Januaries) and then calculates the median, quartiles, etc. on
those. You have the knowledge to do this from Assignment 1.7.
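A sketch of the for-loop approach follows. It assumes M is laid out as in step a), with column 2 = month and column 3 = monthly precipitation sum. The M built here is hypothetical dummy data covering three years, where each value is simply the month number plus (year - 1961), so the results are easy to check by hand.

```matlab
% For-loop sketch. Assumes column 2 of M = month, column 3 = monthly
% precipitation sum. Dummy M: 3 years (1961-1963), value = month + (year - 1961).
years  = repmat(1961:1963, 12, 1);            % 12x3, one column per year
months = repmat((1:12)', 1, 3);               % 12x3, month numbers
M = [years(:), months(:), months(:) + (years(:) - 1961)];

Pr_median = zeros(1, 12);
Pr_Q25    = zeros(1, 12);
Pr_Q75    = zeros(1, 12);
Pr_max    = zeros(1, 12);
Pr_min    = zeros(1, 12);
for m = 1:12
    sel = M(:, 2) == m;          % logical index of the rows for month m (e.g. all Januaries)
    x   = M(sel, 3);             % that month's values, one per year
    Pr_median(m) = median(x);
    Pr_Q25(m)    = prctile(x, 25);
    Pr_Q75(m)    = prctile(x, 75);
    Pr_max(m)    = max(x);
    Pr_min(m)    = min(x);
end
```

With real data, only the construction of M changes; the loop stays the same for max and min temperature if you swap column 3 for column 4 or 5.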
Alternatively, you can take advantage of the fact that functions like median
also work on a matrix of values: MATLAB then calculates the median of each column
independently.
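The column-wise behaviour is easy to check on a small matrix before using real data. The sketch below also shows one pitfall worth knowing about: when the input has only one row, median works along that row, so passing the dimension explicitly is a safe habit.

```matlab
% median, max and min work down each column of a matrix independently.
A = [1 10 100;
     3 30 300;
     2 20 200];
medA = median(A)    % [2 20 200]
maxA = max(A)       % [3 30 300]
minA = min(A)       % [1 10 100]

% Caution: with a single-row input (e.g. only one year of data), median
% works along the row instead. Passing the dimension avoids this.
r = [5 1 3];
medRow = median(r)      % 3, the median of the whole row
medCol = median(r, 1)   % [5 1 3], one median per one-element column
```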
So, if you first create a matrix P of, e.g., monthly precipitation values, with
one row for each year and one column for each month:
P =
   Jan 1961   Feb 1961   Mar 1961   …
   Jan 1962   Feb 1962   Mar 1962   …
   Jan 1963   Feb 1963   Mar 1963   …
   …          …          …          …
Then you can calculate the summary statistics for each month using:
>> Pr_median = median(P);
>> Pr_Q75 = prctile(P,75);
>> Pr_max = max(P);
…
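Put together, all the required statistics for each month might be computed as in the following sketch. P is filled with random placeholder values here (N = 42 rows for 1961-2002); in the assignment it would hold your monthly values. The explicit dimension arguments keep each result 1x12 even in edge cases.

```matlab
% Sketch: summary statistics for each month from an N x 12 matrix P
% (one row per year, one column per month). Random placeholder data.
N = 42;                              % 1961-2002 inclusive
P = rand(N, 12) * 100;               % hypothetical values, not Saeve data

Pr_median = median(P, 1);            % 1x12, median of each month (column)
Pr_Q      = prctile(P, [25 75], 1);  % 2x12: row 1 = lower quartile, row 2 = upper
Pr_Q25    = Pr_Q(1, :);
Pr_Q75    = Pr_Q(2, :);
Pr_max    = max(P, [], 1);           % 1x12, maximum of each month
Pr_min    = min(P, [], 1);           % 1x12, minimum of each month
```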
How might you create the matrix P?
You could rewrite your work from Assignment 1.7 to create it from scratch.
Or you could use the reshape function to convert a vector containing the
precipitation time series into an Nx12 matrix. Because reshape fills the new
matrix column by column, you may need to create a 12xN matrix first (one column
per year), then use the transpose operator (') to turn it into Nx12.
(see http://www.cyclismo.org/tutorial/matlab/vector.html for an explanation of
transpose, but note that you can also transpose a matrix, not just a vector)
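The reshape-then-transpose idea can be sketched on a short dummy series. This assumes the vector is in time order (Jan, Feb, …, Dec of the first year, then the next year) with no missing months; the variable names pr, Nyears and P12xN are hypothetical.

```matlab
% Sketch: monthly time series -> N x 12 matrix (one row per year).
% Assumes pr is in time order, Jan..Dec of each year, no missing months.
pr = (1:36)';                  % dummy series: 3 years of months numbered 1..36
Nyears = numel(pr) / 12;       % must come out as a whole number

% reshape fills column by column, so each column of the 12 x N
% intermediate matrix holds one year (Jan at the top, Dec at the bottom).
P12xN = reshape(pr, 12, Nyears);
P = P12xN';                    % transpose: N x 12, one row per year
```

With the matrix M from step a), pr might be M(:, 3), provided the rows of M are sorted by year and then by month.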
CHECKPOINT! Compare the median values for a few of the months with other
groups!
c) Plot the summary statistics.
Make one plot for each variable (i.e. one plot for max temp, one for min temp, one
for precipitation).
The x-axis should be months (12 values). On each plot, show on the y-axis
- the median
- upper-and-lower quartiles
- maximum and minimum values
It is up to you what style you use to represent the data! But do not spend too
much time styling the graphs (see Assignment 2.2).
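As one possible style (not the required one), the sketch below draws the median as a thick line, the quartiles as dashed lines and the extremes as dotted lines. The statistics are computed from random placeholder data so the block runs on its own; replace P with your own Nx12 matrix.

```matlab
% One possible plotting style. P is random placeholder data (N x 12).
P = rand(42, 12) * 100;
Pr_median = median(P, 1);
Pr_Q      = prctile(P, [25 75], 1);   % row 1 = lower quartile, row 2 = upper
Pr_max    = max(P, [], 1);
Pr_min    = min(P, [], 1);

months = 1:12;
figure;
hold on
h(1) = plot(months, Pr_median, 'k-', 'LineWidth', 2);   % median: thick black line
h(2) = plot(months, Pr_Q(1, :), 'b--');                 % lower quartile: dashed blue
h(3) = plot(months, Pr_Q(2, :), 'b--');                 % upper quartile: dashed blue
h(4) = plot(months, Pr_max, 'r:');                      % maximum: dotted red
h(5) = plot(months, Pr_min, 'r:');                      % minimum: dotted red
hold off
xlim([1 12]);
set(gca, 'XTick', months);
xlabel('Month');
ylabel('Monthly precipitation sum');
legend(h, 'median', 'lower quartile', 'upper quartile', 'maximum', 'minimum');
```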
Discussion:
The graphs give a visual representation of the seasonal and inter-annual
variability.
For example, if you plot the max and min temperature on separate figures using
the same y-range, you can easily see in which months maximum temperature has
the greatest variability, and in which months minimum temperature has the
greatest variability.
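A sketch of the shared y-range idea: compute one set of limits covering both variables and apply it to both figures with ylim. Tmax and Tmin here are hypothetical Nx12 placeholder matrices, not real data. The last lines add an optional numerical cross-check, picking the month with the largest inter-quartile range for each variable.

```matlab
% Separate figures for max and min temperature with a shared y-range.
% Tmax and Tmin are N x 12 placeholder matrices (rows = years, columns = months).
Tmax = 10 + 8 * randn(42, 12);
Tmin = 8 * randn(42, 12);

% Common y-limits covering both variables
ylo = min([Tmax(:); Tmin(:)]);
yhi = max([Tmax(:); Tmin(:)]);

months = 1:12;
figure;
plot(months, median(Tmax, 1), 'k-', months, max(Tmax, [], 1), 'r:', months, min(Tmax, [], 1), 'r:');
ylim([ylo yhi]);
title('Max temperature');

figure;
plot(months, median(Tmin, 1), 'k-', months, max(Tmin, [], 1), 'r:', months, min(Tmin, [], 1), 'r:');
ylim([ylo yhi]);
title('Min temperature');

% Optional cross-check: month with the largest inter-quartile range
iqrMax = diff(prctile(Tmax, [25 75], 1), 1, 1);   % 1x12
iqrMin = diff(prctile(Tmin, [25 75], 1), 1, 1);   % 1x12
[~, mMax] = max(iqrMax);
[~, mMin] = max(iqrMin);
```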