GVN330 Climate Data Analysis
Assignment 2.4: Cumulative Frequency Distribution
Cumulative Frequency Distribution (cfd) graphs are conceptually hard to
understand. To interpret cfd graphs correctly, you need to understand quite a lot
of background theory: histograms, quantiles, percentiles and the median. But once
you have grasped what a cfd shows, it lets you determine much more
information about a dataset than you could determine from plotting histograms,
or from fitting a trend line.
I hope this exercise will demonstrate this!
Reading
Wilks 3.3.7 – Cumulative Frequency Distributions (part of reading quiz 3)
For help on understanding cfds, you can try the YouTube video:
http://youtu.be/BfZUlpXQmp0
Note that on his diagram, he has the actual number of people on the y-axis (0 to
120), instead of the fraction of people (0 to 1, which is what we have been using),
but the principles are the same.
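To make the difference between counts and fractions concrete, here is a minimal sketch of how a cfd can be built by hand. The values in x are made up for illustration and are assumed to contain no repeated values; cdfplot does the equivalent work for you in the exercise.

```matlab
% Made-up sample of daily mean temperatures (illustration only)
x = [2.1 -0.5 3.4 1.0 -2.2 0.3 4.8 1.7];

xs = sort(x);             % values in ascending order
n  = numel(xs);
counts    = 1:n;          % number of values <= xs(k), as on the video's y-axis
fractions = counts / n;   % fraction of values <= xs(k), running from 1/n to 1

stairs(xs, fractions)     % step plot of the empirical cfd
xlabel('Temperature (deg C)')
ylabel('Fraction of days at or below x')
```

Plotting counts instead of fractions on the y-axis gives the same shape, only with a different scale.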
Exercises:
Ass2.4 Create seasonal cumulative frequency distribution (cfd) plots for daily mean
temperature for two periods (1756-1856, and 1910-2010) using the Stockholm
time-series. That is, you will compare the first and last 100 years of the record.
a) Download the data and load it into Matlab.
The Stockholm Historical Weather Observations time-series is one of the world's
longest near-continuous instrumental records. From the homepage:
http://dev1.geo.su.se/bbcc/dev/moberg/homogenized_daily_mean_temperatures.html
You can download the data as a csv file:
stockholm_daily_mean_temperature_1756_2010.txt
This file contains three versions (see the
README_stockholm_daily_mean_temperature.txt file for details), but the version
we will use is in column 6 of the file.
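A minimal sketch of the loading step is shown below. It assumes the file is plain numeric text without header lines, and that columns 1 to 3 hold the year, month and day. Neither assumption is stated in this handout, so check them against the README file before relying on them.

```matlab
% Assumption: numeric text file, no header lines, saved in the current folder.
data = load('stockholm_daily_mean_temperature_1756_2010.txt');

year  = data(:, 1);   % assumed column layout (check the README)
month = data(:, 2);   % assumed column layout (check the README)
temp  = data(:, 6);   % the version used in this assignment (column 6)

size(data)            % quick check: the file should have at least 6 columns
```

If load reports an error, the file probably contains header text or a delimiter it cannot parse, and a different import function will be needed.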
b) Extract seasonal data for two periods: 1756-1856, and 1910-2010.
That is, you will be plotting cumulative frequency distributions for 8 sets of data:
two periods, and four seasons for each period.
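One possible way to pull out the eight data sets is with logical indexing, sketched below. The column layout (year in column 1, month in column 2) and the season definitions (winter = Dec-Feb, spring = Mar-May, summer = Jun-Aug, fall = Sep-Nov) are assumptions, not something this handout specifies. Note also that this simple version groups December with the January and February of the same calendar year.

```matlab
% Assumption: numeric text file with year, month in columns 1 and 2.
data  = load('stockholm_daily_mean_temperature_1756_2010.txt');
year  = data(:, 1);
month = data(:, 2);
temp  = data(:, 6);

% Logical masks for the two periods
early = year >= 1756 & year <= 1856;
late  = year >= 1910 & year <= 2010;

% Logical masks for the seasons (assumed definitions)
isWinter = ismember(month, [12 1 2]);
isSpring = ismember(month, [3 4 5]);
isSummer = ismember(month, [6 7 8]);
isFall   = ismember(month, [9 10 11]);

% Combine the masks to get one vector per period and season
winter1756 = temp(early & isWinter);   winter1910 = temp(late & isWinter);
spring1756 = temp(early & isSpring);   spring1910 = temp(late & isSpring);
summer1756 = temp(early & isSummer);   summer1910 = temp(late & isSummer);
fall1756   = temp(early & isFall);     fall1910   = temp(late & isFall);
```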
c) Create the plots
To create a cumulative frequency distribution plot, use the Matlab function
cdfplot.
Again, I suggest you create all the cfds on the same Figure, but plot each season
on separate axes, using the subplot command.
Note that you should show the cfds for BOTH periods on the same axis! So, for
example, the axis for the winter season should show the cfd for winter days
1756-1856, and also the cfd for winter days 1910-2010. You decide how to style
them!
You cannot specify plot options to cdfplot directly like you can with plot (why?
no idea…). Instead, you need to collect the handle returned by cdfplot, and use set
to change the properties. I do not expect you to understand handles and set, but
the syntax is:
h=cdfplot(winter1756);
set(h,'Color','r','LineWidth',3)
To plot a second cfd on the same axis, use the hold command, then
call cdfplot again. For example:
hold on
h=cdfplot(winter1910);
set(h,'Color','b','LineWidth',2)
You may also find it useful to set the y-ticks explicitly to ensure that the median
and the 10th and 90th percentiles are shown. This will help you answer the
questions in the discussion. For example:
set(gca,'ytick',[0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]);
grid on
Finally, the code required for this exercise can be greatly simplified by writing a
function that will create and style two cumulative frequency distribution plots on
the same axis given two input vectors!
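A sketch of what such a function could look like is given below. The name plot_two_cfds, its arguments and the styling choices are all hypothetical; it simply wraps the cdfplot, set, hold and ytick calls shown above. Save it in its own file, plot_two_cfds.m.

```matlab
function h = plot_two_cfds(early, late, titleText)
% PLOT_TWO_CFDS  Plot and style two cfds on the current axes.
%   Hypothetical helper: the name and arguments are illustrative only.
%   Example call:
%       subplot(2,2,1); plot_two_cfds(winter1756, winter1910, 'Winter');

h1 = cdfplot(early);                     % cfd for the first period
set(h1, 'Color', 'r', 'LineWidth', 3)
hold on
h2 = cdfplot(late);                      % cfd for the second period, same axes
set(h2, 'Color', 'b', 'LineWidth', 2)
hold off

% Show the median and the 10th, 25th, 75th and 90th percentiles on the y-axis
set(gca, 'ytick', [0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]);
grid on

title(titleText)                         % replaces cdfplot's default title
xlabel('Daily mean temperature')
ylabel('Cumulative fraction of days')
legend([h1 h2], {'1756-1856', '1910-2010'}, 'Location', 'northwest')

h = [h1 h2];                             % return the handles for further styling
end
```

With a helper like this, each season needs only one subplot call and one function call, so the whole figure takes four short lines.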
Discussion points:
By examining your graphs, you should be able to answer the following questions:
• For each season, say which of the periods has the higher median.
• In fall, in which period are "warm days" (days above the 75th percentile)
  warmer? In which period are "cool days" (days below the 25th percentile)
  warmer?
• Which of the following statements is true for spring? The biggest
  difference between the two distributions is that:
    o Cold days (10th percentile) are warmer in the 1756-1856 period.
    o Cold days (10th percentile) are warmer in the 1910-2010 period.
    o Hot days (90th percentile) are warmer in the 1756-1856 period.
    o Hot days (90th percentile) are warmer in the 1910-2010 period.
• Which of the following statements is true for every season:
    o The minimum value in the 1756-1856 period is greater than the
      minimum value in the 1910-2010 period.
    o The minimum value in the 1910-2010 period is greater than the
      minimum value in the 1756-1856 period.
    o The maximum value in the 1756-1856 period is greater than the
      maximum value in the 1910-2010 period.
    o The maximum value in the 1910-2010 period is greater than the
      maximum value in the 1756-1856 period.
Finally, do these plots alone tell us anything about what causes the differences
between the two periods?
CHECKPOINT! Compare your answers with others to check your understanding.
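If you want a numerical cross-check of values you have read off a graph, one option is the prctile function, sketched below. Like cdfplot, it is part of the Statistics Toolbox. The two vectors here are synthetic stand-ins generated with randn, not the Stockholm data; replace them with your own seasonal vectors.

```matlab
% Synthetic stand-ins for two seasonal temperature vectors (illustration only)
rng(1);                            % fixed seed so the numbers are repeatable
periodA = 5 + 4*randn(1000, 1);
periodB = 6 + 3*randn(1000, 1);

p  = [10 25 50 75 90];             % percentiles matching the suggested y-ticks
qA = prctile(periodA, p);          % the 50th percentile is the median
qB = prctile(periodB, p);

% One row per percentile: percentile, value in period A, value in period B
disp([p(:), qA(:), qB(:)])
```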