GVN330 Climate Data Analysis
Assignment 2.4: Cumulative Frequency Distribution

Cumulative Frequency Distribution (cfd) graphs are conceptually hard to understand. To interpret cfd graphs correctly, you need to understand quite a lot of background theory: histograms, quantiles, percentiles, median. But once you have grasped what a cfd shows, it lets you determine much more about a dataset than you could from plotting histograms or from fitting a trend line. I hope this exercise will demonstrate this!

Reading
Wilks 3.3.7 – Cumulative Frequency Distributions (part of reading quiz 3)

For help on understanding cfds, you can try the YouTube video: http://youtu.be/BfZUlpXQmp0
Note that on his diagram, he has the actual number of people on the y-axis (0 to 120), instead of the fraction of people (0 to 1, which is what we have been using), but the principles are the same.

Exercises:

Ass2.4 Create seasonal cumulative frequency distribution (cfd) plots for daily mean temperature for two periods (1756-1856, and 1910-2010) using the Stockholm time-series. That is, you will compare the first and last 100 years of the record.

a) Download the data, load it into Matlab.
The Stockholm Historical Weather Observations time-series is one of the world's longest near-continuous instrumental records. From the homepage:
http://dev1.geo.su.se/bbcc/dev/moberg/homogenized_daily_mean_temperatures.html
you can download the data as a csv file: stockholm_daily_mean_temperature_1756_2010.txt
This file contains three versions of the series (see the README_stockholm_daily_mean_temperature.txt file for details), but the version we will use is in column 6 of the file.

b) Extract seasonal data for two periods: 1756-1856, and 1910-2010.
That is, you will be plotting cumulative frequency distributions for 8 sets of data: two periods, and four seasons for each period.

c) Create the plots
To create a cumulative frequency distribution plot, use the Matlab function cdfplot.
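Steps a) and b) could be sketched roughly as follows. This is only an illustration under assumptions, not the required solution: it assumes that columns 1 and 2 of the file hold the year and month (check the README), that winter means December to February, and that a year range is applied by calendar year. The helper name selectSeason is made up for this example.

```matlab
% Sketch only: the column layout (1 = year, 2 = month, 6 = temperature) and
% the December-February definition of winter are assumptions; check the README.
fname = 'stockholm_daily_mean_temperature_1756_2010.txt';

% Hypothetical helper: return column 6 for the rows whose year lies in
% yearRange (inclusive) and whose month is one of the listed months.
selectSeason = @(data, yearRange, months) data( ...
    data(:,1) >= yearRange(1) & data(:,1) <= yearRange(2) & ...
    ismember(data(:,2), months), 6);

if isfile(fname)
    data = readmatrix(fname);   % assumed to load as a numeric matrix
    winter1756 = selectSeason(data, [1756 1856], [12 1 2]);
    winter1910 = selectSeason(data, [1910 2010], [12 1 2]);
    % Repeat with [3 4 5], [6 7 8] and [9 10 11] for spring, summer and fall.
end
```

Selecting by calendar year means that a December is grouped with the January and February of the same year rather than the following one. For 100-year periods this should matter little, but it is a simplification.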
Again, I suggest you create all the cfds on the same Figure, but plot each season on separate axes, using the subplot command. Note that you should show the cfds for BOTH periods on the same axis! So, for example, the axis for the winter season should show the cfd for winter days 1756-1856, and also the cfd for winter days 1910-2010. You decide how to style them!

You cannot specify plot options to cdfplot directly like you can with plot (why? no idea…). Instead, you need to collect the handle returned by cdfplot, and use set to change the properties. I do not expect you to understand handles and set, but the syntax is:

    h=cdfplot(winter1756);
    set(h,'Color','r','LineWidth',3)

To plot a second cdf on the same axis, use the hold command, then call cdfplot again. For example:

    hold on
    h=cdfplot(winter1910);
    set(h,'Color','b','LineWidth',2)

You may also find it useful to set the y-ticks explicitly to ensure that the median and the 10th and 90th percentiles are shown. This will help you answer the questions in the discussion. For example:

    set(gca,'ytick',[0, 0.1, 0.25, 0.5, 0.75, 0.9, 1]);
    grid on

Finally, the code required for this exercise can be greatly simplified by writing a function that will create and style two cumulative frequency distribution plots on the same axis given two input vectors!

Discussion points:

By examining your graphs, you should be able to answer the following questions:

For each season, say which of the periods has the highest median.

In fall, in which period are "warm days" (days above the 75th percentile) warmer? In which period are "cool days" (days below the 25th percentile) warmer?

Which of the following statements is true for Spring? The biggest difference between the two distributions is that:
o Cold days (10th percentile) are warmer in the 1756-1856 period.
o Cold days (10th percentile) are warmer in the 1910-2010 period.
o Hot days (90th percentile) are warmer in the 1756-1856 period.
o Hot days (90th percentile) are warmer in the 1910-2010 period.

Which of the following statements is true for every season:
o The minimum value in the 1756-1856 period is greater than the minimum value in the 1910-2010 period.
o The minimum value in the 1910-2010 period is greater than the minimum value in the 1756-1856 period.
o The maximum value in the 1756-1856 period is greater than the maximum value in the 1910-2010 period.
o The maximum value in the 1910-2010 period is greater than the maximum value in the 1756-1856 period.

Finally, do these plots alone tell us anything about what causes the differences between the two periods?

CHECKPOINT! Compare your answers with others to check your understanding.
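When comparing answers, it may help to see how a value read off a cfd relates to the data. The sketch below builds an empirical cfd by hand and reads off the value where it first reaches 0.5. The temperatures are made up for illustration and are not taken from the Stockholm record. The code only illustrates the definition; it makes no claim about how cdfplot works internally.

```matlab
% Sketch with made-up values, not Stockholm data.
x  = [3 -1 4 0 2];                 % hypothetical daily temperatures
xs = sort(x);                      % x-axis of the cfd: values in ascending order
F  = (1:numel(xs)) / numel(xs);    % y-axis: fraction of values <= xs(k)

% Smallest value at which the cfd reaches 0.5: a simple estimate of the median.
medianEstimate = xs(find(F >= 0.5, 1));
```

Replacing 0.5 with 0.1 or 0.9 gives the corresponding estimate of the 10th or 90th percentile, which is what the y-ticks on your plots let you read off by eye.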