GVN330 Climate Data Analysis

Assignment 2.1: Numerical Summary Statistics

In Assignment 1.5/1.6 you created graphs of average temperature and monthly precipitation. These graphs clearly show the average seasonal cycle, but they do not indicate the degree of year-to-year variability. In Assignment 1.7 you created graphs showing the monthly time series. These graphs show the year-to-year variability, but they were difficult to interpret. In the temperature graphs, the seasonal cycle dominates, so it is difficult to see the year-to-year variability (especially in spring and fall; it is a bit easier in summer and winter). In the precipitation graph, there is so much scatter that it is difficult even to pick out the annual cycle!

To visualize both the annual cycle and the year-to-year variation, we can calculate and plot numerical summary statistics for each month. The mean is one such summary statistic; you plotted this in Ass1.5/1.6. The advantage of plotting mean monthly values is that, as you discovered in Ass1.5, you do not need to first calculate averages (or sums) for each month of each year from daily values. But now you know how to calculate the mean temperature and the sum of precipitation for each month of each year (Ass1.7), so we can create a more sophisticated graph using other summary statistics.

Reading

Read Chapters 3.1-3.3 in Wilks.

Background Theory Topics:

For the exam, you will need to be prepared to discuss the concepts of:
- Robustness and Resistance
- Quantiles and Percentiles
- Concepts of Location (Central Tendency), Spread, and Symmetry, and common measures of them: mean, trimmed mean, percentiles, median, standard deviation, range, inter-quartile range, skewness.

Exercises:

Assignment 2.1: calculate and then plot numerical summary statistics for each month-of-year, for max temp, min temp, and precipitation, using the monthly data you calculated from Saeve_daily.csv for 1961-2002.
The summary statistics are:
- the median
- upper and lower quartiles
- maximum and minimum values

For example, in Ass1.7 you calculated the monthly average max temp for Jan 1961, Jan 1962, Jan 1963, …. You need to calculate the median of those January values, as well as the quartiles and the max and min values. Then do the same for the other months. Then do the same for minimum temperature and precipitation!

You should make one plot for each variable (i.e. one plot for max temp, one for min temp, one for precipitation). All the statistical measures you calculate should be on the plot.

I SUGGEST THAT YOU READ THROUGH THIS DOCUMENT ENTIRELY BEFORE STARTING!

Break the problem up into smaller steps. Suggested steps are:

a) Loading data into Matlab

In Ass1.7, you calculated monthly averages and sums from daily values. Here you need to calculate statistics on those monthly values. Obviously, you want to reuse the work you did there for this assignment!

One way to do this is to re-use the code, copying the lines that load the data and create the averages/sums (not the lines that make the plot) into a new m-file. This is fine.

Another way to do it is to run the code from Ass1.7, and then save the data you calculated as a mat-file (a Matlab data file). For example, if during Ass1.7 you calculated a matrix M with columns:

M = [year, month, sum precipitation, mean max_temp, mean min_temp];

then you can save this matrix to the mat-file monthly_values.mat using

>> save monthly_values.mat M

Then in your code for Assignment 2.1, you can re-load that data using

>> load monthly_values.mat

(You may need to copy the file monthly_values.mat to the same directory where your assignment2_1.m code lives, or change the Current Folder.)

Checkpoint: compare your approach with other groups! Why did you choose your method?

b) Choose one variable to begin with, and calculate the statistics

The functions you need to use are median, prctile, max and min.
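As a minimal sketch of how these four functions behave on the values for a single month, assume a hypothetical vector jan_tmax of made-up January values (illustrative numbers only, not data from Saeve_daily.csv). The variable names are invented for this example, and prctile is assumed to be available through the Statistics and Machine Learning Toolbox; median, max and min are built in.

```matlab
% Hypothetical, made-up January mean max temperatures (degC), one per year.
% These are illustrative numbers only, not values from Saeve_daily.csv.
jan_tmax = [1.2 -0.5 3.1 0.4 2.2 -1.8 0.9 1.5];

jan_median = median(jan_tmax);       % middle of the sorted values
jan_Q25    = prctile(jan_tmax, 25);  % lower quartile (25th percentile)
jan_Q75    = prctile(jan_tmax, 75);  % upper quartile (75th percentile)
jan_max    = max(jan_tmax);          % largest value
jan_min    = min(jan_tmax);          % smallest value

fprintf('min %.2f, Q25 %.2f, median %.2f, Q75 %.2f, max %.2f\n', ...
        jan_min, jan_Q25, jan_median, jan_Q75, jan_max);
```

The five numbers should always come out ordered as min, lower quartile, median, upper quartile, max, which is a quick sanity check you can apply to your own results.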
In this assignment, you need to make lots of calculations. In particular, you need to repeat calculations for each month-of-year. There are two basic ways to do this.

Firstly, you could write a for loop which selects e.g. all the Januaries, then calculates the median, quartiles etc. on those. You have the knowledge to do this from Ass1.7.

Alternatively, you can take advantage of the fact that functions like median can be applied to a matrix of values, in which case Matlab calculates the median for each column independently. Thus, if you first create a matrix P of e.g. monthly precipitation values, where there is a row for each year and a column for each month:

P =
Jan 1961   Feb 1961   Mar 1961   …
Jan 1962   Feb 1962   Mar 1962   …
Jan 1963   Feb 1963   Mar 1963   …
…          …          …

then you can calculate the summary statistics for each month using:

>> Pr_median = median(P);
>> Pr_Q75 = prctile(P,75);
>> Pr_max = max(P);
…

How might you create the matrix P? You could re-code your work from Ass1.7 to create it from scratch. Or you could use the reshape function to convert a vector containing the precipitation time series into an Nx12 matrix. You might need to create a 12xN matrix first, then use the transpose operator (') to turn it into Nx12. (See http://www.cyclismo.org/tutorial/matlab/vector.html for an explanation of transpose, but note that you can also transpose a matrix, not just a vector.)

CHECKPOINT! Compare the median values for a few of the months with other groups!

c) Plot the summary statistics

Make one plot for each variable (i.e. one plot for max temp, one for min temp, one for precipitation). The x-axis should be months (12 values). On each plot, show on the y-axis:
- the median
- upper and lower quartiles
- maximum and minimum values

It is up to you what style you use to represent the data! But do not spend too much time styling the graphs (see Ass2.2).

Discussion:

The graphs give a visual representation of the seasonal and inter-annual variability.
For example, if you plot the max and min temperature using the same y-range (not on the same figure), you can easily see in which months maximum temperature has the greatest variability, and which months minimum temperature has the greatest variability.
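
The comparison described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the temperature series are made-up synthetic numbers (not the Saeve data), all variable names are invented for this sketch, prctile is assumed to be available through the Statistics and Machine Learning Toolbox, and the line styles are one arbitrary choice among many. The sketch also includes the reshape-and-transpose step from part (b), so that each matrix has one row per year and one column per month.

```matlab
% Synthetic stand-in data (NOT real station data), in time order:
% Jan year 1, Feb year 1, ..., Dec year 1, Jan year 2, ...
N = 42;                                    % 1961-2002 is 42 years
months = 1:12;
seasonal = -10*cos(2*pi*(months-1)/12);    % made-up cycle, coldest in January
rng(1);                                    % fixed seed so the sketch is repeatable
tmax_series = repmat(seasonal + 12, 1, N) + 2*randn(1, 12*N);
tmin_series = repmat(seasonal + 4,  1, N) + 3*randn(1, 12*N);

% reshape fills column by column, giving 12xN (months x years);
% the transpose turns this into Nx12 (years x months).
Tmax = reshape(tmax_series, 12, N)';
Tmin = reshape(tmin_series, 12, N)';

data   = {Tmax, Tmin};
labels = {'Max temp', 'Min temp'};

% Common y-range covering both variables, so the two figures are comparable.
all_values = [Tmax(:); Tmin(:)];
y_range = [floor(min(all_values)) ceil(max(all_values))];

for k = 1:2
    X = data{k};
    figure;                                % one figure per variable
    plot(months, median(X), 'k-', 'LineWidth', 2); hold on;
    plot(months, prctile(X, 25), 'b--');   % lower quartile of each column
    plot(months, prctile(X, 75), 'b--');   % upper quartile of each column
    plot(months, max(X), 'r:');
    plot(months, min(X), 'r:');
    hold off;
    ylim(y_range);                         % same y-range in both figures
    xlim([1 12]);
    xlabel('Month'); ylabel('Temperature (degC)');
    title(labels{k});
    legend('median', 'Q25', 'Q75', 'max', 'min', 'Location', 'best');
end
```

With a shared y-range, the vertical distance between the min and max curves (or between the two quartile curves) can be compared directly between the two figures, month by month.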