>> Dan Fay: For those of you who were here last year for Jeff's initial talk, this is similar work from Jeff Dozier from UCSB, who had done some work on the snowpack but wanted to update us on the latest activities from his visiting portion this year. Thanks, Jeff. >> Jeff Dozier: Okay. I've decided I like this title, being an intern. It's sort of good [laughter]. So what we're looking at is the idea of trying to really do snow hydrology in areas where runoff from snow is really important and yet there's not much of a surface infrastructure to make any sort of measurements. So this is the Hindu Kush range of Afghanistan. We're looking at an area that's eight times the size of the state of Washington, so we're trying to run these models over pretty large areas. Okay. And the problem that we face, or that the people there face, is illustrated by this advisory that came out from the UN's Integrated Regional Information Networks, which monitors conditions all over the world and tries to alert the international community if something is troubling. And you can see this fairly desperate warning, and the problem is, if you look at the date on it, it came out in September, after the harvest had failed. So the question is, could you have done a better job? Even looking at passive microwave data we see that this year, in this basin, just in terms of the total amount of snow, it was a pretty low number, and so the idea is that we could have given that warning in April rather than in September and therefore could have better organized a response to it. In fact, it was ironic that the following winter, in 2012, was a fairly big snow year, and this caused the problem of not being able to get food supplies into places where people were starving. One year of not enough snow and then the next year of too much snow, and the combination of those led to a lot of starvation.
The problem with this, though, is one of the questions: do passive microwaves in fact give you a reasonable signal of the snowpack in the mountain environment? We've done some of this work in the Sierra, where we've got measurements at the surface to compare with. >>: Can you back up? How do you do these passive microwave measurements? >> Jeff Dozier: Oh. What a passive microwave signal does is… >>: Is it a plane flying over? >> Jeff Dozier: No. It's a satellite. And it's 25 kilometer pixels or so. The reason is that the emission from Earth's surface at those long wavelengths is very small, so you are not getting many photons to count, so the only way to do it is you've got to open up the -- you can't get very good resolution. The principle upon which it operates is that in the microwave part of the spectrum, ice is not very absorptive, but it does scatter radiation. So you get radiation being emitted from the soil, and then it's being scattered by the snowpack above it, and scattering, of course, causes extinction -- that's why on a cloudy day there's less sunlight under the clouds. So what happens is that by looking at the emission at the longer wavelengths, where you are seeing through the snowpack, and then looking at the shorter wavelengths, you're actually trying to estimate how much attenuation is coming from the snowpack and therefore the snow water equivalent. The problem is, if you compare this with a method called reconstruction that we've worked a lot on, you can see that it's an order of magnitude less if you look at the numbers on the y-axis. In the mountains, at least, the passive microwave estimates are only seeing about 10 percent of the total volume, and there are some physical reasons why. Therefore, we've tended to focus on this idea of reconstruction, but you can see this problem. This is a time series map of the passive microwaves. I'd like to make it go just a little faster. Okay.
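The retrieval principle just described -- snow grains scatter the short-wavelength emission more than the long-wavelength emission -- is the basis of the classic Chang-style algorithm. A minimal sketch, using the standard published coefficient but none of the grain-size or forest-cover corrections an operational product would apply:

```python
# Chang-style passive microwave retrieval sketch: SWE is proportional to
# the brightness-temperature difference between ~19 GHz (sees through
# the snow) and ~37 GHz (scattered by the snow grains).  The 4.8 mm/K
# coefficient is the classic published value; operational retrievals
# adjust it for grain size and forest cover.
def swe_from_brightness_temps(tb19_k, tb37_k, coeff_mm_per_k=4.8):
    """Return SWE in mm; clamp to zero when there is no scattering signal."""
    return max(0.0, coeff_mm_per_k * (tb19_k - tb37_k))

# 20 K of scattering depression at 37 GHz implies roughly 96 mm of SWE:
print(swe_from_brightness_temps(250.0, 230.0))   # 96.0
```

One of the physical reasons for the mountain underestimate mentioned above is that in deep snowpacks the 37 GHz signal saturates, so a linear difference like this stops growing with additional snow.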
This is a daily map, and we know snow doesn't really behave this way. It's flickering on and off, and so one of the questions is, well, are there ways that we can still use it? And in order to figure that out we have to have ways of estimating the spatial distribution of snow. The way I do this is, I've got the passive microwave data over here. The nice thing about them is that they are timely. You get the estimate of the snowpack right away, but with a lot of uncertainty and very coarse resolution. From MODIS, or other satellites, but from MODIS, I can get daily estimates of snow cover and the reflectivity of the snow. Then from the Global Land Data Assimilation System we can estimate solar radiation and longwave radiation and so forth, and therefore we can put that all together and model the snowmelt day by day. We can't tell from this how much snow there is, because in this part of the spectrum we don't see through the snowpack; we just see the surface. On the other hand, if you can model the melt and you can tell when the snow disappears, then you can back up that calculation and figure out how much there was on a previous day. So the idea is, to do that, we want to get an estimate of the snow water equivalent, trying to figure out if we can correct the passive microwave data. Then, you know, the way that this will be used in an operational sense is we could put that year into the historical context. Automatically, we can sort of say, well, is there reason for concern or not, and if the answer is yes, then the Army takes that information and, you know, looks at what happened in previous years and then issues warnings. Now, what I worked on this summer is a computationally intensive part of the problem: figuring out how to use cloud computing to help with this. The issue is I actually need a daily value of the snow-covered area, and along with that I get an estimate of the grain size and the albedo. How do we do this?
We start with -- we kind of try to go to basic physics. This is a graph of the optical properties of ice and water. This is the index of refraction, the kind of thing that you learned about in high school, about how light bends when it goes through a substance. This is the absorption coefficient, and I've actually got a slide that explains what these are. This is the definition of the refractive index: how the light bends as it goes through a material. This is the definition of the absorption coefficient: you get a decay as you're passing the light through a pure substance, and you normalize it by the wavelength so that the absorption coefficient can be dimensionless. Then if you solve that differential equation, you get a… >>: I found you. >> Jeff Dozier: Hello, Tony [laughter]. So we were just defining the absorption coefficient. So we get an exponential decay. Now, in order to explain what a given value of an absorption coefficient might mean, I simply take this exponent and solve for the distance at which it is going to be -1, and so we can call that an e-folding distance for light as it's passing through snow -- or, excuse me, through pure ice. Let me back up and do that. And so what we see is, if we look at the e-folding distance for ice, it varies by seven orders of magnitude across the solar spectrum. In the visible part of the spectrum that number is tens of meters, so when you go diving in Hawaii and you're under the water, you can see a long way. And similarly, if you were frozen in bubble-free ice you'd also be able to see a long way. Whereas when you get out to the longer wavelengths, that number, you know, gets down to less than a millimeter. The consequence of that is we can then put those kinds of numbers into a calculation of the scattering properties of an individual snow grain. This used to be a really interesting and difficult computational problem.
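The e-folding distance is simple enough to compute directly. Assuming the dimensionless absorption-coefficient convention defined above (decay exponent 4·pi·kappa·x/lambda), and using order-of-magnitude kappa values for ice that are illustrative rather than tabulated:

```python
import math

# E-folding distance for light in pure ice: with the dimensionless
# absorption coefficient kappa (imaginary part of the complex refractive
# index), intensity decays as exp(-4*pi*kappa*x/lambda), so the distance
# at which the exponent equals -1 is lambda / (4*pi*kappa).
def e_folding_distance_m(wavelength_m, kappa):
    return wavelength_m / (4.0 * math.pi * kappa)

# Illustrative kappa values for ice (order of magnitude only):
# visible ~1e-9 gives tens of meters; shortwave IR ~5e-4 gives <1 mm.
print(e_folding_distance_m(500e-9, 1e-9))    # ~40 m
print(e_folding_distance_m(1500e-9, 5e-4))   # ~0.24 mm
```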
That is, Gustav Mie published his equations in 1908; the first really fast and useful solution to those equations by computer was published in 1980, more than 70 years after the equations themselves appeared. But I don't have to worry about that. That's been done now, and been done really well. What you can do then is take those properties for scattering from a single grain and do a multiple scattering solution for what's going on in the snowpack. What you end up with is something that is intuitively pretty obvious. What this graph shows is the reflectivity of deep snow across the wavelengths of the solar spectrum for a variety of grain sizes, going from very fine grains to coarse grains. >>: Is .05 deep powder? >> Jeff Dozier: Pardon? >>: .05 [indiscernible] >> Jeff Dozier: .05 is pretty small, yeah. That's about as small -- maybe .03 is about as small as snow gets. On the right-hand y-axis I've got that absorption coefficient plotted, and what you see is where the absorption is low, the reflectivity of snow is really high and it's not very sensitive to the grain size. Across the visible spectrum, snow is white, right? And it's white -- but if you look at an individual snow grain, it's not white; it's transparent. It's the multiple scattering that makes the reflection white. If a child asks you where the white goes when the snow melts, you've got a way to answer it. >>: It turns black in my driveway [laughter]. I've got an actual question. Is there an intuitive reason why certain wavelengths, at the high end of wavelength, have weird spikes? What does that mean? >> Jeff Dozier: Those are rotational and vibrational modes in the quantum mechanics of the absorption of ice. >>: The rotational states of the water molecule. >> Jeff Dozier: Or of the ice molecule. Ice and water are only shifted a little bit in this part of the spectrum, whereas out in the microwave, ice and water are really different. What you see here is a couple of things.
One is, as you get out to the region where the absorption is moderate, the grain size makes a difference, so what happens as a result is that as the snow ages and the grains grow, it becomes less reflective. Remember that about half of the sun's energy is out beyond the wavelengths of the visible. And then out here, snow is pretty dark. That helps us distinguish snow from clouds, because clouds have little particles. That's why they're still up in the sky. I mean, that's really the difference between an ice cloud and the snowpack: the snowpack is a cloud where the particles got big enough that they fell out of the sky and landed on the ground. What this means is that if you compare snow to the other things that occur on Earth's surface -- here's vegetation, here's different kinds of soil -- first of all, there's a lot of variability in snow. If you go out beyond the visible, snow is one of the most colorful substances in nature, but you also see that if you compare it to the wavelength bands of MODIS that are here, it really is distinctive. It allows us to distinguish snow from the other elements. And out here we can distinguish it from clouds. In other words, snow is about the only thing that is really bright in the visible part of the spectrum and really dark in what we would call the shortwave infrared, and sensitive to grain size in the middle between those. What that allows us to do, then, is if we have satellites that have this sort of spectral information, we can distinguish snow from other substances, and this is with Landsat, and this is very nice. This is at 30 meter resolution, but it's got a 16 day repeat pass because the swath is only 185 kilometers, and so therefore you miss opportunities. A lot can happen in 16 days, and if that day happens to be cloud covered, now you are 32 days between acquisitions, so we'd like to do something a little better.
This use of the shortwave infrared part of the spectrum allows us to distinguish clouds, so here are the visible bands, and you can see the clouds and the snow are a little hard to tell apart. But if you use the bands out at the further end of the spectrum, you can see that the clouds are pretty distinctive. >>: The clouds are the blue? >> Jeff Dozier: No. The clouds are [laughter] >>: [indiscernible] >>: Is that the Eastern Sierra? >> Jeff Dozier: Yeah, that's Mono Lake. What MODIS has is very similar bands to what's on Landsat, but it's got a swath of 2300 kilometers, and so this is a MODIS tile that is 1200 x 1200 kilometers -- that's, you know, 1.44 million square kilometers. The state of Washington is 180,000 square kilometers, so this is eight times the size of the state of Washington. With this one we get daily coverage, but the spatial resolution is 500 meters, and so what we're able to do with this spectrum is at least compensate for the coarser spatial resolution by estimating a fractional cover of snow within each pixel. What this does is we can calculate what are called end members, so in this case the green is the concentration of vegetation, the red is the concentration of soil, and the blue is the concentration of snow, and we can solve for each. That's also a pretty computationally intensive problem, but it can be done as a parameter sweep. Each pixel's computation is separate from its neighbors, so it's a nice application for trying to put onto a cloud. It works pretty well. This is comparing what we get at 500 meters with what we get from 30 meters, and this is the scatter diagram. There are some things that I want to do with this part of the process, but that's not really what I focused on this summer. The problem is I do want to get a measurement every day, and sometimes what you see is you get clouds.
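The end-member calculation can be sketched as linear spectral unmixing: each pixel's spectrum is modeled as a mixture of snow, soil, and vegetation spectra, and the fractions are solved per pixel, which is why it parallelizes as a parameter sweep. The three-band spectra below are invented for illustration, not real MODIS end members, and a real solver would also enforce nonnegativity:

```python
import numpy as np

# Linear spectral unmixing sketch: a pixel's reflectance is modeled as
# a mixture of end-member spectra (snow, soil, vegetation), solved by
# least squares with a sum-to-one constraint appended as one extra
# equation.  The spectra are made up for illustration.
endmembers = np.array([    # columns: snow, soil, vegetation
    [0.90, 0.25, 0.08],    # visible band
    [0.70, 0.30, 0.45],    # near-infrared band
    [0.10, 0.28, 0.20],    # shortwave-infrared band
])

def unmix(pixel):
    A = np.vstack([endmembers, np.ones(3)])   # last row: fractions sum to 1
    b = np.append(pixel, 1.0)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions

# A synthetic pixel that is 60% snow, 30% soil, 10% vegetation:
pixel = endmembers @ np.array([0.6, 0.3, 0.1])
print(unmix(pixel).round(3))   # recovers the fractions [0.6, 0.3, 0.1]
```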
What I'm showing here is now a data cube: the plane is the spatial dimension, and it's in a projection, so these are kilometers north and south on the left and east and west, and then date on that axis. The red color shows the absence of data -- I picked it, and I'm not sure it was a great choice, but on the other hand, there are holes in my data, so it's bleeding [laughter]. And so we can look through the year, or in this case just a 32 day period, and so my first step in trying to get to the daily data is to use a three-dimensional Laplacian to try to fill in those holes. But I don't mess with -- I don't change any of the observations themselves. You can see that we get something where the holes are all filled in, but it still looks pretty messy. And this is the thing that makes it kind of a harder computational problem: these really are three-dimensional data with lots of neighborhood effects, because sometimes we want to slice the data this way, and sometimes we want to drill down through the column. So in trying to fix this, to make it smoother, we want to be able to use some knowledge about what we have. There are a couple of different kinds of glitches in the data. This is a pretty clear day, but what we end up with is both low frequency dropouts caused by the clouds and high frequency noise, because one of the MODIS bands is starting to go bad a little bit. It's got some periodic noise in it, indicated by these little red dots. And if we zoom in on some of these areas we can see, again, both this low frequency noise that we can identify and some high frequency noise. Not only that, we see this break in the image right in the middle, and the reason for that is that in this case the image was stitched together from two different orbits.
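The Laplacian hole-filling step can be sketched as a discrete Laplace equation on the cube: missing voxels relax toward the mean of their six neighbors while the observed values stay fixed. This naive Jacobi iteration only illustrates the idea; a real implementation would use a sparse solver:

```python
import numpy as np

# Naive Laplacian gap-filling on a 3-D data cube: NaN voxels are relaxed
# toward the average of their six neighbors (a discrete Laplace equation)
# while observed values are held fixed, as described in the talk.
# (np.roll wraps at the edges -- acceptable for this toy example.)
def fill_holes(cube, iterations=200):
    holes = np.isnan(cube)
    filled = np.where(holes, np.nanmean(cube), cube)  # initial guess
    for _ in range(iterations):
        neighbor_sum = np.zeros_like(filled)
        for axis in range(filled.ndim):
            neighbor_sum += np.roll(filled, 1, axis)
            neighbor_sum += np.roll(filled, -1, axis)
        # update only the holes; observations never change
        filled[holes] = neighbor_sum[holes] / (2 * filled.ndim)
    return filled

# A cube with a linear gradient along the last axis and one missing voxel:
cube = np.tile(np.arange(4.0), (4, 4, 1))
cube[2, 2, 2] = np.nan
result = fill_holes(cube)
print(result[2, 2, 2])   # relaxes to 2.0, the value the gradient implies
```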
Part of the cause of this variability is the fact that we're getting this wide swath, you know, of more than 2000 kilometers from only a 700 kilometer orbit, and that means you've got to be looking at things at a pretty high viewing angle. So this is a map, for that image, of what we would call the sensor viewing angle -- that's the angle up to the satellite if you were standing on the surface. If it were a plane parallel system, that would be the same as the nadir angle from the sensor, but because the earth is curved, those two numbers are different. Now the problem is what happens as your -- so where it's blue, it means that that place was right underneath us at this time of the orbit. Where it's red, you know, we're up to 60 degrees or so off nadir, off the zenith, where we're looking at it, and this thing in the middle is where the two orbits were stitched together. What that means is that at the edge of the swath -- so the pixel right underneath the satellite is half a kilometer square. The pixel at the edge of the swath is about 5 x 1 kilometers, so it's 10 times the area at the edge of the swath. That's part of the problem that is introducing some of this noise into the images: on different days you are actually looking at a different piece of real estate on the ground, and you want to try to put together a -- how do you put a picture together? I think this is a class of smoothing problems where I have more confidence in some of the data than in other measurements, and so I want to adapt a smoothing method that in fact takes advantage of the fact that I have a physical reason for having more confidence in some points than in others. What I do is I actually use a smoothing spline, but I weight the smoothing spline inversely to that viewing angle. Were you raising your hand? >>: Yes.
Do you have to deal with [indiscernible] air column and then more moisture [indiscernible] >> Jeff Dozier: We actually start with an atmospherically corrected value, yeah. I guess the point is that if I have a bunch of clear days, then the off-nadir shots don't contribute very much to the signal. They pretty much get ignored in the smoothing algorithm, but on the other hand, if that's the only view I have in a two-week period, then I'll use it. That's what results. >>: This is after smoothing? >> Jeff Dozier: What? >>: This is after smoothing? >> Jeff Dozier: This is each [indiscernible]. This is the first time I've seen this [laughter]. So it's that same cube that we looked at before, and it's showing every day over a 32 day period. >>: As a result of the smoothing you did? >> Jeff Dozier: As a result of the smoothing, and I think it's pretty good. >>: Yeah, it actually looks much better than the other cube. >> Jeff Dozier: Yeah [laughter]. That's the idea, yeah. >>: Yeah, definitely. You can see the, where before there was [indiscernible] >> Jeff Dozier: Yeah. Okay. So it works really well and I'm really pleased to have done this. The only problem is this is just 800 pixels by 800 and it's only a 32 day slice, so this is a ninth of a MODIS image and a twelfth of a year, and this took about two hours. >>: Computationally, or is that clock time? >> Jeff Dozier: Yeah. >>: Wall time on the Azure process [indiscernible] >> Jeff Dozier: Actually, I ran this just on one node, but I know now how to break this up, and that's [indiscernible] and we're going to do that. I've got another week. In other words, what I'm going to do is: I do the Laplacian smoothing over the full tile, you know, of 2400 x 2400, and then I'm going to divide that into nine parts and take the whole year column and do the smoothing on the whole year. And that's a way of getting by the -- of actually taking advantage of multiple things.
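The view-angle-weighted smoothing described above can be sketched with an off-the-shelf weighted smoothing spline. The cosine weighting function, the smoothing factor `s`, and the synthetic time series are all illustrative choices, not the values from the talk:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Sketch of view-angle-weighted temporal smoothing: a daily snow-cover
# series is smoothed with a spline whose weights shrink as the sensor
# view angle grows, so near-nadir observations dominate the fit.
days = np.arange(32, dtype=float)
true_sca = 0.8 - 0.01 * days                    # slowly melting snow cover
view_deg = np.abs(30.0 * np.sin(days))          # daily view-angle swings
rng = np.random.default_rng(0)
observed = true_sca + 0.002 * view_deg * rng.standard_normal(32)  # noisier off-nadir

weights = np.cos(np.radians(view_deg))          # down-weight off-nadir days
spline = UnivariateSpline(days, observed, w=weights, s=1.0)
smoothed = spline(days)
```

As in the talk, clear near-nadir days dominate the result, but an off-nadir view still contributes where it is the only observation available.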
And then the way that the reconstruction works is, once I have that, I can run a snowmelt model. The way I do that is illustrated with the measurements from a snow pillow: if you don't have a snow pillow, but you know what day the snow goes away and you can calculate the rate of melt, then you can back up and figure out how much there would have been. And so this gives us a couple of things. It gives us a spatially distributed estimate of how much snow there was, back to about the peak of the snow cover. And that allows us to compare with passive microwave data. It also allows us to compare with models, because one of the problems, especially with precipitation models, is you've got a grid. You are modeling at some spacing on the grid of 10 kilometers or 150 kilometers or something like that, and how do you compare that to a measurement? What is it that you compare to? Now we've got something that we can use to compare. The way this works: we compared in the Sierra, where we've got some surface measurements, with measurements at snow courses, which are the ones done monthly by people skiing through the mountains and poking a tube in the snow and weighing it, and then also with snow pillows, which are an automatic measurement. The good thing is, obviously there is some error in that, but the error is centered around 0, so there doesn't appear to be a bias. And part of the error is that the snow pillow is only representing a point within a half kilometer pixel, so the snow pillow is not a perfect measurement either. And then if we compare the inputs: we estimate the incoming solar radiation pretty well. Air temperature we do pretty well. A little bit of a bias in the incoming longwave radiation. That could also be a measurement problem. That's a difficult thing to measure at the surface, and in the Sierra Nevada there are really only three long-term stations that do it.
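The back-calculation at the heart of the reconstruction is just a reverse cumulative sum: SWE on any day equals the total melt from that day until the snow disappears. A sketch with made-up daily melt values:

```python
import numpy as np

# Reconstruction as a reverse cumulative sum: SWE on a given day is the
# total melt from that day through the date the snow disappears.
# Daily melt values (mm of water) are invented for illustration.
daily_melt_mm = np.array([0, 2, 5, 8, 10, 12, 9, 4])  # snow gone after last day

def reconstruct_swe(melt):
    return np.cumsum(melt[::-1])[::-1]   # sum from each day to the end

swe = reconstruct_swe(daily_melt_mm)
print(swe)   # peak SWE (day 0) is 50 mm; the last day holds 4 mm
```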
The reason it's difficult is that your instrument is emitting the same thing you are trying to measure [laughter], and so the temperature compensation has proved to be hard with those. Okay. So is this reconstruction giving us a good answer? Because we have other methods, at least in well instrumented places, of getting alternatives. One is, if we have a lot of surface measurements, we can just do a spatial interpolation, or we can run a data assimilation model. And the reconstruction is, in fact, showing greater amounts of snow than any of those. So are we right, or are they right? Here's our estimate that shows the reconstruction is right. What we've done is to use the stream flow in these basins, do a calculation of evapotranspiration and of the change in storage -- in other words, taking the hydrologic balance equation and estimating the precipitation from that, backing it out -- and both the interpolation method and the assimilation are giving you some negative numbers, and negative precipitation can't happen. >>: [indiscernible] are actually [indiscernible] >> Jeff Dozier: Q is discharge in the river, E is evapotranspiration, and Delta S is the change in storage from groundwater. Usually over long timescales that's going to be small. It's a way of backing out the precipitation estimate, and you hope that when you do that your precipitation estimate ends up being positive. In the case of the reconstruction, for 12 years and 19 drainage basins our numbers are all positive, whereas some of the others are showing negative values for some. And then we've got other -- and so the way that we… >>: Time is running backwards. >> Jeff Dozier: Pardon? >>: Time is running backwards. I'm joking. [laughter] >> Jeff Dozier: Okay.
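The plausibility check described here follows directly from the balance equation P = Q + E + ΔS: any snow product whose implied precipitation goes negative fails. The basin totals below are invented:

```python
# Water-balance plausibility check: back out precipitation as
# P = Q + E + dS (discharge + evapotranspiration + change in storage)
# and flag any estimate that implies negative precipitation, which is
# physically impossible.  Basin totals (km^3/yr) are illustrative.
def implied_precipitation(discharge, evapotranspiration, storage_change):
    return discharge + evapotranspiration + storage_change

def is_physically_plausible(precip):
    return precip >= 0.0

# Reconstruction-like numbers give a positive, plausible precipitation:
print(implied_precipitation(1.2, 0.9, 0.05))
# A low-biased snow product implies too-negative storage change -> fails:
print(is_physically_plausible(implied_precipitation(0.3, 0.2, -0.7)))  # False
```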
So the way we do this -- again, this is another computationally intensive problem, but we can do it day by day, so it's a pretty easy thing to move onto Azure. We take the shortwave radiation estimate from either the national or the global land data assimilation system. The upper left shows the resolution at which those data come in, which in this case is an eighth of a degree. We smooth those to the size of the MODIS pixel and then we scale it based on the topography, because the problem is that in that eighth degree pixel there's a lot of topographic variation, so we scale it using a pressure scaling relationship. Then we bring in the slope and exposure, and then we account for attenuation by vegetation. We've got a map of the albedo, so we can estimate how much solar radiation is being reflected back upward, and what we end up with is a net: that's the energy that goes into the melt. And then we do something similar with the longwave radiation, but in the interest of time I'll skip that detail. Again, that takes a lot of computing, but it's pretty easy to parallelize because each day is independent of the other days. If we do the same thing for the Hindu Kush -- in this case this is just showing the data for a day -- we can get coverage over a mountain range that's very large. Some of these drainage basins -- the Amu Darya itself is slightly larger than the state of Washington; it's 200,000 square kilometers. So we can get each of these inputs to the model over every pixel, and then we can calculate how much melt is coming from the radiation, what's coming from the sensible and latent heat flux, which is a function of temperature and humidity, and then we can get all the melt for a particular day, and then we can do it for every day of every year. So this is showing the variability that we've seen in the years [laughter]. 2008 is missing.
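The topographic scaling step can be sketched with a simple elevation-based pressure ratio. The 8 km scale height and the 0.3 attenuation exponent below are illustrative simplifications of my own, not the actual downscaling coefficients used with the assimilation-system data:

```python
import math

# Sketch of the elevation scaling step: coarse-grid solar radiation is
# adjusted with a pressure ratio (less atmosphere above a higher pixel
# attenuates less), then reduced by the snow albedo to get net shortwave.
def pressure_ratio(elevation_m, scale_height_m=8000.0):
    return math.exp(-elevation_m / scale_height_m)

def downscale_net_shortwave(coarse_sw_wm2, coarse_elev_m, pixel_elev_m,
                            albedo, attenuation_exponent=0.3):
    # Thinner atmosphere above the pixel -> slightly more irradiance.
    ratio = pressure_ratio(pixel_elev_m) / pressure_ratio(coarse_elev_m)
    incoming = coarse_sw_wm2 * ratio ** (-attenuation_exponent)
    return incoming * (1.0 - albedo)   # net = incoming minus reflected

# A snow pixel 1 km above the coarse cell's mean elevation, albedo 0.7:
print(round(downscale_net_shortwave(600.0, 2000.0, 3000.0, 0.7), 1))  # ~186.9
```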
This is my -- my colleague Karl Richter was running this on the Linux cluster at Santa Barbara, and -- there's a difference in Linux between the rm command and the mv command [laughter], and he typed rm instead of mv after he had done all of these calculations for 2008. On the other hand, it made for an easier slide, only having to put four years in [laughter] instead of five. Okay. That's it. >>: That's the Hindu Kush? >> Jeff Dozier: That's the Hindu Kush. That's the Wakhan Corridor up on the right. This is a sinusoidal projection, so it… >>: [indiscernible] >> Jeff Dozier: Okay. I've learned how to use Azure, sort of, with [indiscernible]'s help, and I think I figured out how to deal with the hardest parallel part of the problem, which is the fact that we're dealing with -- it's a general problem of dealing with three-dimensional data where sometimes you want to slice this way and sometimes you want to slice that way. And the rest of it is, I think, easier to run in parallel, because once we do that we can run day by day. >>: So the problem of mounting a disk multiple [indiscernible] >> Jeff Dozier: You can't [laughter] >>: That's not the problem then. >> Jeff Dozier: Well, I mean, now I'm wandering into territory where most of you know more than I do. I guess the issue is that with the blob store, what you can do is get input, but you can't reach into the blob store and read part of a file. You actually have to either -- I don't know. I don't want to say -- it's too bad, because I'm storing these results as HDF5 files, which support block compression so that you can read a piece of a file even though it's compressed, but you've got to get it out of the store in order to read it. So I think the alternative is to take those chunks -- that is, to split every image into a 3 x 3 grid, so turn every image into nine images -- and then I can parcel those out to individual machines.
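The chunking scheme just described -- split the tile into a 3 x 3 grid of spatial blocks, each carrying the full time axis -- is easy to sketch. A small stand-in cube is used here in place of a full year of 2400 x 2400 images:

```python
import numpy as np

# Chunking sketch: split a MODIS-tile data cube (time, rows, cols) into
# a 3 x 3 grid of spatial blocks, each keeping the full time axis, so
# each block's temporal smoothing can run on a separate machine.
def split_into_blocks(cube, n=3):
    t, rows, cols = cube.shape
    r, c = rows // n, cols // n
    return [cube[:, i*r:(i+1)*r, j*c:(j+1)*c]
            for i in range(n) for j in range(n)]

# Stand-in for a (365, 2400, 2400) year of daily images:
cube = np.zeros((12, 24, 24), dtype=np.float32)
blocks = split_into_blocks(cube)
print(len(blocks), blocks[0].shape)   # 9 (12, 8, 8)
```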
>>: This was all the calculation you needed to do to count the snow cover of this particular region for four years, and what do I learn? >> Jeff Dozier: Oh. Okay. So what you now have is -- let me go back to this. Sorry about that. I think it was right at the beginning here. Yeah. >>: I sort of missed this one. >> Jeff Dozier: So what I've got is methods with which I can estimate the snow cover during the year in real time, because with this reconstruction you get the answer, but you only get it at the end. On the other hand, what we show is it's a really good answer. What we now have is something that we can use to compare with, say, estimates from passive microwave, and with -- NOAA is developing a Central Asian snow accumulation model, but they have no way of validating it. So this gives us a method of validation for that kind of a model, and in fact I'm meeting with them toward the end of this month, because they keep asking us for the reconstruction results for the past decade. Then, if we can figure out how to help with the passive microwave data -- as I showed before you came in, that only sees about 10 percent of the snow -- if we can figure out how to correct it, then that geophysical time series goes back to 1978, and so we can do a better job of putting any current condition into the historical bracket as part of the historical narrative. A lot of what we can -- I mean, in general, management of water works pretty well when you are near the median [laughter], you know, and so part of the idea with this is: can you identify the years that are at the two tails of the distribution? Is this… >>: Do those four years you showed for the Hindu Kush look more or less the same? >> Jeff Dozier: In the Kabul part of the watershed there was flooding in 2007. >>: I don't know where Kabul is on that map.
>> Jeff Dozier: It would have been the part that's draining to the south, and that does show in 2007. By and large, one of the things that really helps in management, especially in places where simply giving a volume in cubic kilometers or acre-feet or any sort of unit isn't going to mean much, is if you can put things in historical context. You know, if you can say that this year is comparable to 2007, when there was flooding, or comparable to 2011, when there was drought, then even the villagers will remember what things were like in those conditions and therefore, you know, know what got flooded then or, in the case of drought, how badly the crops did then. >>: And then mobilize resources early on. >> Jeff Dozier: And then you can mobilize, yeah. Again, what happened in 2011 was there was a big drought, but they… >>: Can you show the final answer again? >> Jeff Dozier: It's probably easier to… >>: So 2007 was a flood? >> Jeff Dozier: 2007, in the southern part here, was a flood, yeah. >>: And in the others I don't see much difference. >> Jeff Dozier: That's true. The interesting thing about that year and about 2011 is the snow-covered area was pretty similar even though the depth of the snow was different. >>: An interesting part is that it's a piece of layering kind of with the basin and where the water sort of goes out of and then also the populations [indiscernible] >>: That you can use [indiscernible] >>: [indiscernible]