>> Dan Fay: I want to welcome you to this talk by Jeff Dozier from the University of California, Santa Barbara. Jeff is up here visiting us for a few months to look at some of the interactions with how you can use some of the technology with some of the stuff we’re doing here at Microsoft Research. And so Jeff has been involved with a lot of the different activities we’ve been doing for many years. Everything from Rolleon [phonetic], TerraServer that we were doing with Jim Gray and some of the other folks, and also down to some of the other activities at the e-science workshop, and is actually a Microsoft e-science fellow for us. So I want to welcome Jeff and he’s going to give a talk on snow hydrology at the scale of mountain ranges. So welcome, Jeff.
>> Jeff Dozier: Thanks. Great. [applause] So as you can see I’m spending my summer at MSR. I think of myself as the oldest intern in the group. [laughter] And I want to acknowledge two students who have helped a little bit with this. Karl Rittger is a UCSB student who just finished his Ph.D. in early July and is now working at the Jet Propulsion Laboratory. And a lot of the presentation is based on stuff that he did. And then, of course, my local intern Alex Wiggins who is a student at Oregon State. And he’s working on some of the visualization aspects of this. And he’s preparing something to show you after I’m done. He works much closer to deadlines even than I do. Okay. So the issue with snow that you would know in any part of the western United States, and affecting probably a billion people worldwide who get water from snowmelt in the mountains, is that if you look at the record from a snow pillow, which is a device on the ground that measures the weight of the snow on top of it, then the typical season is that the snow starts accumulating early in the season sometime, October or November, and then it goes in fits and starts. The snow pack grows, and then some time in the spring it starts to melt.
And there may be a few snowfall events during that melt, but, you know, by and large it melts pretty quickly. And it, of course, happens at the time that the water demand is increasing. So the snow pack really is the natural reservoir over much of the mountain west. And these graphs here show the same thing. What I’ve done is to take the snow pillow record on the right, take the daily differences, and express those as either positive or negative. And so again what you see is that the accumulation period is the set of storms that occurs over periods of six months or so, and then the melt period occurs pretty quickly. The other thing that’s kind of interesting about this graph, if you compare a relatively lean year, 2007, with a relatively big year in the Sierras, 2011, is that the sizes of the storms are not that much different. The difference in a big year is you get more storms occurring. Okay. And because of the importance of this snow resource for agriculture and urban use and hydropower and so forth, there’s a forecasting effort that goes on. And typically what water agencies do is around the first of April or so they issue a forecast for the seasonal runoff. That is what they expect to get from April to July, you know, based on all these instruments that are in the mountains. And what this graph here shows is that that relative forecast error is pretty high. That is if you look at the median error, the median error is about 20 percent or so. And if you go out to the rarer events, say if you go up to the 90th percentile on the cumulative probability distribution, that error is nearly 60 percent. So what that says is, you know, in one year in ten that forecast error is going to be substantial. And the reason in particular in the Sierra Nevada where that’s important is that this lower graph shows the reservoir capacity as a fraction of the mean annual runoff. And what you see is most of the basins are riding right around 100 percent or a little bit less.
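The daily-difference step described for the snow pillow graphs can be sketched in a few lines. This is a minimal illustration only: the function name `daily_changes` and the sample record are hypothetical, not data or code from the talk.

```python
def daily_changes(swe_mm):
    """Split a daily snow-pillow record (mm of snow water equivalent)
    into non-negative accumulation and melt series."""
    accumulation, melt = [], []
    for yesterday, today in zip(swe_mm, swe_mm[1:]):
        delta = today - yesterday
        accumulation.append(max(delta, 0.0))   # positive change: snowfall
        melt.append(max(-delta, 0.0))          # negative change: melt
    return accumulation, melt


# Hypothetical record, invented for illustration (not measured data).
record = [0, 20, 20, 55, 50, 30, 10, 0]
acc, melt = daily_changes(record)
print(sum(acc), sum(melt))
```

Over a full season that starts and ends at zero, the accumulation total and the melt total are equal, which is a quick sanity check on a record.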
There are a few exceptions. Well the Truckee, they’ve got Lake Tahoe, so there’s about a three meter dam height at the Truckee River spill [inaudible]. So if you multiply three meters times the area of Lake Tahoe it’s a pretty big number. And then the New Melones Dam in the Stanislaus is big. But other than that, you know, we don’t store much more than the annual runoff throughout that mountain range. And so therefore forecast errors in a single season make a big difference. You know, whereas if you had say five years of storage, then those errors wouldn’t be so important. Okay. So there are several reasons why forecasting is difficult. And one of them is that by and large the snow sensors don’t cover the highest elevations. Partly it’s just that as you get higher, it’s a bleak, windy environment, and it’s harder to install things. But what you see on the left for a two-week period in 2011 is that in July all of the snow sensors are below the snow zone, and yet there is a fair amount of snow that’s still in the watershed. So in other words, at this point if you look these are the snow pillow records for all of them. And by that first image up on the top, all of them but two have gone to zero, and by the second image they’ve all gone to zero. So there’s still snow in the watershed and yet all of your surface network says the snow is gone. And where this is important is the reservoir manager for the Hetch Hetchy reservoir that supplies water to San Francisco. He’s kind of wondering what’s going on at this point because he can let some of that water out and generate hydropower and make about $125,000 a day, but then he runs the risk of you not being able to take a shower when you visit San Francisco in late October. And so if he knew that there was more snow in the watershed then he could in fact make a better decision, whereas thinking that there’s no snow in the watershed means he has to kind of hang on to what he’s got. Okay.
The other reasons for heterogeneity are, you know, the big difference between snow and rain, aside from the fact that one of them is liquid and the other is not, is that snow moves around on the ground after it falls. And this picture on the left is at the Reynolds Creek watershed in Idaho. You can see the fence in the background, that snow in the foreground is probably up to your shins, and yet back there where those trees are the snow is three meters deep. And that’s pretty important. That’s why those trees have decided to grow there, right? And then similarly there’s a differential melting that occurs. So this is a typical scene that you see in the mountain west with one slope covered with snow and the slope opposite it bare. So there’s a couple of things that we do in order to try to characterize this heterogeneity and what’s going on in the snow pack at very large scales. That is we want to look at the whole mountain range. So one of the things we do is because that snow is heterogeneous, and we want to see it pretty much every day, or we want to get enough opportunities so we can really keep track of what’s going on with it, we need to use a satellite sensor that has a high revisit frequency. And so we use MODIS because we have data every day. But the problem is that pixel is 500 meters in size, and so there’s a lot of heterogeneity that’s occurring within that pixel. And so what we do is we have a method to actually map what we call the fractional snow cover. And this is going to be a pretty typical problem any time you want a rapid revisit interval. That drives you to a somewhat larger pixel and therefore you’re going to have this mixing problem. And so in this case we figured out a way to solve that. So we can actually say how much of that pixel is in fact snow-covered. And then what we do, and this is -- first of all, it’s okay to ask questions during the talk. You can interrupt me.
But the consequence of that is sometimes when people do that you don’t get to the end, right? You get into discussions. And so what I’m going to do is, this is the end, but then I’m going to work backwards, okay? So the end is that we really want to know how much snow is in the mountains. And we want to know its spatial distribution. And we want to know that for a couple reasons. One is, you know, to help with the forecasting problem, but the other is to sort of look at the impact that snow has on the hydroecology of these sorts of mountain ranges, where this spatial variability is really important. So there is no method currently to directly measure this from remote sensing. Right? And I can discuss efforts on how to do that directly, but that’s another lecture entirely. So what we do instead is we can track the rate at which the snow disappears and from that we can reconstruct what that spatial distribution was. And that’s what it looks like in this wet year, 2011, it was wet in California. It wasn’t wet up here because it was wet in California, right? 2007 it was dry in California, so you know, Oregon and Washington got a little bit more in that year. But what you can see is this is what that spatial distribution looks like. And these show the three months starting April 1st and on the right June 1st. And what you see in the big years is there’s still a fair amount of snow left in June, right? Okay. So how do we do that? And what are some of the e-science problems involved?
>>: One quick question [inaudible]. How much is [inaudible] a factor in terms of, you know, ice crystals turning to water. You know, clearly the amount of time it takes for that to get to a point where it needs to be managed in a reservoir is going to be different depending on where it came from. Is that -- I mean, what sort of time range are we talking about? Hours? Days? Or weeks?
>> Jeff Dozier: It’s days.
>>: It’s days. Okay.
>> Jeff Dozier: It’s days, yeah.
Typically these reservoirs don’t have a lot of ground water storage. They’re up in the mountains. They’re mostly bedrock. The answer to that question varies a little bit because part of that residence time is in the snow pack.
>>: Right. Yes.
>> Jeff Dozier: So you’re melting your water at the top of the snow pack, then it has to get through the snow. And then it gets through the soil. But yeah, it’s typically days to a week or two. Whereas in a mountain range with deep ground water storage, it could be longer.
>>: So you could have like a super hot day all of a sudden out of the blue --
>> Jeff Dozier: Oh, well, you see --
>>: You know when it’s going to show up. I think of this from, in particular say, like, the Grand Coulee Dam, where the water is showing up in the river and then the challenge that we have up here is that they’re trying to manage the flow through the reservoir so they don’t risk too much coming in topping the dam. And also you’ve got this challenge with wind power. They’re trying to figure out whether or not to curtail wind because they need to push water through the dam because there’s trouble of [inaudible].
>> Jeff Dozier: Yeah. Well, this is one of the problems in that your reservoir management has multiple purposes. One of them is flood control and another is provision of water resources. And if you think about it, those are diametrically opposed to one another. But you can sort of see what the timing is simply based on the fact that when you’re in the mountains you do see a diurnal variability in the stream flow. And typically the peak is on the order of a few hours after the maximum rate of melt. So that kind of shows you that there isn’t a lot of lag time going on into storage, right?
>>: [inaudible]. Does sediment transport play into this whole analysis of how management goes, or is that not so much an issue?
>> Jeff Dozier: Well, that’s an interesting problem in that the one thing that does occur with sediment transport is that there is this predictable sort of diurnal variability, predictable and seasonal. It not only applies to sediment transport but it applies to where the fish are trying to hide while they’re hatching and stuff. In other words, during periods when flow is too low sometimes the water gets too hot and it affects fishes, and so sustaining -- So there’s lots of things going on in the way that this heterogeneity, both spatially and temporally, not only affects the way we manage water, but it also affects the sort of habitats that the water supports. Okay. So how do we do this? Well, you know in e-science you got to have a workflow diagram. I usually write scripts, but what this is showing, first of all, you know, trying to impress you with its complexity I guess. But the stuff that comes in blue is all data that we get from somewhere else. So the MODIS data and then VIIRS is a sensor that just went up that we’re expecting to use, digital elevation models, land cover data, and then the assimilated data from the North American Land Data Assimilation System, NLDAS. And then everything else in red is sort of stuff that we do. And so I can take you through some of that. So the first is, you know, how we actually go through this processing to produce the fractional snow cover estimates. And what we do here is we take advantage of the position of the MODIS bands. MODIS has 36 spectral bands. Some of them are at 500 meters, two of them are at 250 meters, and then the rest are at one kilometer. But the seven bands in this part of the spectrum are usually what are called the Land bands. And we can look at how that matches the spectral reflectance of the major components of what’s on the earth’s surface. Well, what’s on the earth’s surface? There’s vegetation, there’s dirt, and there’s snow, right?
Water, which is pretty dark in all spectral bands, and then you know, everything else.
>>: Are rocks different than dirt?
>> Jeff Dozier: It’s soil.
>>: It’s soil? Rocks aren’t storing water but soil is.
>> Jeff Dozier: Soil has variability in the reflectance because it does get wet, yeah. So generally when it’s wet it’s darker. Yeah. So that’s why you see this if you look at different reflectances of soil, they all sort of have the same shape, but some of them are darker and some of them are lighter. Vegetation has this typical reflectance of being pretty dark in the visible and then rising very steeply as you go into the near infrared part of the spectrum, which sometimes is called the red edge of the spectrum. And then snow, which has a lot of variability because of the grain size. But notice that the visible spectrum is only going out to 0.7 micrometers, and so much of that variability in snow is out beyond the wavelengths that you can see. But in fact if you think of color in this sort of extended way, including the whole spectrum, snow is one of the most colorful substances in nature, you know?
>>: So it’s probably like the Northern Lights, but it’s all in --
>> Jeff Dozier: It’s all beyond the wavelengths you can see, but if, you know, if our eyes were sensitive out into the near infrared there would be a tourist industry in the western United States. Kind of like the leaf peepers in New England, you know, going out to watch the colors. But the main thing, the point of this, is if you compare all those spectra, and of course to do that I just selected one from each graph that you see, that snow is pretty distinctive from soil, and it’s pretty distinctive from vegetation. So those three substances tend to have distinctive spectral signatures. And the MODIS bands are positioned well enough to catch those.
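Because the three signatures are distinct, a mixed pixel’s spectrum can in principle be decomposed into the fractions of each material that produced it. The sketch below shows the simplest version of that idea, unconstrained linear least squares over seven bands. It is not the group’s algorithm: the endmember reflectances are invented for illustration, and a production method would also constrain the fractions to be non-negative and account for snow grain size.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def unmix(pixel, endmembers):
    """Least-squares fractions f so that sum_k f[k] * endmembers[k]
    approximates the pixel spectrum (normal equations)."""
    k = len(endmembers)
    ata = [[sum(p * q for p, q in zip(endmembers[i], endmembers[j]))
            for j in range(k)] for i in range(k)]
    atb = [sum(e * p for e, p in zip(endmembers[i], pixel)) for i in range(k)]
    return solve_linear(ata, atb)


# Illustrative seven-band reflectances (made-up values, not library spectra).
snow = [0.95, 0.93, 0.90, 0.80, 0.45, 0.10, 0.05]
vegetation = [0.04, 0.08, 0.05, 0.45, 0.40, 0.25, 0.12]
soil = [0.10, 0.15, 0.20, 0.28, 0.33, 0.38, 0.36]

# Synthetic mixed pixel: 60% snow, 30% vegetation, 10% soil.
mixed = [0.6 * s + 0.3 * v + 0.1 * d for s, v, d in zip(snow, vegetation, soil)]
fractions = unmix(mixed, [snow, vegetation, soil])
print([round(f, 3) for f in fractions])
```

The first recovered fraction is the fractional snow cover for that pixel.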
And so what that means is that if you have a spectral signature from a pixel, then you can decompose it into what fractions of stuff went into it. And so that’s where we get this fractional snow cover. So this is the Sierra. Again, in a dry year on top and in a wet year on the bottom. And this little area that’s blown up at the bottom, these are the Tuolumne and the Merced drainage basins, right?
>>: Are you computing that or is MODIS?
>> Jeff Dozier: No. Remember the red diagram. The stuff in red is us. Yeah. No. Well, MODIS just makes the measurement and then we get the data and then we make the inference. Okay. And then so that was all the stuff on the left. And then it turns out after we do that measurement there are some things we need to do to sort of smooth out the data because there are gaps from cloud cover. And also how does MODIS get these daily data when its orbit altitude is only 705 kilometers? In order to cover the earth every day it has to have a pretty wide swath. And in order to have a pretty wide swath, some of what you’re seeing is at pretty steep, pretty off-nadir view angles. So the combination of the clouds, occasional dropouts, and these off-nadir view angles means that your data are scrambled a little bit. So these are just two days apart. This is April 8th and this is April 9th. And this was on a day when MODIS happened to see this part of the world at a pretty near-nadir view angle. And here’s the histogram of the view angles for that scene. So it was right overhead. And the next day, in order to even get that second day, it was way off to the side, and in fact, notice that the view-angle histogram is bimodal. And the reason is that this image was actually stitched together from one pass looking in this direction and one pass looking in this direction. And what do you notice about these pixels? Well, what happens when you’re looking off to the side?
Well, firstly that pixel expands in both directions because it’s just further away, right? And the other is that it expands more in the cross-track direction because of the effect of the earth’s curvature. So what you see is that it’s the same scene but these pixels here look a lot different than the ones on the left. So we could just ignore them, right, on these days? But what if that’s the only scene I’ve got during a period when it’s cloud covered? So what we try to do is we -- I run a smoothing spline through the data, but I weight the spline by the inverse of that view angle. So that if my only view in a period that’s otherwise cloud covered is off to the side, I’m going to use it. But in the cases where I have data every day, then it doesn’t affect our analysis very much. Okay, so then off to this part of the flow diagram. So what we’ve got is we’ve got data on the snow cover and now we can tell kind of the rate at which that fractional snow cover is going away during the year. And so now we want to look at how much energy is actually involved. So one of the things that the MODIS analysis also gives us is we estimate the albedo. So we know how bright that snow is, and therefore how much solar radiation it’s reflecting, and therefore from the difference how much it’s absorbing. And then what we get from the NLDAS system is sort of gridded meteorological reanalysis. And the way this works is the spatial resolution of NLDAS is 1/8th degree. So it’s a lot bigger than our pixel, and that’s what you see for solar radiation in the upper left hand corner. And this is just for one hour, right? So then what we do in the second diagram is we smooth that solar radiation to the resolution that we’re going to run the model at, which in this case is 100 meters. And the problem now is that 1/8th degree grid cell has a single elevation that is assigned to it.
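The view-angle weighting described above for the fractional snow cover time series can be illustrated with a simpler stand-in. The sketch below uses a Gaussian-kernel weighted average rather than a smoothing spline, and the weight `1 / (1 + |angle|)` is an assumed form chosen so that a nadir view (angle zero) has finite weight. The function name, bandwidth, and sample values are all hypothetical.

```python
import math


def smooth_fsca(days, fsca, view_angle_deg, bandwidth_days=3.0):
    """Kernel-smooth a fractional-snow-cover series, down-weighting
    off-nadir observations. None marks a missing (cloudy) day."""
    smoothed = []
    for t in days:
        num = den = 0.0
        for d, f, angle in zip(days, fsca, view_angle_deg):
            if f is None:
                continue                      # cloud or dropout: no data
            w_time = math.exp(-0.5 * ((d - t) / bandwidth_days) ** 2)
            w_view = 1.0 / (1.0 + abs(angle))  # assumed weighting form
            num += w_time * w_view * f
            den += w_time * w_view
        smoothed.append(num / den if den > 0 else None)
    return smoothed


# Hypothetical week: days 1 and 3 are cloudy, day 2 is far off nadir.
days = [0, 1, 2, 3, 4]
fsca = [0.80, None, 0.50, None, 0.70]
angles = [2.0, 0.0, 55.0, 0.0, 3.0]

weighted = smooth_fsca(days, fsca, angles)
unweighted = smooth_fsca(days, fsca, [0.0] * len(days))
print(round(weighted[2], 3), round(unweighted[2], 3))
```

The off-nadir observation still contributes, so it carries the estimate when it is the only clear view, but it is largely outvoted when near-nadir views are available nearby in time.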
So what we then need to do in the third part there is we scale that radiation within that 1/8th degree pixel and we spread it over the 100-meter pixels. We scale it accounting for the elevation. And we just do a pressure scaling, right? And then we’ve got topography. That is some slopes are facing the sun and some slopes are facing away from the sun. And so then we lay that scaled radiation now down on the topography and then we deal with the fact that some of that snow is under the trees. So we apply a land cover map to this and figure out how much radiation is actually hitting the snow. And then we’ve got the albedo of that snow. You can see the scale there. And then if you multiply all that you get the amount of radiation that’s being reflected. And then you difference that and you get the amount of radiation that’s being absorbed. So now what we have is a rate at which that snow is absorbing solar radiation. Yeah? >>: For your pixels on high vertical elements, how do you match a very steep slope and very vertical slopes with the pixels that reflect or show the amount of water? >> Jeff Dozier: Okay. So what we do is when you’re actually doing the energy budget computation, then you actually want to use the true area of that slope, okay? But then when you calculate the water that’s being produced, you need to normalize it by the projected area, okay? So one example of that is suppose you have a building and you’re using solar energy against the side of the building in order to produce something in the building. You know, probably something illicit, right? [laughter] Okay. So in that case in terms of the energy you want to use the full side of that building. But in terms of the product that’s coming out of it, you normalize that by the footprint of the building, okay? So it’s the same kind of analogy here, right? Okay. We go through the same thing with the long wave radiation but -- so we take what’s measured. 
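The absorbed-radiation step and the true-area versus projected-area answer above can be put together in a small sketch. It assumes, for illustration only, that all absorbed solar energy goes into melting snow that is already at zero degrees, and it ignores longwave and turbulent fluxes. The function names and the sample numbers are hypothetical; the latent heat of fusion is taken as 335,000 joules per kilogram.

```python
import math

LATENT_HEAT_FUSION = 335_000.0  # J per kg of ice melted


def absorbed_solar(incoming_w_m2, albedo):
    """Absorbed flux = incoming minus reflected."""
    return incoming_w_m2 * (1.0 - albedo)


def melt_depth_mm(absorbed_w_m2, seconds, slope_deg):
    """Melt water depth per unit of projected (map) area.

    absorbed_w_m2 is the flux per square meter of the sloping surface.
    Energy is collected over the true slope area, then the water produced
    is normalized by the map footprint of that slope."""
    true_area_per_map_area = 1.0 / math.cos(math.radians(slope_deg))
    energy_j_per_map_m2 = absorbed_w_m2 * seconds * true_area_per_map_area
    # 1 kg of water spread over 1 square meter is 1 mm deep.
    return energy_j_per_map_m2 / LATENT_HEAT_FUSION


flux = absorbed_solar(600.0, 0.7)          # hypothetical hour, bright snow
flat = melt_depth_mm(flux, 3600.0, 0.0)    # about 1.9 mm
steep = melt_depth_mm(flux, 3600.0, 60.0)  # twice the flat value
print(round(flat, 2), round(steep, 2))
```

This mirrors the building analogy: the wall collects the energy, and the output is divided by the footprint.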
We take the air temperature and we calculate an atmospheric emissivity, I’m going to go through this more rapidly just -- then we go through, I think the thing that is interesting that we have to deal with here is we have to account for the fact that some of the long wave radiation on a slope is coming from the sky, but some of it is coming from the adjacent terrain. When we did work in the upper part of Kings Canyon we would approach it by skiing up the Onion Valley road and then skiing over Kearsarge Pass. And on the Onion Valley road there was one spot where there was a south-facing rock wall right next to the road that went for about 100 meters. And that was invariably a pain in the neck because the road would be bare. And so you’d have to take your skis off and if you got a big pack you got to take your pack off in order to bend over to take your skis off. And so it was always kind of a hassle to just walk that hundred meters of road. That was entirely because this south-facing rock wall was radiating toward the road. So in fact we account for that and we go through these steps to end up with what is the net long-wave radiation. And usually the snow is going to be emitting more than is coming from the atmosphere so that those numbers are usually negative. Okay. And then, now that we’ve downscaled it, now we have to do this computation of the reconstruction. And the way that works, and this [inaudible], who is a current student of mine is sort of working on this part of the problem. She’s also a really serious climber and she’s run off to France with her boyfriend so I’m trying to make sure she finishes her Ph.D. [laughter] But if you have, let’s say your whole pixel is covered with snow so that fractional snow cover is one, then what we can do is we can estimate based on all these energy considerations how much snow melted on that day in that pixel.
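The reconstruction this leads to is a backward sum: starting from the day the snow is gone, add up each day’s melt, scaled by how much of the pixel was still snow-covered that day. The sketch below writes it in that form. Treat it as an assumption about the shape of the calculation rather than the group’s code; `reconstruct_swe` and the sample series are hypothetical.

```python
def reconstruct_swe(potential_melt_mm, fsca):
    """Back-accumulate snow water equivalent (pixel mean, mm).

    potential_melt_mm[d]: melt the energy balance could produce on day d
                          over fully snow-covered ground.
    fsca[d]:              fractional snow cover of the pixel on day d.
    Returns swe[d], the SWE present at the start of day d."""
    swe = [0.0] * (len(fsca) + 1)      # swe after the last day is zero
    for day in reversed(range(len(fsca))):
        swe[day] = swe[day + 1] + potential_melt_mm[day] * fsca[day]
    return swe[:-1]


# Uniform case: full cover melting 10 mm a day for 20 days.
uniform = reconstruct_swe([10.0] * 20, [1.0] * 20)
print(uniform[0])

# Heterogeneous case: cover shrinks as the shallow parts melt out first.
patchy = reconstruct_swe([10.0] * 4, [1.0, 0.75, 0.5, 0.25])
print(patchy[0])
```

In the patchy case the quarter of the pixel that went bare after the first day held 10 mm, the next quarter held 20 mm, and so on, which is how the shrinking snow cover reveals the depth distribution inside the pixel.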
And if the snow all disappeared at the same time over some number of days, we could just back calculate and figure out how much there was. So if I start out with all snow cover and I’m calculating that I’m melting 10 millimeters a day, and in 20 days it’s all gone, well, I can just do the multiplication and I know how much there was to start with. But the fact that it’s not uniform means that -- So as it melts, as that fractional snow cover value is changing, then I can also estimate the spatial heterogeneity of the snow in that pixel. In other words, if on one day I can melt 10 millimeters of snow water equivalent, and let’s say five percent of the snow cover disappears on that day, then I know that five percent of that pixel was at that depth on the day before. So therefore we end up with a reconstruction of the snow water equivalent that is spatially heterogeneous. Okay. So --
>>: [inaudible] always melt at the same rate regardless of the depth [inaudible]?
>> Jeff Dozier: It melts. Okay, so it takes some energy to bring the snow to zero, and we sometimes call that the cold content of the snow. That amount of energy is actually pretty small because the specific heat of ice is only 1900 joules per kilogram per degree, whereas the latent heat of fusion is 335,000 joules per kilogram. So we do account for that, but the cold content is pretty small compared to the energy needed to melt. So we can account for it, we do. If you get it wrong you’re probably okay, you know? But the second part of your question is kind of interesting. And that is that the sunlight that melts snow penetrates into the snow pack a little bit. It’s a multiple scattering problem. So the melt is actually occurring over a depth, not right at the surface. Okay? So what happens when it gets shallower? Well, you do reach a depth somewhere around there where you’ve got radiation that is penetrating -- where you see this is the albedo appears to drop.
But what is really happening is the radiation is penetrating all the way through to the underlying soil, warming the soil, and that therefore is causing the melt to occur.
>>: It melts pretty equally up until the last [inaudible]?
>> Jeff Dozier: Well, no. It melts depending on the energy input.
>>: But I mean, say all the other variables are equal, [inaudible]?
>> Jeff Dozier: Yeah. So if we go back to those diagrams of the snow pillow you see that melt curve is -- the one thing you notice is that it’s steeper in the bigger year. And the reason for that is that in the years where you have a lot of snow a lot of times the maximum accumulation occurs later. And so the melt is occurring when the temperatures are warmer and the days are longer and the sun is higher in the sky, and therefore it goes more quickly. So yeah, there’s a lot to learn by looking at the slope of the melting end of the snow pillow. Okay. So how does it all do? So what we’ve done is we’ve gone through several validation steps. One is just comparing it to the surface measurements. So far in this model there’s no surface measurement in it. It’s un-calibrated; we’re not tuning it to any results. So then we can really use the surface data as a true validation. There’s no circular reasoning involved here, right? So there are two sets of data that we’ve compared. One is from the snow pillows that I showed you, and the other is measurements from snow courses. That is because they started trying to do this snowmelt runoff forecasting in 1910, before you could operate something electronically out in the wilderness and transmit it back to anywhere. So initially, up to about the 1970s, most of the reliance was on people who skied around with these tubes in their pack and once a month they’d go to a set of sites, they’d plunge these tubes into the snow, pull it out, weigh the tube, tare the weight, and you’ve got the amount of snow.
And this still goes on because you need to make sure that the snow pillows are working and it’s a great job. [laughter] There’s very little turnover. You know, if you want to get a job as a snow surveyor you’re going to probably wait a little while. But the nice things are those lines. Those are just the one-to-one lines. Those are not a regression line. So it looks pretty good. And also the distribution of errors is unbiased. That is, the peak of the error histogram is at zero. And you always like to see that because you can deal with random error pretty well. Dealing with bias is a problem. And also there are errors in these measurements. So the snow pillow is measuring over about an eight by ten foot area. The pixel is, you know, at 500 meters. And the snow course is typically a transect, right? So they try to do the best they can to make that measurement representative of what’s around them, but it’s not always easy. So we’ve also looked at the spatial variability. That is by taking something about the size of the MODIS pixel and, you know, making measurements all through it in a variety of areas. But what this tells us is that we think we’re trying to validate this method in the Sierra, but then we would like to use it in a place where there are no measurements at the surface, or very few, where the Taliban have destroyed what surface infrastructure used to exist. And yet it’s a place where we’re making a lot of investments in managing water and where water really means something to people. Okay. The other kind of validation is with the input data, and so the top row is solar radiation. And that looks pretty good. There are a lot of stations that measure it. There are forty or so stations up in the mountains. A little bit more scatter at higher elevations. The long wave radiation doesn’t look as good. A couple of things though that -- first of all, there are really only three stations in the Sierra that measure this reliably.
And it’s a tough thing to measure because your instrument heats up. So if you think of an analogy with a laboratory measurement, what if you’re trying to measure something and your instrument is emitting the same thing you’re trying to measure? So there are issues with temperature compensation of the instrument. So I don’t know that this error is entirely in the satellite retrieval. And then similarly we see a model of that long wave radiation based on air temperature and humidity. Hmmm, looks like I’ve reproduced this graph. I’m sorry. Based on temperature and humidity and an estimate of cloud cover. And again that’s underneath the value that the satellite is telling us. So there’s error there and we’re not quite sure what that is. Okay, so that all looks very good. I can give you a pretty good estimate of the spatial distribution of the snow water equivalent, you know, near the peak of its accumulation. But that’s not useful in a forecast, right? Because I’ve given you the answer, you know, a couple of months after you would’ve liked to have had it. You know, the water is already in the stream. You have an idea already of how much water came down. So we’d like to compare this to two different methods that operate in real time or close to real time. So one of them is just an interpolation based on the snow pillows. So, you know, what we’ve done is taken those snow pillow measurements, and we can do this every day, but we just take your favorite GIS package and run the interpolation through. And of course if you do that you’ll spread snow out into the ocean, right? So what we do is we do use the remote sensing measurements to tell us where that snow has to go to zero because the problem is if you have a lot of data points, none of them are zero. Well, there’s no interpolation method that’s going to get you to zero anywhere in that. So that’s one way of getting the answer.
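The masked interpolation described here can be sketched with inverse-distance weighting standing in for whatever the GIS package would do. The talk does not name the interpolation method, so that choice, the function names, and the station values below are all assumptions for illustration.

```python
def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).
    stations is a list of (x, y, value) tuples."""
    num = den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value                   # exactly on a station
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den


def interpolate_swe(grid_points, stations, fsca):
    """Interpolate station SWE, then force it to zero wherever the
    satellite fractional snow cover says there is no snow."""
    return {p: (idw(p[0], p[1], stations) if fsca[p] > 0 else 0.0)
            for p in grid_points}


# Hypothetical snow pillows (x km, y km, SWE mm): none of them reads zero.
pillows = [(0.0, 0.0, 400.0), (10.0, 0.0, 600.0)]
grid = [(5.0, 0.0), (50.0, 0.0)]
satellite_fsca = {(5.0, 0.0): 0.8, (50.0, 0.0): 0.0}

swe = interpolate_swe(grid, pillows, satellite_fsca)
print(swe)
```

Without the mask, the far point would inherit a value between the two pillow readings, which is the snow-in-the-ocean problem.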
And the second one is there’s a numerical weather model that runs called SNODAS, the Snow Data Assimilation System, that NOAA runs out of the Boulder office. And that’s based on whatever they can get. They use satellite data. They use surface measurements. They use numerical weather forecasting. And this is what we get from the reconstruction method. And these are the median values for the seven-year period. And that’s the period in which SNODAS has been running. MODIS goes back to 2000. And so what you notice about this is there’s a lot more snow in the reconstruction. And so one of the questions is, well, is that right? And so what we’ve done is to look at that. And so each of these 18 river basins, these are the basins in the Sierra where they estimate what is called the full natural flow. And what that is is the flow in the river that would be there if there weren’t dams or diversions. So they take the actual flow and then they try to account for storage and evaporation from the reservoir. And in some cases for diversions that are happening above the gauging station. And the colors represent the three methods. And so by and large, and we just did this as a rank correlation, so are we hitting the big years and the low years? And in 13 of the 18 basins the reconstruction method is the best. All right? And even in the places where it’s in second place it’s pretty close. Whereas the other methods have some years or some basins where they don’t do very well. So that’s good. That’s one thing. And then if we actually do the volumetric comparison we also get a result that’s very useful. And this is for each of the eight major drainages. The one thing you notice here is that the black line, which is the reconstruction, has actually got more snow in it than is coming out in the river, which is what you got to have because you’ve got to have something left over for evaporation, right?
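The rank-correlation check asks whether a method puts the years in the right order, big years high and lean years low, without caring about absolute volumes. A minimal sketch follows, assuming Spearman’s coefficient (the talk says only "rank correlation"). The yearly values are invented for illustration.

```python
def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = average
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Hypothetical seven years for one basin (arbitrary volume units).
estimated_snow = [1.0, 2.5, 1.8, 4.0, 0.9, 3.1, 2.0]
natural_flow = [0.8, 2.2, 1.5, 3.9, 0.7, 2.0, 1.9]
rho = spearman(estimated_snow, natural_flow)
print(round(rho, 3))
```

A value near one means the method ranks wet and dry years the same way the measured full natural flow does, even if its volumes are biased.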
So the problem with the other two methods is that in some of the basins they’re actually having less volume of water in the snow pack than is coming out in the stream, and that’s physically not possible, right? So the fact is that the -- >>: Are you presenting zero precip and zero rain then? >> Jeff Dozier: We account for rain but generally in, you know, this is the Sierra Nevada. This is not Washington. So there’s -- [laughter] [inaudible] >> Jeff Dozier: We don’t get much rain in the -- so these are compared with the April through July runoff. And there’s not a lot of rain in that period. So, yeah, the rain is accounted for, but it’s not -- you still got this. In other words, we’re still getting enough water in the reconstruction method. So the fact that the reconstruction is giving us a bigger estimate turns out to be good, right? Okay. And also if we look at the cumulative probability statistics, by and large our r squared values for the reconstruction method are all pretty good, whereas some of the r squared values for the other methods are not so good. Okay. So why am I, you know, spending my time as a Microsoft intern, right? So there’s this nice statement that came from Jim Gray that in working on these complicated problems you’re always going to want to go from something that works to something that works better. There’s a lot that goes into trying to do this, and if you try to make every step as completely right as you would really be satisfied with, you’ll never get to the end, right? So you got to put something together. But then what that means is that there are lots of things to improve along the way. And that’s where expertise from other people would help. In terms of the end-to-end problem, I know more about it than anybody does. But in terms of the steps along the way, there’s expertise out there, perhaps some of it in this room, that would help. So let me go through a few of the examples. 
So one of them is this pattern recognition problem in that how can we use the reconstruction values to help us improve the forecast? And one of the areas, one of the possibilities, is looking at snow-covered area. But that tends to differ a little bit between the basins. So this is the American River in the wet year and the dry year. There’s a lot more snow in the wet year, but the snow-covered area is not that much greater. Whereas in the Kern, same sort of thing. That wet year had snow in a lot of places that the dry year didn’t. So certainly snow-covered area is a gauge to how much snow there is, but it’s not perfect. And in fact what we did is to look at all of the basins in the San Joaquin drainage, looking at the snow-covered area in relation to how much snow volume there was, both on April first and on June first. And notice that in June that relationship is a lot tighter. And I think you would expect that because in April there could be a lot of snow. The accumulation season has just finished. There may be occasional storms going on. So there’s a lot of snow that may or may not be very deep, whereas in June what’s left was pretty deep, right? And in fact we see that if we take the r squared between these relationships, between the snow-covered area and the snow water equivalent, you can see that that r squared climbs as we go through the season. So by the time we get to June that snow-covered area is a pretty good gauge of how much snow is there. But in April it’s not so good, right? Or it’s not as good. So I think there are other things that one can do to explore the pattern and -- Okay. So there’s an interesting algorithmic problem that goes on. In order to account for this long wave radiation coming from the terrain, to really do it right I would need to know the viewshed for every point. Okay? Now, we have an approximation that we use, but to really do it right I’d like to know the viewshed. 
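[Editor's note: to make the term concrete, here is a brute-force viewshed sketch for a single observer on a small elevation grid. It is not the speaker's algorithm and not any toolbox's implementation. It assumes a flat Earth, an observer at ground level, and nearest-cell sampling along each sight line, and the function names, the optional max_radius cutoff, and the tiny grid are all hypothetical. Its cost for one observer grows with the number of cells times the path length, so repeating it for every cell of a large grid is expensive.]

```python
import math

def visible(dem, r0, c0, r1, c1):
    """True if cell (r1, c1) can be seen from ground level at (r0, c0).

    Walks the straight line between the two cells and fails as soon as
    an intermediate cell rises above the sight line (flat-Earth assumption).
    """
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps <= 1:
        return True                      # the observer cell and its neighbours
    z0, z1 = dem[r0][c0], dem[r1][c1]
    for s in range(1, steps):
        t = s / steps
        r = round(r0 + t * (r1 - r0))    # nearest cell along the path
        c = round(c0 + t * (c1 - c0))
        sight = z0 + t * (z1 - z0)       # height of the sight line at this step
        if dem[r][c] > sight:
            return False
    return True

def viewshed(dem, r0, c0, max_radius=None):
    """Boolean grid of cells visible from (r0, c0), by brute force.

    max_radius (in cells) is an optional, hypothetical cutoff that skips
    cells too far away to matter.
    """
    rows, cols = len(dem), len(dem[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if max_radius is not None and math.hypot(r - r0, c - c0) > max_radius:
                continue
            seen[r][c] = visible(dem, r0, c0, r, c)
    return seen

# Hypothetical terrain: a ridge of height 5 running down the middle column.
dem = [[0, 0, 5, 0, 0],
       [0, 0, 5, 0, 0],
       [0, 0, 5, 0, 0]]
seen = viewshed(dem, 1, 0)
```

From the left edge the observer sees the ridge itself but nothing behind it.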
So I ran a little -- so I tried to do this for this data set that covers the whole western United States at three arc second resolution. There are 720 million pixels here. And, you know, I run MATLAB and MATLAB has a viewshed algorithm. And I was about to go to sleep so I plugged in my laptop and I picked a point right out here and said calculate the viewshed of that area. And that’s just one point. And I woke up the next morning and it was still running. Okay? Now, I looked at the code. The nice thing about MATLAB is the toolbox source code is available. And if I had looked at the code first I probably wouldn’t have done this. You know that anything trying to address this problem that starts out for row equals one to number of rows, you know, it’s probably the wrong -- I mean, at least you could bound the search radius based on Earth curvature or something like that. But you need to fix this problem algorithmically. This is not something that Moore’s law -- yeah, if I had 720 million processors I could do it in two days, right? But I don’t have 720 million processors and nobody is probably going to let me use them for two days. >>: Question. Is viewshed what it looks like in a sort of 360? >> Jeff Dozier: Yeah. Here’s sort of an example here. I need to qualify the answer to that. The answer is yes. So here’s the area in the Tuolumne basin where the snow pillow is. And that’s the viewshed for that point. But this is not. It’s much smaller than the viewshed that you would see standing up. So it’s what you would see in the viewshed if you were lying on the ground looking up, okay? But yeah, it’s everything you could see. So algorithmically there’s a couple of things. One is I want the viewshed for every point. And so it’s not just a problem of getting a fast calculation for doing it for one point, okay? So are there good ways to bound the search radius, for example, and are there good ways of taking advantage of reciprocity? 
That is if I’m in your viewshed, well, you’re in my viewshed. Right? And I’ve looked through the literature on this and I can’t really find anybody that’s doing a very good job of this. Now, back in 1981 I actually figured out a way to get the horizons in order n time. And that completely bounds your search radius. So I do have a partial approach to this problem if somebody wants to work on it. I can tell you exactly what search area you need to consider, right? Okay. There’s a remote sensing problem that is also kind of a generalized image-processing problem. And that is that the snow cloud discrimination is not completely solved. If you’ve got thick clouds, so this is the Landsat image. These are the visible bands. But this includes a band out in the short wave infrared where clouds are bright and snow is dark. And yeah, the clouds pop right out at you. But these are cumulus clouds. They’re thick. Here’s one in the Hindu Kush that’s a little more problematic. These are some data dropouts. And what you see is even with different combinations of spectral bands, it’s hard to distinguish what’s snow and what’s cloud. But you can kind of tell with your eye. So I think a generalization of this problem that would be interesting for someone to work on is this: generally when I read the image processing literature there are papers that deal with shapes, and there are papers that deal with spectra. And not many deal with both. So to me this is kind of a combination of a shape recognition problem and a spectral recognition problem. And so there are fruitful things to do there. There are data quality problems in e-science. So the 2007 snow pillow at Dana that I showed you the data for, well, maybe those 2007 measurements were a little bit high. 
Jessica Lundquist, who’s a faculty member at the University of Washington, she was working in this area and the Dana Meadows pillow wasn’t corresponding very well to what she was seeing in the other pillows. So she went to it. And there’s a tiny tree growing in it. Okay? And when she took the tree out the snow water equivalent dropped by 12 inches. Okay? And then another remote sensing problem is finding snow in the forests. So what we do with this fractional snow cover is we divide it by one minus the fractional vegetation cover. So it kind of helps us estimate what snow is in the trees. And that works pretty well up to about 75-80 percent canopy cover. And the way we kind of know this is a student of Jessica’s and a student of mine got together and carried out this very interesting experiment. They buried a lot of these little temperature sensors all over the place. And then you can tell from the daily temperature record whether it’d snowed on it or not. And so they have a -- this is real ground truth. And these images are arranged in order of increasing canopy cover. And the squares on them represent the size of a MODIS pixel. And so what you see, if you look at the comparison between what we see from satellite and what we measure on the ground, is it’s not bad until you get down to this here, this pixel here. We’re missing a lot of snow under the forest when we can’t see through the trees, right? Okay. There are other examples. The first one is I think the more generic and applies to all of e-science and that is that we typically use the kinds of software that you would expect for data management, like [inaudible]. Or some people use Excel. But by and large those aren’t the tools that we use to analyze data. And so having these things talk together is a real problem. The SQL Server FileTable has really bailed me out. I just learned about it. But you know we sometimes run programs. 
When I do a year’s worth of analysis I could open 2000 image files and keeping track of those has been problematic because when you stuff them out on the file system and then put an entry into the database, it’s really easy to get those out of sync. But SQL Server FileTable just handles that. A more scientific issue is this one of error propagation: when you’re doing something of this complexity there are errors all along the way. And how do they affect the final result? Improved presentation: Alex, if he’s finished working, has got integration with Layerscape. And then, you know, your good idea goes here. So this is what the Hindu Kush looks like. Compared to anything in the western U.S., it’s a really broad mountain range. Even at 20 thousand feet all you can see in every direction are more mountains. Anyway, thanks very much. [applause] Jonathan is here. He’s been waiting for this part. [laughter] >>: It’s not in layers. >> Jeff Dozier: Okay. >>: [inaudible] >>: One question. How much computation are you consuming to generate what you’re generating now? [inaudible] >> Jeff Dozier: Oh. Okay. So everything we’ve done so far is in an area that’s not all that big, the Sierra. So it is big enough that we have to worry about Earth curvature in terms of the radiation budget. So I should have a more precise answer to that question than I do. You know, we have machines. We have a multi-core machine and then we have a cluster that we’re running on. We can keep up with the dataflow. In a week we can do about the ten-year data record. Okay? But -- >>: [inaudible] >> Jeff Dozier: It’s, let’s see, I think there’s 64 MATLAB nodes on it. I think it’s 196 nodes on the cluster, but the MATLAB license is just 64. >>: Are you the only one using it? >> Jeff Dozier: No. No. No. Yeah, no. I’m actually waiting for some software that the MSR MATLAB license doesn’t have yet but we’ve ordered. 
So I plan to do some experiments on the MATLAB DCS that we’ve got in the building here. But there’s one major software package, the Mapping Toolbox, that we’re missing. >>: Do you have a good sense of what the square mile to computational -- >> Jeff Dozier: You know, I could probably answer that question more precisely just doing some timing statistics. But, you know, it’s an answerable question. I just don’t have it off the top of my -- yeah, Jonathan? >>: If the heat from your computation melts the snow -- [laughter] You need their algorithms. >> Jeff Dozier: The one thing I did discover in doing some timing tests is that the laptop runs slower when it’s on battery power. And my solution to the horizon problem, which I thought was order n, you know, I was doing some timing statistics and it wasn’t coming out to be order n. I’m going, like, God, I’m comparing things that I’d run here and things that I’d run at home. Fortunately I thought about it when I came back to the office here. I said maybe I ought to plug this in and see if it works faster. And it did. It was about 50 percent faster running on the grid power. Let’s see. Okay. So one thing we’ve done with this is Alex has put an interface with Bing Maps. And this I think is pretty useful in general in that this is a way to get GeoTIFF images into Bing Maps. And so he’s got things where you can adjust the scale range. If you want to wipe out some of the lower elevations, or if you want to see what’s under this place that’s getting a lot of snow then he’s got an opacity slider so you can see that the Truckee is getting dumped on and so forth. And then you can switch back with another date and if you prepare ahead of time you can cache these. So the switching back and forth runs pretty quickly. And then if you want you can switch to one of the other methods. >>: So what is this that we’re looking at? Is this the maximum snowfall? 
>> Jeff Dozier: This is the maximum snow water equivalent in each of those years. Okay? >>: So like, [inaudible] you’re just taking accumulation? >> Jeff Dozier: Well, not only that, but it varies with the pixel. So the date at which the maximum occurred will be different for each of the grid cells. So we’re going through and picking out what that maximum is and then carrying that through to have a single image. Now, what we can do is -- this is for a single date. And so what we’ve done is we’ve loaded lots of dates in here and again, because these dates are not cached it takes a while for them to show up. But you know this is something that, you know, Dan and I are going to go talk to the Bing Maps product guys about. I think the snow problem is interesting but the fact that you can take imagery that you yourself could supply and put into this is something that, you know, I know in MSR you always worry about what the product guys think, but this would be a useful addition to what the tool can do. Yeah? >>: So is this data that you’ve collected currently just used by you and your team and the research community? Or are you commercializing it in some way and letting the utilities and the water districts have access [inaudible]? >> Jeff Dozier: Well, we’re giving it away. And the reason is it’s mostly been government funded. So we can’t really sell the forecast. But we are in discussions with the people like the California snow survey because they would like to reduce the tails of the error distribution. You know, a sort of 10 percent error they can live with. But having the occasional year when you’re way off would really be bad. And then I’ve actually got a project with the Army Cold Regions Lab to try to do this in Afghanistan. And so we’re cranking up a system at JPL to produce the MODIS images and then I hope to run some of that here in the next month. 
So I know most of the interns are going home in a week or two and they’re panicked but I’m here till the 14th of September and also I was, I think at least, perceptive enough to panic the first week I got here. So -- [laughter] >>: Is there any snow in the Sierras at this point? >> Jeff Dozier: Only in some of the -- because this was a pretty light year. So certainly in the previous year there was still snow. Well, in the previous year we were skiing well into July. So there is snow, but it’s, you know, it’s mostly in very -- >>: [inaudible] >>: As of the survey last Thursday afternoon. >>: [inaudible] [laughter] >> Jeff Dozier: Okay. But, yeah, well, there is some permanent snow in the Sierra. >>: [inaudible] >>: The question is because in the height of summer, even west of the Sierras there are isolated places that never melt. >> Jeff Dozier: Right. Yeah. >>: So the question of permanent accumulation is something I’ve never heard discussed. >> Jeff Dozier: Well, it’s discussed a lot, but it’s more discussed in the Himalayan context because of the shrinking glaciers. The issue is to what extent is the glacier melt an important component of the water supply versus the seasonal snowmelt? And I think, you know, I think people who have thought about it have a gut feel about what the answer to that question is. I don’t know that anybody has really -- well, people are working on that, partly using say isotope tracing to figure out where the water is coming from. >>: The model that you’ve described so far would seem to not distinguish it. It would seem to just cover every -- >> Jeff Dozier: One of the things that we would have to do in the Hindu Kush is have glaciers as a separate end member because in that case those don’t go away at the end of the season. >>: At the Himalayan altitudes you get a solid to gas transition. >> Jeff Dozier: Well, we get it here. We get a lot of -- >>: At these altitudes? >> Jeff Dozier: Oh, yeah. 
Yeah, at our instrument site we lose 20 to 30 percent of the snow from sublimation. Yeah. >>: Is that in the model? >> Jeff Dozier: Yeah, yeah, yeah. That’s in the model. Yeah, we run an energy balance model. So if you get into the forest where there’s no wind then you get very little during the winter. But out in the open when it’s dry and windy a lot of the snow can go away. Where we see it at our instrument site is we have [inaudible] underneath the snow pack. And we see periods when the amount of snow is dropping but we’re not seeing any water at the bottom, so yeah. And also even in the Himalaya you can get melt at temperatures below zero. Where air temperature is below zero. >>: From energy absorption -- >> Jeff Dozier: You get energy absorption, you get it down in the snow pack where there’s sort of a solid-state greenhouse effect. You can also freeze water when the air temperature is above zero. Just, you know, because the surface temperature is not necessarily the same as the air temperature. So, for example, on very calm nights the surface can get colder than the air. And they used to make ice in India by this. They’d lay out very shallow pans of water and they would produce ice even on nights that the air temperature never got below zero. >>: Any other questions? >> Jeff Dozier: Well, thank you all. [applause] >> Jeff Dozier: Yeah. So anybody who wants to come talk about this or has ideas, I’d certainly be happy to talk. Also, I’ve talked with Yon [phonetic] about how to incorporate some of this thinking into the open data open science stuff. And one of the issues is, is there an effective way to kind of put the problem out there? And then have a mechanism for ways that people could interact, you know, kind of like the [inaudible] against the world chess match. Anyway, thanks all for coming.