>> Dan Fay: I want to welcome you to this talk by Jeff Dozier from the University of
California, Santa Barbara. Jeff is up here visiting us for a few months to look at how
some of the technology here can be used with some of the work we’re doing at
Microsoft Research.
And so Jeff has been involved with a lot of the different activities we’ve been doing
for many years: everything from Rolleon [phonetic] and the TerraServer that we were
doing with Jim Gray and some of the other folks, down to some of the activities at
the e-science workshop, and he is actually a Microsoft e-science fellow for us.
So I want to welcome Jeff and he’s going to give a talk on snow hydrology at the
scale of mountain ranges. So welcome, Jeff.
>> Jeff Dozier: Thanks. Great.
[applause]
So as you can see I’m spending my summer at MSR. I think of myself as the oldest
intern in the group.
[laughter]
And I want to acknowledge two students who have helped a little bit with this. Karl
Rittger is a UCSB student who just finished his Ph.D. in early July and is now working
at the Jet Propulsion Laboratory, and a lot of the presentation is based on work that
he did. And then, of course, my local intern, Alex Wiggins, who is a student at Oregon
State. He’s working on some of the visualization aspects of this, and he’s preparing
something to show you after I’m done. He works even closer to deadlines than I do.
Okay. So the issue with snow, which you would know about in any part of the western
United States and which affects probably a billion people worldwide who get water
from snowmelt in the mountains, is this: if you look at the record from a snow pillow,
which is a device on the ground that measures the weight of the snow on top of it,
the typical season is that the snow starts accumulating sometime early in the season,
October or November, and then grows in fits and starts.
The snow pack grows, and then sometime in the spring it starts to melt. There may be
a few snowfall events during that melt, but by and large it melts pretty quickly. And,
of course, that happens just as water demand is increasing. So the snow pack really
is the natural reservoir over much of the mountain West.
And these graphs here show the same thing. What I’ve done is take the snow pillow
record on the right, take the daily differences, and express those as either positive
or negative. So again, what you see is that the accumulation period is a set of storms
spread over six months or so, and then the melt period occurs pretty quickly.
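A minimal sketch of the daily-differencing step, assuming the snow pillow record is a plain list of daily SWE values in millimeters (the function name and the sample numbers are hypothetical, not the talk's data). Positive differences are accumulation; negative ones, with the sign flipped, are melt.

```python
def split_daily_changes(swe_mm):
    """Difference a daily snow pillow SWE record (mm) and split it by sign.

    Returns (diffs, gains, losses): all day-to-day differences, the positive
    ones (accumulation), and the negative ones as positive numbers (melt).
    Days with no change appear only in diffs.
    """
    diffs = [today - yesterday for yesterday, today in zip(swe_mm, swe_mm[1:])]
    gains = [d for d in diffs if d > 0]
    losses = [-d for d in diffs if d < 0]
    return diffs, gains, losses


# Hypothetical season: a few storms, then a fast melt.
record = [0, 20, 20, 55, 55, 90, 80, 50, 10, 0]
diffs, gains, losses = split_daily_changes(record)
print(gains)   # [20, 35, 35]
print(losses)  # [10, 30, 40, 10]
```

Over a full season that starts and ends bare, total gains equal total losses; the shape of the talk's graphs lies in how they are spread over time.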
The other thing that’s kind of interesting about this graph, if you compare a relatively
lean year, 2007, with a relatively big year in the Sierra, 2011, is that the sizes of the
storms are not that much different. The difference in a big year is that you get more
storms.
Okay. And because of the importance of this snow resource for agriculture, urban use,
hydropower, and so forth, there’s a forecasting effort that goes on. Typically, around
the first of April or so, water agencies issue a forecast for the seasonal runoff.
That is, what they expect to get from April to July, based on all these instruments
that are in the mountains. And what this graph shows is that the relative forecast
error is pretty high. If you look at the median, the error is about 20 percent or so.
And if you go out to the rarer events, say up to the 90th percentile of the cumulative
probability distribution, the error is nearly 60 percent.
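To make the statistic concrete, here is a small sketch that computes the median and 90th-percentile relative error, assuming relative error means |forecast − observed| / observed (an assumption; the talk does not give the definition). The ten forecast/observation pairs are made-up numbers, not the data behind the graph.

```python
import statistics


def relative_errors(forecasts, observed):
    """Relative forecast error |F - O| / O for each season."""
    return [abs(f - o) / o for f, o in zip(forecasts, observed)]


def error_summary(errors):
    """Median and 90th percentile of the relative errors."""
    median = statistics.median(errors)
    # quantiles(n=10) returns the 9 decile cut points; the last is the 90th percentile
    p90 = statistics.quantiles(errors, n=10)[-1]
    return median, p90


# Hypothetical April-July runoff, forecast vs. observed, in arbitrary volume units
forecasts = [100, 120, 90, 150, 80, 110, 95, 130, 70, 105]
observed = [110, 100, 95, 120, 60, 100, 80, 150, 100, 90]
median, p90 = error_summary(relative_errors(forecasts, observed))
print(f"median {median:.0%}, 90th percentile {p90:.0%}")
```

The 90th percentile is the "one year in ten" error: it is exceeded, on average, once a decade.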
So what that says is that in one year in ten the forecast error is going to be
substantial. And the reason that’s particularly important in the Sierra Nevada is in
this lower graph, which shows reservoir capacity as a fraction of mean annual runoff.
What you see is that most of the basins are right around 100 percent or a little bit
less.
There are a few exceptions. The Truckee has Lake Tahoe, so there’s about a three-meter
dam height at the Truckee River spill [inaudible]. If you multiply three meters times
the area of Lake Tahoe, it’s a pretty big number.
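The "pretty big number" can be checked with one line of arithmetic. This sketch uses the speaker's three meters and assumes a Lake Tahoe surface area of roughly 490 km² (an approximate figure, not given in the talk).

```python
ACRE_FOOT_M3 = 1233.48  # cubic meters in one acre-foot


def storage_volume_m3(depth_m, area_km2):
    """Volume of a layer of water depth_m thick spread over area_km2."""
    return depth_m * area_km2 * 1e6  # 1 km^2 = 1e6 m^2


# Speaker's ~3 m of controllable depth; ~490 km^2 is an approximate lake area
tahoe_m3 = storage_volume_m3(3.0, 490.0)
print(f"{tahoe_m3 / 1e9:.2f} km^3, about {tahoe_m3 / ACRE_FOOT_M3 / 1e6:.1f} million acre-feet")
# prints: 1.47 km^3, about 1.2 million acre-feet
```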
And then New Melones Dam on the Stanislaus is big. But other than that, we don’t store
much more than the annual runoff anywhere in that mountain range, and so forecast
errors in a single season make a big difference.
Whereas if you had, say, five years of storage, those errors wouldn’t be so important.
Okay. So there are several reasons why forecasting is difficult. One of them is that,
by and large, the snow sensors don’t cover the highest elevations.
Partly that’s just because as you get higher it’s a bleak, windy environment, and it’s
harder to install things. But what you see on the left, for a two-week period in July
2011, is that all of the snow sensors are below the snow zone, and yet there is a fair
amount of snow still in the watershed.
In other words, these are the snow pillow records for all of them. By that first image
up on top, all of them but two have gone to zero, and by the second image they’ve all
gone to zero.
So there’s still snow in the watershed, and yet your whole surface network says the
snow is gone. Where this matters is for the manager of the Hetch Hetchy reservoir,
which supplies water to San Francisco. He’s wondering what’s going on at this point,
because he could let some of that water out, generate hydropower, and make about
$125,000 a day, but then he runs the risk of you not being able to take a shower when
you visit San Francisco in late October.
And so if he knew that there was more snow in the watershed then he could in fact
make a better decision, whereas thinking that there’s no snow in the watershed
means he has to kind of hang on to what he’s got.
Okay. The other reason for heterogeneity is that the big difference between snow and
rain, aside from the fact that one of them is liquid and the other is not, is that snow
moves around on the ground after it falls. This picture on the left is at the Reynolds
Creek watershed in Idaho.
You can see the fence in the background. The snow in the foreground is probably up
to your shins, and yet back there where those trees are, the snow is three meters
deep. And that’s pretty important; that’s why those trees have decided to grow there,
right?
And similarly there’s differential melting. This is a typical scene in the mountain
West, with one slope covered with snow and the opposite slope bare. So there are a
couple of things we do to try to characterize this heterogeneity and what’s going on
in the snow pack at very large scales.
That is, we want to look at the whole mountain range. Because the snow is
heterogeneous, and we want to see it pretty much every day, or at least often enough
that we can really keep track of what’s going on with it, we need a satellite sensor
with a short revisit interval. So we use MODIS, because we have data every day. But
the problem is that the pixel is 500 meters in size, so there’s a lot of heterogeneity
within that pixel. So we have a method to map what we call the fractional snow cover.
This is going to be a pretty typical problem any time you want a rapid revisit
interval: that drives you to a somewhat larger pixel, and therefore you’re going to
have this mixing problem. In this case we figured out a way to solve that, so we can
actually say how much of each pixel is snow-covered.
And then what we do, and this is -- first of all, it’s okay to ask questions during the
talk. You can interrupt me. But the consequence is that sometimes when people do that
you don’t get to the end, right? You get into discussions.
So what I’m going to do is show you the end and then work backwards, okay? The end
is that we really want to know how much snow is in the mountains, and we want to
know its spatial distribution.
And we want to know that for a couple of reasons. One is to help with the forecasting
problem, but the other is to look at the impact that snow has on the hydroecology of
these mountain ranges, where this spatial variability is really important.
There is currently no method to measure this directly from remote sensing, right? I
can discuss efforts to do that directly, but that’s another lecture entirely.
So what we do instead is track the rate at which the snow disappears, and from that
we can reconstruct what the spatial distribution was. That’s what it looks like in this
wet year, 2011; it was wet in California. It wasn’t wet up here because it was wet in
California, right?
In 2007 it was dry in California, so Oregon and Washington got a little bit more that
year. But what you can see is what that spatial distribution looks like.
And these show three months, starting April 1st, with June 1st on the right. What you
see is that in the big years there’s still a fair amount of snow left in June, right?
Okay. So how do we do that? And what are some of the e-science problems
involved?
>>: One quick question [inaudible]. How much is [inaudible] a factor in terms of, you
know, ice crystals turning to water? Clearly the amount of time it takes for that to
get to a point where it needs to be managed in a reservoir is going to be different
depending on where it came from.
I mean, what sort of time range are we talking about? Hours? Days? Or weeks?
>> Jeff Dozier: It’s days.
>>: It’s days. Okay.
>> Jeff Dozier: It’s days, yeah. Typically these reservoirs don’t have a lot of ground
water storage. They’re up in the mountains. They’re mostly bedrock. The answer
to that question varies a little bit because part of that residence time is in the snow
pack.
>>: Right. Yes.
>> Jeff Dozier: So you’re melting your water at the top of the snow pack, then it has
to get through the snow, and then it gets through the soil. But yeah, it’s typically
days to a week or two, whereas in a mountain range with deep groundwater storage it
could be longer.
>>: So you could have, like, a super hot day all of a sudden out of the blue --
>> Jeff Dozier: Oh, well, you see --
>>: You know when it’s going to show up. I think of this in particular with, say, the
Grand Coulee Dam, where the water is showing up in the river, and the challenge that
we have up here is that they’re trying to manage the flow through the reservoir so
they don’t risk too much coming in and topping the dam.
And you’ve also got this challenge with wind power. They’re trying to figure out
whether or not to curtail wind, because they need to push water through the dam
because there’s trouble with [inaudible].
>> Jeff Dozier: Yeah. Well, this is one of the problems: your reservoir management
has multiple purposes. One of them is flood control and another is provision of water
resources, and if you think about it, those are diametrically opposed to one another.
But you can sort of see what the timing is simply from the fact that when you’re in
the mountains you do see diurnal variability in the streamflow, and typically its
peak comes a few hours after the maximum rate of melt.
So that kind of shows you that there isn’t a lot of lag time going into storage, right?
>>: [inaudible]. Does sediment transport play into this whole analysis of how
management goes, or is that not so much an issue?
>> Jeff Dozier: Well, that’s an interesting problem. The one thing that does occur
with sediment transport is this predictable diurnal variability, predictable and
seasonal. It applies not only to sediment transport but also to where the fish are
trying to hide while they’re hatching and so on.
In other words, during periods when flow is too low, sometimes the water gets too hot
and it affects the fish, and so sustaining -- So there are lots of things going on in the
way that this heterogeneity, both spatial and temporal, not only affects the way we
manage water but also affects the habitats that the water supports.
Okay. So how do we do this? Well, in e-science you’ve got to have a workflow diagram.
I usually write scripts, but this diagram is, first of all, trying to impress you with its
complexity, I guess. The stuff in blue is all data that we get from somewhere else.
So there are the MODIS data; VIIRS, a sensor that just went up and that we’re
expecting to use; digital elevation models; land cover data; and then the assimilated
data from the North American Land Data Assimilation System (NLDAS).
And then everything in red is stuff that we do, and I can take you through some of
that. The first is how we actually process the data to produce the fractional snow
cover estimates.
And what we do here is take advantage of the position of the MODIS bands. MODIS has
36 spectral bands: some of them are at 500 meters, two of them are at 250 meters,
and the rest are at one kilometer.
But the seven bands in this part of the spectrum are what are usually called the Land
bands. So let’s look at how they match the spectral reflectance of the major
components of what’s on the earth’s surface.
Well, what’s on the earth’s surface? There’s vegetation, there’s dirt, and there’s
snow, right? Water, which is pretty dark in all spectral bands, and then you know,
everything else.
>>: Are rocks different than dirt?
>> Jeff Dozier: It’s soil.
>>: It’s soil? Rocks aren’t storing water but soil is.
>> Jeff Dozier: Soil has variability in its reflectance because it does get wet, yeah.
Generally when it’s wet it’s darker. So that’s why, if you look at the reflectance of
different soils, they all have roughly the same shape, but some of them are darker
and some are lighter.
Vegetation has this typical reflectance: pretty dark in the visible, and then rising very
steeply as you go into the near infrared, at what is sometimes called the red edge.
And then snow, which has a lot of variability because of grain size. But notice that the
visible spectrum only goes out to 0.7 micrometers, so much of the variability in snow
is out beyond the wavelengths that you can see.
But in fact if you think of color in this sort of extended way, including the whole
spectrum, snow is one of the most colorful substances in nature, you know?
>>: So it’s probably like the Northern Lights, but it’s all in --
>> Jeff Dozier: It’s all beyond the wavelengths you can see, but if our eyes were
sensitive out into the near infrared, there would be a tourist industry in the western
United States, kind of like the leaf peepers in New England, you know, going out to
watch the colors.
But the main point of this is that if you compare all those spectra (and of course to do
that I just selected one from each graph you see), snow is pretty distinctive from soil,
and it’s pretty distinctive from vegetation.
So those three substances tend to have distinctive spectral signatures, and the
MODIS bands are positioned well enough to catch those. What that means is that if
you have a spectral signature from a pixel, you can decompose it into the fractions of
each substance that went into it.
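The decomposition idea can be illustrated with a generic three-endmember linear mixture model: a pixel's reflectance is modeled as a fraction-weighted sum of snow, vegetation, and soil spectra, with the fractions summing to one, and the fractions are found by least squares. This is a textbook sketch, not the speaker's actual algorithm; the seven-band endmember spectra below are invented for illustration, and non-negativity of the fractions is not enforced.

```python
def unmix3(pixel, snow, veg, soil):
    """Least-squares fractions (snow, veg, soil) that sum to one.

    Substituting soil = 1 - snow - veg turns the problem into a two-unknown
    linear least-squares fit, solved here via the 2x2 normal equations.
    Fractions are not constrained to be non-negative.
    """
    y = [p - d for p, d in zip(pixel, soil)]
    a = [s - d for s, d in zip(snow, soil)]
    b = [v - d for v, d in zip(veg, soil)]
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * z for x, z in zip(a, b))
    ay = sum(x * z for x, z in zip(a, y))
    by = sum(x * z for x, z in zip(b, y))
    det = aa * bb - ab * ab
    if abs(det) < 1e-12:
        raise ValueError("endmember spectra are too similar to separate")
    f_snow = (ay * bb - ab * by) / det
    f_veg = (aa * by - ab * ay) / det
    return f_snow, f_veg, 1.0 - f_snow - f_veg


# Invented reflectances for seven bands, ordered as MODIS land bands 1-7.
SNOW = [0.90, 0.85, 0.95, 0.93, 0.55, 0.10, 0.05]
VEG = [0.05, 0.45, 0.03, 0.08, 0.40, 0.25, 0.12]
SOIL = [0.25, 0.30, 0.15, 0.20, 0.35, 0.40, 0.35]

mixed = [0.6 * s + 0.3 * v + 0.1 * d for s, v, d in zip(SNOW, VEG, SOIL)]
print(unmix3(mixed, SNOW, VEG, SOIL))  # approximately (0.6, 0.3, 0.1)
```

The separation works only because the spectra differ, especially in the shortwave-infrared bands where snow is dark and soil is not.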
And so that’s where we get this fractional snow cover. This is the Sierra again, in a
dry year on top and in a wet year on the bottom. And this little area that’s blown up
at the bottom is the Tuolumne and Merced drainage basins, right?
>>: Are you computing that or is MODIS?
>> Jeff Dozier: No. Remember the diagram: the stuff in red is us. MODIS just makes
the measurement, and then we get the data and make the inference.
Okay. So that was all the stuff on the left. It turns out that after we do that
measurement, there are some things we need to do to smooth out the data, because
there are gaps from cloud cover. Also, how does MODIS get these daily data when its
orbit altitude is only 705 kilometers?
In order to cover the earth every day, it has to have a pretty wide swath, and with a
wide swath, some of what you’re seeing is at pretty steep, off-nadir view angles.
So the combination of clouds, occasional dropouts, and these off-nadir view angles
means that your data are scrambled a little bit. These are two consecutive days: this
is April 8th and this is April 9th. On April 8th, MODIS happened to see this part of
the world at a near-nadir view angle.
And here’s the histogram of the view angles for that scene, so it was right overhead.
The next day, in order to even get that second image, it was way off to the side. In
fact, notice that the histogram of view angles is bimodal. The reason is that this image
was actually stitched together from one pass looking in this direction and one pass
looking in that direction.
And what do you notice about these pixels? Well, what happens when you’re looking off
to the side? First, the pixel expands in both directions because it’s just farther away,
right?
And second, it expands more in the cross-track direction because of the effect of the
earth’s curvature. So what you see is that it’s the same scene, but these pixels look a
lot different from the ones on the left.
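The growth of off-nadir pixels can be sketched with simple spherical-earth geometry. Assuming a spherical earth of radius 6,371 km, the 705 km altitude mentioned above, and a scan angle measured at the satellite, the along-track size scales with slant range, while the cross-track size also picks up the earth-curvature term the speaker describes. The function name is hypothetical; MODIS scans out to about ±55°, where this geometry gives roughly 2 times the nadir size along track and nearly 5 times across track.

```python
import math

EARTH_RADIUS_KM = 6371.0


def pixel_growth(scan_angle_deg, altitude_km=705.0, radius_km=EARTH_RADIUS_KM):
    """Pixel size relative to nadir for a sensor scanning across track.

    Returns (along_track_ratio, cross_track_ratio, view_zenith_deg), where
    view_zenith_deg is the view angle as seen from the ground.
    """
    theta = math.radians(scan_angle_deg)
    k = (radius_km + altitude_km) / radius_km
    # Law of sines in the earth-center / satellite / pixel triangle
    zenith = math.asin(k * math.sin(theta))
    if theta == 0.0:
        slant_km = altitude_km
    else:
        central = zenith - theta  # earth-central angle between nadir and pixel
        slant_km = radius_km * math.sin(central) / math.sin(theta)
    along = slant_km / altitude_km
    # d(ground distance)/d(scan angle), normalized by its nadir value (the altitude)
    cross = radius_km * (k * math.cos(theta) / math.cos(zenith) - 1.0) / altitude_km
    return along, cross, math.degrees(zenith)


for angle in (0, 20, 40, 55):
    along, cross, vz = pixel_growth(angle)
    print(f"scan {angle:2d} deg, view zenith {vz:4.1f} deg: along x{along:.2f}, cross x{cross:.2f}")
```

Note that the view zenith at the ground is larger than the scan angle at the satellite, which is why ground-based view-angle histograms reach beyond 55°.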
So we could just ignore them on these days, right? But what if that’s the only scene
I’ve got during a period when it’s cloud-covered? So what we do is run a smoothing
spline through the data, but I weight the spline by the inverse of the view angle. That
way, if my only view in an otherwise cloud-covered period is off to the side, I’m going
to use it. But in the cases where I have data every day, it doesn’t affect our analysis
very much.
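As a simpler stand-in for the weighted smoothing spline (a different technique, swapped in only to keep the sketch short and dependency-free), here is a weighted moving average over a daily fractional-snow-cover series. Cloudy days are None, and each clear day carries a weight that the caller makes smaller for off-nadir views; the window size and the weights in the example are hypothetical. It shows the behavior the speaker wants: a lone off-nadir view in a cloudy stretch still fills the gap, but near-nadir views dominate when they are available.

```python
def weighted_smooth(values, weights, half_window=2):
    """Weighted moving average of a daily series with gaps.

    values: daily fractional snow cover, None where cloudy or missing.
    weights: per-day weights, e.g. smaller for off-nadir views.
    Returns a smoothed, gap-filled series; a day stays None only if no
    usable observation falls inside its window.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - half_window), min(n, i + half_window + 1)):
            if values[j] is not None and weights[j] > 0:
                num += weights[j] * values[j]
                den += weights[j]
        smoothed.append(num / den if den > 0 else None)
    return smoothed


# Cloudy stretch with a single off-nadir (low-weight) view on day 2
print(weighted_smooth([None, None, 0.5, None, None], [0, 0, 0.1, 0, 0]))
```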
Okay, so then on to this part of the flow diagram. We’ve got data on the snow cover,
and now we can tell the rate at which the fractional snow cover is going away during
the year. So now we want to look at how much energy is actually involved.
So one of the things that the MODIS analysis also gives us is we estimate the albedo.
So we know how bright that snow is, and therefore how much solar radiation it’s
reflecting, and therefore from the difference how much it’s absorbing.
And then what we get from the NLDAS system is gridded meteorological reanalysis.
The way this works is that the spatial resolution of NLDAS is 1/8th degree, so it’s a
lot bigger than our pixel, and that’s what you see for solar radiation in the upper
left-hand corner.
And this is just for one hour, right? So then what we do in the second diagram is we
smooth that solar radiation to the resolution that we’re going to run the model at,
which in this case is 100 meters.
And the problem now is that the 1/8th-degree grid cell has a single elevation assigned
to it. So what we then need to do, in the third panel, is scale the radiation within that
1/8th-degree pixel and spread it over the 100-meter pixels, accounting for elevation.
And we just do a pressure scaling, right? Then we’ve got topography: some slopes are
facing the sun and some are facing away from it. So we lay that scaled radiation down
on the topography, and then we deal with the fact that some of the snow is under the
trees.
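The "facing toward or away from the sun" step is commonly expressed through the cosine of the local illumination angle on a slope. A minimal sketch, assuming solar azimuth and slope aspect share one convention (degrees clockwise from north) and handling only self-shadowing, not shadows cast by neighboring terrain. The pressure scaling mentioned above is not shown, since the talk does not give its form, and only the direct beam is rescaled this way; diffuse radiation would need separate treatment. Function names are hypothetical.

```python
import math


def cos_illumination(solar_zenith_deg, solar_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the sun and the normal of a tilted surface.

    Returns 0 when the slope faces away from the sun (self-shadowed).
    """
    z = math.radians(solar_zenith_deg)
    s = math.radians(slope_deg)
    rel = math.radians(solar_azimuth_deg - aspect_deg)
    mu = math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(rel)
    return max(mu, 0.0)


def direct_on_slope(direct_on_horizontal, solar_zenith_deg, solar_azimuth_deg,
                    slope_deg, aspect_deg):
    """Rescale direct-beam irradiance on a horizontal surface to a slope."""
    mu0 = math.cos(math.radians(solar_zenith_deg))
    if mu0 <= 0.0:
        return 0.0  # sun at or below the horizon
    mu = cos_illumination(solar_zenith_deg, solar_azimuth_deg, slope_deg, aspect_deg)
    return direct_on_horizontal * mu / mu0


# Sun 40 deg from zenith in the south: a 30 deg south-facing slope vs. a north-facing one
print(direct_on_slope(600.0, 40, 180, 30, 180))
print(direct_on_slope(600.0, 40, 180, 30, 0))
```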
So we apply a land cover map and figure out how much radiation is actually hitting the
snow. Then we’ve got the albedo of that snow; you can see the scale there. If you
multiply all that, you get the amount of radiation being reflected, and from the
difference you get the amount being absorbed.
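Put as arithmetic, that chain multiplies incoming solar radiation by the fraction that gets through the canopy, then splits what reaches the snow into reflected and absorbed parts using the albedo. The canopy transmissivity values per land-cover class below are hypothetical placeholders, not the values used in the talk's model.

```python
# Hypothetical fraction of solar radiation reaching the snow under each cover class
CANOPY_TRANSMISSIVITY = {"open": 1.0, "sparse_forest": 0.6, "dense_forest": 0.3}


def absorbed_solar(incoming_wm2, cover_class, albedo):
    """Solar radiation absorbed by the snow surface (W/m^2)."""
    at_snow = incoming_wm2 * CANOPY_TRANSMISSIVITY[cover_class]
    reflected = at_snow * albedo
    return at_snow - reflected  # equivalently at_snow * (1 - albedo)


print(absorbed_solar(800.0, "open", 0.8))
print(absorbed_solar(800.0, "sparse_forest", 0.7))
```

Because absorption is proportional to 1 − albedo, a drop in albedo from 0.8 to 0.7 raises absorbed energy by half, which is why the albedo estimate matters so much.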
So now what we have is a rate at which that snow is absorbing solar radiation.
Yeah?
>>: For your pixels with a lot of vertical relief, how do you match very steep or
near-vertical slopes with the pixels that show the amount of water?
>> Jeff Dozier: Okay. So when you’re actually doing the energy budget computation,
you want to use the true area of that slope, okay? But when you calculate the water
that’s being produced, you need to normalize it by the projected area, okay?
So one example of that is suppose you have a building and you’re using solar energy
against the side of the building in order to produce something in the building. You
know, probably something illicit, right?
[laughter]
Okay. So in that case in terms of the energy you want to use the full side of that
building. But in terms of the product that’s coming out of it, you normalize that by
the footprint of the building, okay?
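The building analogy corresponds to a small calculation: energy is computed per unit of true (sloping) area, and the resulting melt is then expressed per unit of projected (map) area, which is smaller by a factor of cos(slope). A sketch, assuming the flux is already the net energy absorbed per square meter of slope surface and that all of it goes into melt; the function name is hypothetical, and the latent heat is the talk's 335,000 J/kg.

```python
import math

RHO_WATER = 1000.0          # kg/m^3
LATENT_HEAT_FUSION = 335e3  # J/kg, value quoted in the talk


def melt_depth_mm(flux_on_slope_wm2, seconds, slope_deg):
    """Melt water depth (mm) per unit of projected (horizontal) area."""
    energy_per_true_area = flux_on_slope_wm2 * seconds  # J per m^2 of slope
    melt_per_true_area_m = energy_per_true_area / (RHO_WATER * LATENT_HEAT_FUSION)
    # One m^2 of map area contains 1/cos(slope) m^2 of sloping surface
    return 1000.0 * melt_per_true_area_m / math.cos(math.radians(slope_deg))


day = 86400.0
print(melt_depth_mm(100.0, day, 0.0))   # flat pixel
print(melt_depth_mm(100.0, day, 60.0))  # steep pixel: twice the depth per map area
```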
So it’s the same kind of analogy here, right? Okay. We go through the same thing with
the long-wave radiation. We take what’s measured, we take the air temperature, and
we calculate an atmospheric emissivity. I’m going to go through this more rapidly.
The thing that’s interesting that we have to deal with here is that on a slope, some of
the long-wave radiation is coming from the sky, but some of it is coming from the
adjacent terrain.
When we did work in the upper part of Kings Canyon, we would approach it by skiing
up the Onion Valley road and then over Kearsarge Pass. And on the Onion Valley road
there was one spot where a south-facing rock wall ran right next to the road for about
100 meters.
And that was invariably a pain in the neck, because the road would be bare. So you’d
have to take your skis off, and if you had a big pack you’d have to take your pack off
in order to bend over to take your skis off. It was always kind of a hassle just to walk
that hundred meters of road.
That was entirely because the south-facing rock wall was radiating toward the road.
So we account for that, and we go through these steps to end up with the net
long-wave radiation. Usually the snow is emitting more than is coming from the
atmosphere, so those numbers are usually negative.
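The long-wave bookkeeping, including the warm rock wall, can be sketched as follows. The talk does not say which emissivity formula is used, so this substitutes Brutsaert's (1975) clear-sky formula, ε = 1.24 (e/T)^(1/7), with vapor pressure e in hPa and air temperature T in K. The terrain emissivity, snow emissivity, sky-view fractions, and temperatures in the example are assumptions. Incoming radiation is modeled as a sky-view-weighted mix of sky and terrain emission.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def brutsaert_emissivity(air_temp_k, vapor_pressure_hpa):
    """Clear-sky atmospheric emissivity (Brutsaert 1975)."""
    return 1.24 * (vapor_pressure_hpa / air_temp_k) ** (1.0 / 7.0)


def net_longwave(air_temp_k, vapor_pressure_hpa, sky_view, terrain_temp_k,
                 snow_temp_k=273.15, terrain_emissivity=0.95, snow_emissivity=0.99):
    """Net long-wave radiation at the snow surface (W/m^2, positive = gain).

    sky_view: fraction of the overlying hemisphere that is sky (1 = open flat site);
    the remainder is filled by adjacent terrain emitting at terrain_temp_k.
    """
    from_sky = brutsaert_emissivity(air_temp_k, vapor_pressure_hpa) * SIGMA * air_temp_k ** 4
    from_terrain = terrain_emissivity * SIGMA * terrain_temp_k ** 4
    incoming = sky_view * from_sky + (1.0 - sky_view) * from_terrain
    # Snow absorbs snow_emissivity * incoming and emits snow_emissivity * sigma * T^4
    return snow_emissivity * (incoming - SIGMA * snow_temp_k ** 4)


open_site = net_longwave(273.15, 3.0, sky_view=1.0, terrain_temp_k=273.15)
below_wall = net_longwave(273.15, 3.0, sky_view=0.7, terrain_temp_k=300.0)
print(round(open_site), round(below_wall))  # both negative; the wall cuts the loss
```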
Okay. And now that we’ve downscaled it, we have to do the computation of the
reconstruction. The way that works, and [inaudible], who is a current student of mine,
is working on this part of the problem.
She’s also a really serious climber, and she’s run off to France with her boyfriend, so
I’m trying to make sure she finishes her Ph.D.
[laughter]
But let’s say your whole pixel is covered with snow, so the fractional snow cover is
one. Then, based on all these energy considerations, we can estimate how much snow
melted in that pixel on that day.
If the snow all disappeared at the same time after some number of days, we could
just back-calculate how much there was. So if I start out with full snow cover, I
calculate that I’m melting 10 millimeters a day, and in 20 days it’s all gone, I can
just do the multiplication and know how much there was to start with.
But the snow is not uniform, so as it melts and the fractional snow cover value
changes, I can also estimate the spatial heterogeneity of the snow in that pixel.
In other words, if on one day I can melt 10 millimeters of snow water equivalent,
and let’s say five percent of the snow cover disappears on that day, then I know that
five percent of that pixel was at that depth on the day before.
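The reconstruction logic in these paragraphs can be written in a few lines. A minimal sketch of the idea as described, not the group's code, assuming daily potential melt (the melt if snow covered the whole pixel) and daily fractional snow cover are already available, that snow cover only decreases during the melt season (new snowfall is ignored), and that the snow is gone after the last day. Pixel-averaged peak SWE is the sum of melt times snow-covered fraction, and each drop in snow cover marks the part of the pixel whose snow lasted exactly that long. Names are hypothetical.

```python
def reconstruct_swe(daily_melt_mm, daily_fsca):
    """Back-calculate peak SWE from potential melt and fractional snow cover.

    daily_melt_mm[j]: melt on day j per unit of snow-covered area (mm SWE).
    daily_fsca[j]: fractional snow cover during day j (0-1); snow is assumed
    to be gone after the last day.
    Returns (mean_swe_mm, pieces), where pieces lists
    (fraction_of_pixel, peak_swe_mm) for each part of the pixel.
    """
    if len(daily_melt_mm) != len(daily_fsca):
        raise ValueError("melt and snow-cover series must have the same length")
    mean_swe = sum(m * f for m, f in zip(daily_melt_mm, daily_fsca))
    pieces = []
    cumulative = 0.0
    for j, melt in enumerate(daily_melt_mm):
        cumulative += melt
        next_f = daily_fsca[j + 1] if j + 1 < len(daily_fsca) else 0.0
        lost = daily_fsca[j] - next_f  # fraction whose snow ran out on day j
        if lost > 0:
            pieces.append((lost, cumulative))
    return mean_swe, pieces


# The talk's example: full cover, 10 mm/day, gone after 20 days -> 200 mm
print(reconstruct_swe([10.0] * 20, [1.0] * 20))
# Uneven melt-out: half the pixel clears after day 2, the rest after day 3
print(reconstruct_swe([10.0, 10.0, 10.0], [1.0, 1.0, 0.5]))
```

The two outputs agree by construction: weighting each piece's peak SWE by its fraction of the pixel gives the same pixel average as summing melt times snow cover.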
So we end up with a reconstruction of the snow water equivalent that is spatially
heterogeneous. Okay. So --
>>: [inaudible] always melt at the same rate regardless of the depth [inaudible]?
>> Jeff Dozier: It melts. Okay, so it takes some energy to bring the snow to zero
degrees, and we sometimes call that the cold content of the snow. That amount of
energy is actually pretty small, because the specific heat of ice is only 1,900 joules
per kilogram per degree, whereas the latent heat of fusion is 335,000 joules per
kilogram.
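The point about cold content is easy to check with the numbers just quoted. Per kilogram of snow, the energy to warm it to 0 °C is c_ice × ΔT, compared with L_f to melt it; the sketch uses the talk's round values (1,900 J/kg/K and 335,000 J/kg). Ice near 0 °C is closer to 2,100 J/kg/K, which does not change the conclusion. Since 1 mm of SWE is 1 kg/m², multiplying by SWE in millimeters gives J/m². The snow pack depth and temperature in the example are hypothetical.

```python
C_ICE = 1900.0       # J/(kg K), value quoted in the talk
L_FUSION = 335000.0  # J/kg, latent heat of fusion


def cold_content_j_m2(swe_mm, snow_temp_c):
    """Energy to warm a snow pack to 0 C (1 mm SWE = 1 kg/m^2)."""
    return swe_mm * C_ICE * max(0.0, -snow_temp_c)


def melt_energy_j_m2(swe_mm):
    """Energy to melt a snow pack that is already at 0 C."""
    return swe_mm * L_FUSION


ratio = cold_content_j_m2(500.0, -5.0) / melt_energy_j_m2(500.0)
print(f"cold content is {ratio:.1%} of the melt energy")  # about 2.8%
```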
So we do account for that, but the cold content is pretty small compared to the
energy needed to melt. We account for it, but if you get it wrong you’re probably
okay, you know? But the second part of your question is kind of interesting.
And that is that the sunlight that melts snow penetrates into the snow pack a little
bit; it’s a multiple scattering problem. So the melt is actually occurring over a depth,
not right at the surface. Okay?
So what happens when it gets shallower? Well, you do reach a depth where radiation
is penetrating, and this is where the albedo appears to drop. But what is really
happening is that the radiation is penetrating all the way through to the underlying
soil, warming the soil, and that is causing the melt to occur.
>>: It melts pretty equally up until the last [inaudible]?
>> Jeff Dozier: Well, no. It melts depending on the energy input.
>>: But I mean, say all the other variables are equal, [inaudible]?
>> Jeff Dozier: Yeah. So if we go back to those diagrams of the snow pillow, one thing
you notice about the melt curve is that it’s steeper in the bigger year. The reason is
that in years when you have a lot of snow, the maximum accumulation often occurs
later.
So the melt is occurring when the temperatures are warmer, the days are longer, and
the sun is higher in the sky, and therefore it goes more quickly. So yes, there’s a lot
to learn by looking at the slope of the melting end of the snow pillow record.
Okay. So how does it all do? We’ve gone through several validation steps. One is just
comparing it to the surface measurements. So far, there’s no surface measurement in
this model. It’s uncalibrated; we’re not tuning it to any results, so we can really use
the surface data as a true validation. There’s no circular reasoning involved here,
right?
So there are two sets of data that we’ve compared. One is from the snow pillows that I
showed you, and the other is measurements from snow courses. Those exist because
they started trying to do snowmelt runoff forecasting in 1910, before you could
operate something electronically out in the wilderness and transmit it back to
anywhere.
So initially, up to about the 1970s, most of the reliance was on people who skied
around with these tubes in their packs. Once a month they would go to a set of sites,
plunge these tubes into the snow, pull them out, weigh the tube, subtract the tare
weight, and you’ve got the amount of snow.
And this still goes on because you need to make sure that the snow pillows are
working and it’s a great job.
[laughter]
There’s very little turnover. If you want to get a job as a snow surveyor, you’re
probably going to wait a while. But the nice thing is those lines: those are just the
one-to-one lines, not regression lines.
So it looks pretty good. And the distribution of errors is unbiased; that is, the peak of
the error histogram is at zero. You always like that, because you can deal with
random error pretty well. Dealing with bias is a problem.
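The two properties just described, agreement along the one-to-one line and an error distribution centered on zero, correspond to two summary numbers: the mean error (bias) and the root-mean-square error. A sketch with made-up paired values, not the validation data in the talk:

```python
import math


def error_stats(estimated, observed):
    """Bias (mean of estimated - observed) and RMSE for paired values."""
    errors = [e - o for e, o in zip(estimated, observed)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(x * x for x in errors) / n)
    return bias, rmse


# Hypothetical SWE (mm): model estimates vs. snow pillow readings
bias, rmse = error_stats([110, 90, 105, 95], [100, 100, 100, 100])
print(bias, round(rmse, 2))  # 0.0 7.91
```

Zero bias with nonzero RMSE is the "random error" case the speaker prefers: the scatter averages out over many sites instead of accumulating.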
And there are also errors in these measurements. The snow pillow measures over
about an eight-by-ten-foot area, whereas the pixel is 500 meters. The snow course is
typically a transect, right? So they do the best they can to make the measurement
representative of what’s around it, but it’s not always easy.
So we’ve also looked at the spatial variability, by taking an area about the size of
the MODIS pixel and making measurements all through it, in a variety of areas. But
what this tells us is this: we’re trying to validate this method in the Sierra, but then
we would like to use it in a place where there are no measurements at the surface, or
very few, where the Taliban have destroyed what surface infrastructure used to exist.
And yet it’s a place where we’re making a lot of investments in managing water, and
where water really means something to people. Okay. The other kind of validation is
of the input data. The top row is solar radiation, and that looks pretty good.
There are a lot of stations that measure solar radiation, forty or so up in the
mountains. There’s a little bit more scatter at higher elevations. The long-wave
radiation doesn’t look as good.
A couple of things, though. First of all, there are really only three stations in the
Sierra that measure this reliably. And it’s a tough thing to measure, because your
instrument heats up. If you think of an analogy with a laboratory measurement, what
if you’re trying to measure something and your instrument is emitting the same
thing you’re trying to measure?
So there are issues with temperature compensation of the instrument, and I don’t
know that this error is entirely in the satellite retrieval. Similarly, we see a model of
that long-wave radiation based on air temperature and humidity.
Hmm, looks like I’ve reproduced this graph. I’m sorry. That one is based on
temperature, humidity, and an estimate of cloud cover, and again it’s below the value
that the satellite is telling us.
So there’s error there, and we’re not quite sure what it is. Okay, so that all looks
very good. I can give you a pretty good estimate of the spatial distribution of the
snow water equivalent, you know, near the peak of its accumulation.
But that’s not useful in a forecast, right? Because I’ve given you the answer a couple
of months after you would’ve liked to have had it. The water is already in the stream,
and you already have an idea of how much water came down.
So we’d like to compare this to two different methods that operate in real time or
close to real time. One of them is just an interpolation based on the snow pillows.
So what we’ve done is take those snow pillow measurements, which we can do every
day, and run the interpolation through your favorite GIS package. And of course if
you do that, you’ll spread snow out into the ocean, right?
So we use the remote sensing measurements to tell us where the snow has to go to
zero, because the problem is that you have a lot of data points and none of them are
zero, and there’s no interpolation method that’s going to get you to zero anywhere
from that.
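The zero problem shows up with any interpolator that averages nonzero station values. This sketch uses inverse-distance weighting (an assumption; the talk just says "your favorite GIS package") and then applies a snow/no-snow mask, standing in for the remote sensing, to force SWE to zero where there is no snow. Coordinates and values are hypothetical.

```python
def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) tuples."""
    num = den = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        if d2 == 0.0:
            return value  # exactly at a station
        w = 1.0 / d2 ** (power / 2.0)
        num += w * value
        den += w
    return num / den


def masked_interpolation(stations, grid_points, snow_covered):
    """Interpolate SWE, then set it to zero wherever the snow map shows no snow."""
    return [idw(stations, x, y) if covered else 0.0
            for (x, y), covered in zip(grid_points, snow_covered)]


pillows = [(0.0, 0.0, 100.0), (10.0, 0.0, 300.0)]  # hypothetical SWE in mm
print(idw(pillows, 500.0, 0.0))  # far from any snow, but still well above zero
print(masked_interpolation(pillows, [(5.0, 0.0), (500.0, 0.0)], [True, False]))
```

An IDW estimate is always a weighted average of the station values, so it can never fall below the smallest one; the mask is what supplies the zeros.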
So that’s one way of getting the answer. The second is SNODAS, the Snow Data
Assimilation System, a numerical weather model that NOAA runs out of the Boulder
office. And that’s based on whatever they can get.
They use satellite data, surface measurements, and numerical weather forecasting.
And this is what we get from the reconstruction method. These are the median values
for the seven-year period, which is the period in which SNODAS has been running.
MODIS goes back to 2000. What you notice about this is that there’s a lot more snow
in the reconstruction. So one of the questions is, well, is that right? We’ve looked at
that: these 18 river basins are the basins in the Sierra where they estimate what is
called the full natural flow.
And that is the flow that would be in the river if there weren’t dams or diversions.
They take the actual flow and then try to account for storage and evaporation from
the reservoirs, and in some cases for diversions happening above the gauging station.
And the colors represent the three methods. We just did
this as a rank correlation: are we hitting the big years and the low years? And by
and large, in 13 of the 18 basins the reconstruction method is the best. All right? And
even in the places where it’s in second place it’s pretty close.
By contrast, the other methods have some years or some basins where they don’t do
very well. So that’s good. That’s one thing. And then if we actually do the
volumetric comparison we also get a result that’s very useful. And this is for each of
the eight major drainages.
The one thing you notice here is that the black line, which is the reconstruction,
has actually got more snow in it than is coming out in the river, which is what
you’ve got to have, because you’ve got to have something left over for evaporation,
right?
So the problem with the other two methods is that in some of the basins they’re
actually having less volume of water in the snow pack than is coming out in the
stream, and that’s physically not possible, right?
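The volumetric sanity check is simple arithmetic: convert per-cell SWE depth to a volume and require it to be at least the snowmelt-season runoff. A minimal sketch, assuming a uniform grid of equal-area cells (the 500 m cell size in the example is hypothetical, roughly MODIS scale):

```python
def swe_volume_m3(swe_mm, cell_area_m2):
    """Total water volume held as snow on a grid of equal-area cells.

    swe_mm: iterable of per-cell snow water equivalent depths in millimeters.
    """
    return sum(swe_mm) / 1000.0 * cell_area_m2  # mm -> m, times cell area


def passes_mass_balance(snow_volume_m3, runoff_volume_m3):
    """Snow volume must be at least the runoff volume, leaving
    something over for evaporation and sublimation."""
    return snow_volume_m3 >= runoff_volume_m3


cell_area = 500.0 * 500.0  # hypothetical 500 m cells
snow = swe_volume_m3([100.0, 100.0, 100.0, 100.0], cell_area)
print(snow, passes_mass_balance(snow, 80_000.0), passes_mass_balance(snow, 120_000.0))
```

Four cells holding 100 mm each give 0.1 m × 250,000 m² × 4 = 100,000 m³. A basin whose estimate fails this test is physically impossible, whatever its rank correlation.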
So the fact is that the --
>>: Are you presuming zero precip and zero rain then?
>> Jeff Dozier: We account for rain, but generally, you know, this is the Sierra
Nevada. This is not Washington. So there’s --
[laughter]
[inaudible]
>> Jeff Dozier: We don’t get much rain in the -- so these are compared with the April
through July runoff. And there’s not a lot of rain in that period. So, yeah, the rain is
accounted for, but it’s not -- you still got this.
In other words, we’re still getting enough water in the reconstruction method. So
the fact that the reconstruction is giving us a bigger estimate turns out to be good,
right? Okay.
And also if we look at the cumulative probability statistics, by and large our r-squared
values for the reconstruction method are all pretty good, whereas some of the
r-squared values for the other methods are not so good.
Okay. So why am I, you know, spending my time as a Microsoft intern, right? There’s this
nice statement that came from Jim Gray that in working on these complicated
problems you’re always going to want to go from something that works to
something that works better.
There’s a lot that goes into trying to do this, and if you try to make every step as
completely right as you would really be satisfied with, you’ll never get to the end,
right? So you’ve got to put something together.
But then what that means is that there are lots of things to improve along the way.
And that’s where expertise from other people would help. In terms of the
end-to-end problem, I know more about it than anybody does. But in terms of the
steps along the way, there’s expertise out there, perhaps some of it in this room,
that would help.
So let me go through a few of the examples. So one of them is this pattern
recognition problem: how can we use the reconstruction values to help us
improve the forecast? And one of the possibilities is looking at
snow-covered area. But that tends to differ a little bit between the basins. So this is
the American River in the wet year and the dry year.
There’s a lot more snow in the wet year, but the snow-covered area is not that much
greater. Whereas in the Kern, making the same sort of comparison, the wet year had snow in a lot of
places that the dry year didn’t.
So certainly snow-covered area is a gauge of how much snow there is, but it’s not
perfect. And in fact what we did is to look at all of the basins in the San Joaquin
drainage, looking at the snow-covered area in relation to how much snow volume
there was, both on April first and on June first.
And notice that in June that relationship is a lot tighter. And I think you would
expect that because in April there could be a lot of snow. The accumulation season
is just finished. There may be occasional storms going on.
So there’s a lot of snow-covered area where the snow may or may not be very deep, whereas in June what’s
left is pretty deep, right? And in fact we see that if we take the r-squared between
these relationships, between the snow-covered area and the snow water equivalent,
you can see that that r-squared climbs as we go through the season.
So by the time we get to June that snow-covered area is a pretty good gauge of how
much snow is there. But in April it’s not so good, right? Or it’s not as good. So I
think there are other things that one can do to explore the pattern and --
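The seasonal r-squared comparison is a per-date regression across basins. A sketch, assuming an ordinary least-squares line of snow volume against snow-covered area, for which r-squared equals the squared Pearson correlation. The dictionary layout keyed by date is hypothetical, and the numbers in any usage are made up only to exercise the function.

```python
def r_squared(x, y):
    """Coefficient of determination of a least-squares line y ~ a + b*x
    (equal to the squared Pearson correlation of x and y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r_squared_by_date(sca, swe):
    """sca, swe: dict date -> per-basin values, basins in the same order.

    Returns dict date -> r-squared of SWE volume against snow-covered area,
    so the tightening of the relationship through the season can be tracked.
    """
    return {date: r_squared(sca[date], swe[date]) for date in sca}
```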
Okay. So there’s an interesting algorithmic problem that goes on. In order to
account for this longwave radiation coming from the terrain, to really do it right I
would need to know the viewshed for every point.
Okay? Now, we have an approximation that we use, but to really do it right I’d like
to know the viewshed. So I ran a little -- so I tried to do this for this data set that
covers the whole western United States at three-arc-second resolution. There are
720 million pixels here. And, you know, I run MATLAB, and MATLAB has a viewshed
algorithm.
And I was about to go to sleep, so I plugged in my laptop and I picked a point right
out here and said calculate the viewshed of that area. And that’s just one point. And
when I woke up the next morning it was still running.
Okay? Now, I looked at the code. The nice thing about MATLAB is the toolbox source
code is available. And if I had looked at the code first I probably wouldn’t have done
this. You know that anything trying to address this problem that starts out "for row
equals one to number of rows," you know, it’s probably the wrong -- I mean, at least
you could bound the search radius based on Earth curvature or something like that.
But you need to fix this problem algorithmically. This is not something that Moore’s
law -- yeah, if I had 720 million processors I could do it in two days, right? But I
don’t have 720 million processors and nobody is probably going to let me use them
for two days.
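One way to read "bound the search radius based on Earth curvature": for the horizon-style viewshed described in this talk (terrain that rises above the observer's local horizontal), the curved Earth lowers a point at distance d by about d²/(2R), so nothing farther than √(2R·Δz) can appear above the horizontal, where Δz is the highest elevation in the region minus the observer's elevation. A sketch, assuming a spherical Earth of mean radius 6371 km and ignoring atmospheric refraction:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical Earth, no refraction


def max_search_radius_m(z_observer_m, z_max_m):
    """Farthest distance at which terrain can still rise above the observer's
    local horizontal plane.

    At distance d the Earth's curvature lowers a point by about d**2 / (2 R),
    so terrain no higher than z_max_m is hidden below the horizontal once
    d**2 / (2 R) exceeds z_max_m - z_observer_m.
    """
    relief = z_max_m - z_observer_m
    if relief <= 0.0:
        return 0.0  # observer is at or above the highest point in the region
    return math.sqrt(2.0 * EARTH_RADIUS_M * relief)


radius = max_search_radius_m(2000.0, 4421.0)
print(round(radius / 1000.0), "km,", round(radius / 90.0), "pixels at ~90 m")
```

An observer at 2000 m with a regional maximum of 4421 m (roughly Mount Whitney) gets a radius of about 176 km. At three arc seconds (about 90 m) that is still roughly 2000 pixels in each direction, so the bound helps but does not by itself make the every-point problem cheap.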
>>: Question. Is viewshed what it looks like in a sort of 360?
>> Jeff Dozier: Yeah. Here’s sort of an example here. I need to qualify the answer to
that. The answer is yes. So here’s the area in the Tuolumne basin where the snow
pillow is. And that’s the viewshed for that point.
But this is not the viewshed you’d see standing up; it’s much smaller than that.
So it’s what you would see in the viewshed if you were lying on the ground looking
up, okay? But yeah, it’s everything you could see.
So algorithmically there’s a couple of things. One is I want the viewshed for every
point. And so it’s not just a problem of getting a fast calculation for doing it for one
point, okay?
So are there good ways to bound the search radius, for example, and are there good
ways of taking advantage of reciprocity? That is if I’m in your viewshed, well, you’re
in my viewshed.
Right? And I’ve looked through the literature on this and I can’t really find anybody
that’s doing a very good job of this. Now, what we’ve done is -- back in 1981 I
actually figured out a way to get the horizons in order N time.
And that completely bounds your search radius. So I do have a partial approach to
this problem if somebody wants to work on it. I can tell you exactly what the search
area you need to consider, right?
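The idea of getting horizons in order N time can be illustrated along a single 1-D elevation profile. This is a reconstruction in that spirit, not necessarily the published 1981 algorithm. Points are processed from the far end, and each point finds its horizon by hopping along horizons already computed for the points beyond it. Every point skipped during a hop drops out of all later chains, so the total work is amortized linear in the profile length. Repeating this along profiles in many azimuths would give a full horizon set; that extension is not shown.

```python
import math


def horizon_indices(z):
    """Horizon of each point on a profile, looking toward increasing index.

    z: elevations at unit spacing. Returns h where h[i] is the index k > i
    maximizing the slope (z[k] - z[i]) / (k - i), or None for the last point.
    """
    n = len(z)
    h = [None] * n

    def slope(i, j):
        return (z[j] - z[i]) / (j - i)

    # Sweep from the far end so every h[j] with j > i is already known.
    for i in range(n - 2, -1, -1):
        j = i + 1
        # Follow the chain j -> h[j] -> h[h[j]] ... (the upper convex hull of
        # the points beyond i) while the slope seen from i keeps rising.
        while h[j] is not None and slope(i, h[j]) >= slope(i, j):
            j = h[j]
        h[i] = j
    return h


def horizon_angles_deg(z, dx):
    """Elevation angle (degrees) from each point to its horizon; None at the end."""
    h = horizon_indices(z)
    return [None if k is None else math.degrees(math.atan2(z[k] - z[i], (k - i) * dx))
            for i, k in enumerate(h)]
```

A brute-force version checks every point beyond i and costs order N² per profile. The chain-following version visits each hull point a bounded number of times in total.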
Okay. There’s a remote sensing problem that is also kind of a generalized image-processing problem. And that is that snow/cloud discrimination is not
completely solved.
If you’ve got thick clouds, so this is the Landsat image. These are the visible
bands. But this includes a band out in the shortwave infrared where clouds are
bright and snow is dark. And yeah, the clouds pop right out at you. But these are
cumulus clouds. They’re thick.
Here’s one in the Hindu Kush that’s a little more problematic. These are some data
dropouts. And what you see is even with different combinations of spectral bands,
it’s hard to distinguish what’s snow and what’s cloud.
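The shortwave-infrared contrast described above is what a normalized-difference snow index (NDSI) exploits. A crude per-pixel sketch, assuming green and shortwave-infrared reflectances as inputs and hypothetical thresholds. An NDSI cutoff of 0.4 is commonly used, and the brightness threshold is illustrative only. Thick cloud is bright in both bands and separates easily. Thin cloud over snow gives intermediate values, which is exactly where a purely spectral test like this struggles.

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index from green and shortwave-infrared reflectance."""
    return (green - swir) / (green + swir)


def classify_pixel(green, swir, ndsi_threshold=0.4, bright_threshold=0.3):
    """Crude per-pixel label: 'snow', 'cloud', or 'other'.

    Thresholds are illustrative assumptions, not values from the talk.
    """
    if green < bright_threshold:
        return "other"  # dark in the visible: neither fresh snow nor thick cloud
    if ndsi(green, swir) > ndsi_threshold:
        return "snow"   # bright in the visible, dark in the SWIR
    return "cloud"      # bright in both bands


for g, s in [(0.8, 0.05), (0.8, 0.6), (0.1, 0.05)]:
    print(g, s, classify_pixel(g, s))
```

Nothing here looks at neighboring pixels. Adding shape information is the open combination of spatial and spectral recognition discussed next.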
But you can kind of tell with your eye. So I think a generalization of this
problem that would be interesting for someone to work on is this: generally
when I read the image processing literature there are papers that deal with shapes,
and there are papers that deal with spectra.
And not many deal with both. So to me this is kind of a combination of a shape
recognition problem and a spectral recognition problem. And so there’s fruitful work to do
there.
There are data quality problems in e-science. So the snow pillow at Dana I
showed you the data for, well, I’m not sure; maybe those 2007 measurements
were a little bit high. Jessica Lundquist, who’s a faculty member at the University of
Washington, was working in this area, and the Dana Meadows pillow wasn’t
corresponding very well to what she was seeing in the other pillows.
So she went to it. And there’s a tiny tree growing in it. Okay? And when she took
the tree out the snow water equivalent dropped by 12 inches. Okay? And then
another remote sensing problem is finding snow in the forests.
So what we do with this fractional snow cover is we divide it by one minus the
fractional vegetation cover. So it kind of helps us estimate what snow is in the trees.
And that works pretty well up to about 75-80 percent canopy cover.
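The canopy adjustment is a single division. A sketch, with the roughly 75 to 80 percent canopy limit represented by a hypothetical `max_veg` cutoff (0.8 here). Above the cutoff the estimate is returned as unknown rather than trusted.

```python
def canopy_adjusted_snow_fraction(f_snow_viewable, f_veg, max_veg=0.8):
    """Scale the satellite-visible snow fraction up to account for canopy.

    f_snow_viewable: fraction of the pixel seen as snow from above (0-1).
    f_veg: fractional vegetation (canopy) cover (0-1).
    Returns None where the canopy is too dense to trust the adjustment.
    """
    if f_veg >= max_veg:
        return None
    # Divide by the canopy-free fraction; cap at full snow cover.
    return min(1.0, f_snow_viewable / (1.0 - f_veg))
```

For example, 30 percent visible snow under 50 percent canopy becomes 60 percent snow cover. As the canopy fraction approaches 1, the divisor shrinks and small errors in the visible fraction get amplified, which is one reason to stop at a cutoff.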
And the way we kind of know this is a student of Jessica’s and a student of mine got
together and carried out this very interesting experiment. They buried a lot of these
little temperature sensors all over the place. And then you can tell from the daily
temperature record whether there was snow on it or not.
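Detecting snow with a buried temperature sensor usually relies on the overlying snowpack damping the daily temperature cycle. A minimal sketch, assuming a day is flagged as snow covered when its daily temperature range falls below a threshold. The 1 °C value is an assumption, not a number from the talk.

```python
def snow_covered_days(temps_by_day, max_range_c=1.0):
    """Flag days when a near-surface sensor was probably under snow.

    temps_by_day: list of days, each a list of sub-daily temperatures (deg C).
    A day counts as snow covered when its max-min range is below max_range_c,
    because an overlying snowpack damps the diurnal cycle (threshold assumed).
    """
    return [(max(day) - min(day)) < max_range_c for day in temps_by_day]


days = [[-0.4, -0.2, -0.3, -0.1],  # flat, near 0 C: likely under snow
        [1.5, 9.0, 14.2, 4.0]]     # large daily swing: likely bare
print(snow_covered_days(days))
```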
And so they have -- this is real ground truth. And these images are arranged in
order of increasing canopy cover. And the squares on them represent the size of a
MODIS pixel.
And so what you see, if you look at the comparison between what we see from
satellite and what we measure on the ground, is that it’s not bad until you get down to
this pixel here. We’re missing a lot of snow under the forest when we
can’t see through the trees, right?
Okay. There are other examples. The first one I think is the more generic and
applies to all of e-science, and that is that we typically use the kinds of software that
you would expect for data management, like [inaudible]. Or some people use Excel.
But by and large those aren’t the tools that we use to analyze data.
And so having these things talk together is a real problem. The SQL Server FileTable
has really bailed me out. I just learned about it. But, you know, we sometimes run
programs where, when I do a year’s worth of analysis, I could open 2000 image files, and
keeping track of those has been problematic, because if you stuff them out on the file
system and then put an entry into the database, it’s really easy to get those out of
sync.
But SQL Server FileTable just handles that. A more scientific issue is this one of error
propagation: when you’re doing something of this complexity there are errors
all along the way. And how do they affect the final result?
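One generic way to study error propagation through a long processing chain is Monte Carlo sampling: perturb the inputs according to their assumed errors, rerun the model, and look at the spread of the output. The talk doesn't say how, or whether, this is done in the snow workflow. The sketch below assumes independent Gaussian input errors and uses a hypothetical toy model.

```python
import random
import statistics


def monte_carlo(model, inputs, n=10_000, seed=1):
    """Propagate input errors through `model` by random sampling.

    inputs: dict of name -> (mean, standard deviation); each input is drawn
    independently from a Gaussian (an assumption). Returns the mean and
    standard deviation of the model output over n draws.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    outputs = []
    for _ in range(n):
        sample = {name: rng.gauss(mu, sigma) for name, (mu, sigma) in inputs.items()}
        outputs.append(model(**sample))
    return statistics.mean(outputs), statistics.stdev(outputs)


# Toy model (hypothetical): snow volume = snow-covered area x mean SWE depth.
def snow_volume(area_km2, swe_m):
    return area_km2 * 1e6 * swe_m  # cubic meters


mean, sd = monte_carlo(snow_volume, {"area_km2": (100.0, 10.0), "swe_m": (0.5, 0.025)})
print(f"{mean:.3g} m^3 +/- {sd / mean:.1%}")
```

With 10 percent and 5 percent input errors, the output's relative spread comes out near 11 percent, close to the root-sum-square of the two. The same loop works for any chain of steps that can be wrapped as one function.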
Improved presentation: Alex, if he’s finished working, has got integration with Layerscape. And then, you know, your good idea goes here. So this is what the Hindu
Kush looks like.
Compared to anything in the western U.S., it’s a really broad mountain range.
Even at 20 thousand feet all you can see in every direction are more mountains.
Anyway, thanks very much.
[applause]
Jonathan is here. He’s been waiting for this part.
[laughter]
>>: It’s not in layers.
>> Jeff Dozier: Okay.
>>: [inaudible]
>>: One question. How much computation are you consuming to generate what
you’re generating now? [inaudible]
>> Jeff Dozier: Oh. Okay. So everything we’ve done so far is in an area that’s not all
that big, the Sierra. But it is big enough that we have to worry about Earth curvature
in terms of the radiation budget. So I should have a more precise answer to that
question than I do.
You know, we have machines. We have a multi-core machine and then we have a
cluster that we’re running on. We can keep up with the dataflow. In a week we can
do about the ten-year data record. Okay?
But --
>>: [inaudible]
>> Jeff Dozier: It’s, let’s see, I think there’s 64 MATLAB nodes on it. I think it’s 196
nodes on the cluster, but the MATLAB license is just 64.
>>: Are you the only one using it?
>> Jeff Dozier: No. No. No. Yeah, no. I’m actually waiting for some software that the
MSR MATLAB license doesn’t have yet but we’ve ordered. So I plan to do some
experiments on the MATLAB DCS that we’ve got in the building here. But there’s one
major software package, the Mapping Toolbox, that we’re missing.
>>: Do you have a good sense of what the square mile to computational --
>> Jeff Dozier: You know, I could probably answer that question more precisely just
doing some timing statistics. But, you know, it’s an answerable question. I just don’t
have it off the top of my -- yeah, Jonathan?
>>: If the heat from your computation melts the snow --
[laughter]
You need their algorithms.
>> Jeff Dozier: The one thing I did discover in doing some timing tests is that the
laptop runs slower when it’s on battery power. And my solution to the horizon
problem, which I thought was order N, you know, I was doing some timing statistics
and it wasn’t coming out to be order N. I’m going, like, God, I’m comparing things
that I’d run here and things that I’d run at home.
Fortunately I thought about it when I came back to the office here. I said maybe I
ought to plug this in and see if it works faster. And it did. It was about 50 percent
faster running on the grid power.
Let’s see. Okay. So one thing we’ve done with this is Alex has put an interface with
Bing Maps. And this I think is pretty useful in general in that this is a way to get
GeoTIFF images into Bing Maps. And so he’s got things where you can
adjust the scale range.
If you want to wipe out some of the lower elevations, or if you want to see what’s
under this place that’s getting a lot of snow then he’s got an opacity slider so you can
see that the Truckee is getting dumped on and so forth.
And then you can switch back with another date and if you prepare ahead of time
you can cache these. So the switching back and forth runs pretty quickly. And then
if you want you can switch to one of the other methods.
>>: So what is this that we’re looking at? Is this the maximum snowfall?
>> Jeff Dozier: This is the maximum snow water equivalent in each of those years. Okay?
>>: So like, [inaudible] you’re just taking accumulation?
>> Jeff Dozier: Well, not only that, but it varies with the pixel. So the date at which
the maximum occurred will be different for each of the grid cells. So we’re going
through and picking out what that maximum is and then carrying that through to
have a single image.
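Picking each pixel's maximum, along with the date it occurred, can be sketched as follows. The list-of-grids layout and string date labels are assumptions for illustration.

```python
def peak_swe(stack, dates):
    """Per-cell maximum SWE over a time series, and the date it occurred.

    stack: list (one entry per date) of 2-D grids given as lists of rows.
    dates: labels for each grid, in the same order.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    peak = [[float("-inf")] * cols for _ in range(rows)]
    when = [[None] * cols for _ in range(rows)]
    for grid, date in zip(stack, dates):
        for r in range(rows):
            for c in range(cols):
                # Strictly greater: on ties keep the first date in `dates`.
                if grid[r][c] > peak[r][c]:
                    peak[r][c] = grid[r][c]
                    when[r][c] = date
    return peak, when
```

The `when` grid is what makes the point that the peak date differs from cell to cell; the `peak` grid is the single image that gets displayed.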
Now, what we can do is -- this is for a single date. And so what we’ve done is we’ve
loaded lots of dates in here and again, because these dates are not cached, it takes a
while for them to show up. But, you know, this is something that Dan and
I are going to go talk to the Bing Maps product guys about.
I think the snow problem is interesting, but the fact that you can take imagery that
you yourself could supply and put it into this is something that, you know, even though I
know in MSR you always worry about what the product guys think, would be
a useful addition to what the tool can do.
Yeah?
>>: So is this data that you’ve collected currently just used by you and your team
and the research community? Or are you commercializing it in some way and
letting the utilities and the water districts have access [inaudible]?
>> Jeff Dozier: Well, we’re giving it away. And the reason is it’s mostly been
government funded. So we can’t really sell the forecast. But we are in discussions
with people like the California snow survey, because they would like to reduce
the tails of the error distribution.
You know, a sort of 10 percent error they can live with. But having the occasional
year when you’re way off would really be bad. And then I’ve actually got a project
with the Army’s Cold Regions lab to try to do this in Afghanistan.
And so we’re cranking up a system at JPL to produce the MODIS images, and then I
hope to run some of that here in the next month. So I know most of the interns are
going home in a week or two and they’re panicked, but I’m here till the 14th of
September, and also I was, I think at least, perceptive enough to panic the first week I
got here. So --
[laughter]
>>: Is there any snow in the Sierras at this point?
>> Jeff Dozier: Only in some of the -- because this was a pretty light year. So
certainly in the previous year there was still snow. Well, in the previous year we
were skiing well into July. So there is snow, but it’s, you know, it’s mostly in very --
>>: [inaudible]
>>: As of the survey last Thursday afternoon.
>>: [inaudible]
[laughter]
>> Jeff Dozier: Okay. But, yeah, well, there is some permanent snow in the Sierra.
>>: [inaudible]
>>: The question is because in the height of summer, even west of the Sierras there
are isolated places that never melt.
>> Jeff Dozier: Right. Yeah.
>>: So the question of permanent accumulation is something I’ve never heard
discussed.
>> Jeff Dozier: Well, it’s discussed a lot, but it’s more discussed in the Himalayan
context because of the shrinking glaciers. The issue is to what extent is the
glacier melt an important component of the water supply versus the seasonal
snowmelt?
And I think, you know, I think people have a gut feel about what
the answer to that question is. I don’t know that anybody has really -- well, people
are working on that, partly using, say, isotope tracing to figure out where the water is
coming from.
>>: The model that you’ve described so far would seem to not
distinguish it. It would seem to just cover every --
>> Jeff Dozier: One of the things that we would have to do in the Hindu Kush is have
glaciers as a separate endmember, because in that case those don’t go away at the end of
the season.
>>: At Himalayan altitudes you get a solid-to-gas transition.
>> Jeff Dozier: Well, we get it here. We get a lot of --
>>: At these altitudes?
>> Jeff Dozier: Oh, yeah. Yeah, at our instrument site we lose 20 to 30 percent
of the snow from sublimation. Yeah.
>>: Is that in the model?
>> Jeff Dozier: Yeah, yeah, yeah. That’s in the model. Yeah, we run an energy
balance model. So if you get into the forest where there’s no wind then you get very
little during the winter. But out in the open when it’s dry and windy a lot of the
snow can go away.
Where we see it at our instrument site is we have [inaudible] underneath the snow
pack. And we see periods when the amount of snow is dropping but we’re not
seeing any water at the bottom, so yeah.
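The reasoning about sublimation at the instrument site is a small mass balance: the SWE lost over an interval, minus the water measured leaving the base of the pack. A sketch, assuming an outflow measurement at the base of the snowpack (the instrument is inaudible in the recording) and ignoring new precipitation and changes in the liquid water held in the pack:

```python
def apparent_vapor_loss(swe_mm, outflow_mm):
    """SWE loss in each interval not accounted for by water leaving the base.

    swe_mm: snow water equivalent at successive times (mm).
    outflow_mm: water measured draining from the base during each interval
        (one fewer value than swe_mm).
    Assumes no new precipitation and no change in liquid water held in the pack,
    so any remaining loss is attributed to sublimation or evaporation.
    """
    losses = []
    for k in range(1, len(swe_mm)):
        drop = swe_mm[k - 1] - swe_mm[k]
        # Clamp at zero: a drop smaller than the outflow implies no vapor loss.
        losses.append(max(0.0, drop - outflow_mm[k - 1]))
    return losses


print(apparent_vapor_loss([500.0, 495.0, 480.0], [0.0, 10.0]))
```

An interval where the snow drops but nothing drains at the bottom shows up entirely as vapor loss, which is the signature described above.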
And also even in the Himalaya you can get melt when the
air temperature is below zero.
>>: From energy absorption --
>> Jeff Dozier: You get energy absorption; you get it down in the snow pack, where
there’s sort of a solid-state greenhouse effect. You can also freeze water when the
air temperature is above zero, just, you know, because the surface temperature is
not necessarily the same as the air temperature.
So, for example, on very calm nights the surface can get colder than the air. And
they used to make ice in India by this. They’d lay out very shallow pans of water and
they would produce ice even on nights that the air temperature never got below
zero.
>>: Any other questions?
>> Jeff Dozier: Well, thank you all.
[applause]
>> Jeff Dozier: Yeah. So anybody who wants to come talk about this or have ideas,
I’d certainly be happy to talk. We’re also -- I think I’ve talked with Yon [phonetic]
about how to incorporate some of this thinking into the open data, open science
stuff.
And one of the issues is, is there an effective way to kind of put the problem out there,
and then have a mechanism for ways that people could interact, you know, kind of
like the [inaudible] against the world chess match?
Anyway, thanks all for coming.