>> Rob DeLine: It's my great pleasure today to introduce Steve Easterbrook from the
University of Toronto. He is quite the rarity in software engineering circles. He is a
software engineering professor that likes to study how software is actually made by
watching it being done.
So today he's going to show us how he's watched climate scientists produce software and
what it looks like from their perspective.
>> Steve Easterbrook: Okay. Thank you. I don't know how well you can hear me. I
assume the mic is picking up something.
So what I'm going to talk about today is a two-month observational study that I did last
year at the U.K. Met Office, Hadley Centre, looking at scientific software and how it's
built. In this case a large hunk of Fortran.
So what I'll attempt to do -- whoops, I'm confusing myself now -- is try and explain why
climate scientists build software, what this software is for, what is it supposed to do for
them; a little bit -- I'll have to give a little bit of the background of what this software is,
what's the domain and what are they simulating; and then the core of the study is how do
they build software, what are their software practices, and I want to draw out some of
what they do that I find interesting from a software engineering perspective; and then talk
about, well, can we as software engineers help these scientists.
And if the answer is no, and in many respects unexpectedly it is no, what else can we be
doing and how else can we engage with climate scientists, given the importance of the
field.
So my motivating question going into this study was I'm a software engineering
professor, these people build software largely without any software engineering training.
Surely we must be able to help. We know how to build software.
So that was my motivation going into the study, to look for places where software
engineering tools and techniques could help these people.
And going into it, we didn't know much about what the software engineering practices
are amongst computational scientists. There have been a few previous studies, not very
detailed studies, and there are only -- probably I can count them on one hand. And they
point to things like: the modelers -- modelers meaning the people building these large
simulation models -- don't have a software engineering background. Their code is very
long-lived; it lives for decades and they keep tinkering with it for decades. It's highly
optimized typically for high-performance machines.
And these people resist software engineering tools. Their experience in the past of
software engineers coming in and saying, oh, you should be using our IDEs or you should
be using these various different tools, they find that they're not well suited to their context
and they very quickly get fed up of software engineers coming in and trying to tell them
what to do.
So there's this immediate suspicion. So when I first made contact with this particular
group, there was this initial, yeah, what are you going to tell us that we don't already
know.
And when I started to explain that what I wanted to do was not tell them how to build
software but learn how they currently built it, then we got over that initial hurdle and they
said, yeah, we'd love you to come and see what we do.
The code, of course, is nearly all written in Fortran. And there's no way they're going to
change that in the foreseeable future. Fortran is highly tailored for what they need for two
reasons. One is they have the experience. The entire community knows Fortran and
doesn't know other languages. And the other is that Fortran is probably the best suited to
what they do, which is they take the scientific formulas, convert them into code and run
them on the high-performance machines. It's hard to imagine a language that's better
suited to that right now.
So they prefer to build their own tools rather than use anybody else's. And because they
know any tool that they adopt they're going to be using for decades, they have this worry
that if they buy it off a vendor the vendor will disappear after a few years and it will be
unsupported. And that leaves them then with a problem.
So going into the study, I had this naive idea that their process is the climate scientist has
some model in their head, some scientific theory in their head that they need to get into
code and run a simulation to produce some results, and that's their scientific process.
And that there are two key things along this path that could get in the way of their
productivity. One of them is the high-performance computing question: How do we get
the most juice out of our hardware, how do we get the simulations to run fast. And the
other is the software engineering question: How do we build code that we can trust as
quickly as possible.
And one observation that's been made of computational scientists is that they're spending
ever more time on the software engineering question, getting to working code.
And because of that they're not able to take advantage of Moore's law, so they're not
getting the improvement in scientific productivity that you'd expect from faster machines
because they're being slowed down by how long it takes them to build the code.
So which means there's this gradual shift in -- I mean, there's been tons and tons of work
in high-performance computing, and there's this gradual shift to say actually that's not
now the bottleneck; the bottleneck is the software engineering part. And we haven't spent
anywhere near enough time doing that.
So that's some of the background to the study.
If I'm going to ask eventually about software quality, then I have to worry about a frame
of reference for that question, what does quality mean to this community.
And if you ask climate scientists about software quality, they immediately do a mental
translation in their head to the term that they usually use, which is model skill.
So how well is the model simulating something about the real world that they're
interested in. They don't talk about code quality. They don't talk about the number of
bugs, defects, anything like that. They talk about skill. And to them that's what quality
means.
And so that means they don't ask questions about scientific productivity. You know, are
our code practices slowing us down. They don't ask questions about understandability of
code: can other scientists understand what we built and modify it?
They don't ask questions about reliability and -- or they do ask questions about
portability, because portability nearly always trips them up. Every time they upgrade to a
new supercomputer, which happens every five or six years, they spend -- everything in
the lab stops for six months while they port the code to the new architecture.
They worry a little bit about usability, but they don't do anything about it. These models
are hell to configure and run. I tried it. And I couldn't do it. If we've got time later on, I'll
show you the user interface for model configuration there.
So I want to define quality, then, as fitness for purpose, which is my favorite definition of
software quality. Fitness for purpose for this community means how good is it as a
scientific instrument. And the interesting thing here is quite often the utility of a model
does not depend on how faithfully it captures something about the real world. So climate
scientists are building simulations of the climate.
But quite often what they want out of a model is not an accurate simulation of a real
climate. They want the ability to ask an interesting question.
So I was talking just before we started with a couple of you, one of the things they'll do is
they'll take a climate model and they'll remove all the continents so we have an ocean
world and they'll play with that. And quite clearly that does not represent a real-world
scenario. But it's an interesting and useful scientific instrument. It allows them to ask
questions that they wouldn't otherwise be able to ask.
So quality doesn't always mean is it a good simulation of the world; it means is it a good
tool for checking my understanding of how the physical processes work in the world.
So that assumption of course is just built deeply in their culture. They know their models
are wrong. They know these models are inaccurate simulations of a very, very complex
physical process. And everything they do is based on that knowledge.
So the other thing I should say is for the study that I did how I approached the idea of
software quality -- and there are essentially four different ways of measuring quality. I
think this is set out very well in Andreas's [phonetic] book. I think this is where this
originally came from, although he didn't have the pictures.
Most of what I focused on is process quality, because that was what was observable of
these people. I went in and looked at their processes and said how well do they match
what we think we know about good software processes and where do they differ, where
are they doing things different from what the literature says should be a good software
process.
I also spent a little bit of time looking at quality in use. When they try and use the
models, what happens. And I didn't spend much time on the other two.
One of my grad students is doing a follow-up study running static analyzers over some of
these climate codes and trying to correlate observations of software defects with what
we've seen in the process quality and the quality in use. And that's a fascinating
follow-up study. I'll mention that briefly at the end.
So I started my study with five initial questions. What do they understand by correctness,
which then boils down to how do they know they can trust their code. How do they
ensure reproducibility, because if they're running scientific experiments on this code, how
do they repeat an experiment. How does a large community of scientists engaged in
building these codes develop a shared understanding such that they can coordinate their
activities. How do they prioritize the work, how do they figure out what to do next. And
how do they debug the code.
So those are my guiding questions. But I should also point out I approach the study as an
ethnography, as me going in as a stranger, as someone that's not familiar with this
domain, looking for things that were surprising to me. And many of the things I looked
at were not surprising to them at all. They said, well, of course that's what we do.
So one of my guiding principles was surprise to me as an outsider. And so many of the
things that I'll remark upon were things that I found strange. And other people might not
find them strange. So there's a lot of my bias in there, as you'd expect in an ethnographic
study.
So before I show you what I saw happening, I should just explain what this software is
that these people build. So this is a little detour, which could be arbitrarily long,
depending on how interested you are in it, of what this code is.
Sorry, I shouldn't go there quite yet.
So, first of all, it's very important to understand that our knowledge of climate change, the
basic knowledge of climate change, does not derive from these models they build. It
derives from the basic physical properties of greenhouse gases which were all known in
the 1800s, so all the basic properties of greenhouse gases were worked out
experimentally in the 1800s.
The first calculation of climate sensitivity, which in physical terms it boils down to the
question of if you double the concentration of greenhouse gases in the atmosphere how
much temperature rise do you get. That was all worked out in the 1890s using pencil and
paper from the basic physical equations.
Okay. So it was all worked out from first principles. So we knew -- and the number that
they got back then, which was about 3 degrees centigrade, is consistent with what the
very latest IPCC forecasts say, it's within the range of the error. So that's been known for
well over a hundred years.
So our basic understanding of what the burning of fossil fuels and emission of
greenhouse gases does to the climate does not depend upon these simulations. Okay.
So why do we need models? Well, why we need models, then, is to pin down our
understanding to some extent of the consequences and to make sure that we do
understand the physical processes, in particular we understand what happens at different
time scales.
So although I can calculate what the temperature rise is for, say, a doubling of CO2, I
don't know what time frame that will occur, because there's a lag. So how long is the lag.
How long will it take.
Looking at long-term trends and looking at the regional impacts, because that 3 degrees
temperature rise does not happen uniformly across the globe. So where does it happen?
The poles experience about twice as much temperature differential as equatorial zones.
So the temperature changes happen differentially across the globe.
So that's what we can use the models to understand.
And to separate out different causes, to separate out a number of different forcings that
are changing the climate and start to figure out which ones of them are having the biggest
effects.
And one of the things you can do in the model is turn off things. In the worst case you
turn off the sun and see what happens. But you can turn off other things. You can turn
off humans. You can turn off human emissions. You can turn off all sorts of things in
the model and look at the differences and just play with them.
Of course what the policymakers want are these last two things. They want to understand
strategies for mitigation, for reducing carbon emissions, and adaptation to changing
climates. You know, where are we going to get flooding, where are we going to get
sea-level rise, what infrastructure changes are needed, and what policies should we put in
place.
And now we've got a huge tension because the climate scientists are comfortable with the
stuff at the top; the policymakers are demanding the stuff at the bottom. And the
scientists would rather not run their models in a predictive mode. They would prefer
never to have to project forward and say here's what's going to happen over the next
century. They would prefer to play around with poking and prodding their models with
observational datasets that we have from the past and checking their understanding.
So to them the model is to check our understanding of some physical process. It's not to
make predictions for the future. Because making a prediction for the future, even 20
years out, to a scientist is useless. You'd have to wait 20 years to find out if you were
right, and by then it's not a publishable result anymore.
So those kinds of predictions aren't what they do. And hanging out at the lab and
listening to them talking, they were just getting ready for the next round of IPCC
reporting where IPCC sets a whole bunch of simulation runs that they would like to have
to put together the report.
And the scientists are sitting there saying how can we get these done as quickly as
possible so we can get back to doing science. And that's their philosophy: We want to
get back to doing the science. We don't want to hand the policymakers the stuff they
want. And if we do, let's make that as quick and simple as possible.
So here's the core idea of what a climate model is. This is a very, very simple model.
You just add up energy as it moves around the planet. So what's the total energy
incoming from the sun. What gets reflected by various surfaces, reflected by the surface,
reflected by the clouds. What gets absorbed by gases, when does that get released again,
where does it get released. So it's how does energy move around in the system.
And when I said there the initial numbers for climate sensitivity were worked out in the
1890s, this was basically the equations they were playing with, what's the energy balance
of the earth and what's the new temperature that you have to get to to make sure all the
energies are in balance if you change the composition of the atmosphere.
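To make that energy-balance arithmetic concrete, here is a minimal Python sketch of a zero-dimensional calculation of that kind. The constants are standard textbook values and the 3.7 W/m^2 forcing for doubled CO2 is the commonly quoted figure; none of this is Met Office code, and because it has no feedbacks it gives only the bare no-feedback response of roughly 1 degree rather than the roughly 3 degrees quoted above.

    # Toy zero-dimensional energy balance, in the spirit of the 1890s
    # pencil-and-paper calculations described above. Textbook constants only;
    # nothing here comes from the Met Office model.
    SIGMA = 5.670e-8   # Stefan-Boltzmann constant, W m^-2 K^-4
    S0 = 1361.0        # solar constant, W m^-2
    ALBEDO = 0.3       # fraction of sunlight reflected straight back to space

    def equilibrium_temperature(extra_forcing=0.0):
        """Temperature at which outgoing longwave balances absorbed sunlight.

        extra_forcing is an additional downward flux in W m^-2, a crude
        stand-in for changing the composition of the atmosphere.
        """
        absorbed = (1.0 - ALBEDO) * S0 / 4.0 + extra_forcing
        return (absorbed / SIGMA) ** 0.25

    t0 = equilibrium_temperature()
    t1 = equilibrium_temperature(extra_forcing=3.7)   # ~doubled-CO2 forcing
    print(f"baseline {t0:.1f} K, doubled CO2 {t1:.1f} K, delta {t1 - t0:.2f} K")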
Now, of course that's a very, very simplistic model. And what you really want is a whole
bunch of other feedbacks in there. For example, feedbacks where if you melt the ice at
the poles that changes the albedo, that makes -- that replaces white ice, which reflects
sunlight, with dark seawater, which absorbs sunlight. So, well, how does that play into
the system.
If you melt the permafrost, that releases methane. How does that -- and methane is a very
potent greenhouse gas. How does that play into the system.
If we change land uses, if we start planting new trees everywhere, how does that play
into the system. If we cut down trees and so on.
How do volcanos, volcanic eruptions perturb the system.
So these are the stuff that the scientists want to play with. They want to play with all
these interesting feedbacks, put those into the simulations and see what happens.
So the core of the model is the earth divided up into cubes. So they take the
atmosphere and divide it into a huge number of cubes and take the ocean and divide it
into a number of cubes. And for each cube at each time step in the simulation you solve
the equations of fluid motion. And that's it. That's basically what the climate model is.
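As a structural illustration only, here is a toy Python sketch of that "divide it into cubes and step the equations forward" idea. It diffuses a temperature field on a coarse latitude-longitude grid rather than solving the real equations of fluid motion, and the grid, time step and diffusivity are all invented; the point is just the shape of the computation -- a value per grid cell, updated from its neighbours at every time step.

    import numpy as np

    # Toy version of "divide the atmosphere into cells and step them forward".
    # This is simple heat diffusion, not the real equations of fluid motion,
    # and the toy grid wraps around at the poles, which a real model would not.
    n_lat, n_lon = 36, 72                      # roughly 5-degree cells
    lats = np.linspace(-87.5, 87.5, n_lat)
    temp = 288.0 + 30.0 * np.cos(np.deg2rad(lats))[:, None]
    temp = np.repeat(temp, n_lon, axis=1)      # warm equator, cold poles

    dt, kappa = 1.0, 0.1                       # arbitrary toy constants

    def step(field):
        # each cell is nudged toward the average of its four neighbours
        laplacian = (np.roll(field, 1, 0) + np.roll(field, -1, 0) +
                     np.roll(field, 1, 1) + np.roll(field, -1, 1) - 4.0 * field)
        return field + dt * kappa * laplacian

    for _ in range(1000):                      # one "run" = many small steps
        temp = step(temp)
    print(f"global mean after the toy run: {temp.mean():.2f} K")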
Now, of course, then you have to put all sorts of interesting boundary effects in there.
You have to tell this model where the land masses are, where the mountains are, where
there are different kinds of vegetation, where there are sources of greenhouse gases. And
so there are all sorts of parameters that you have to add to the model to make a more and
more realistic simulation of the earth.
But the core essentially is this huge, big computation of fluid flow.
And here's an interesting observation. In the last 20 or so years, there has been
essentially no change in how long it takes to do a run of a climate simulation. And yet --
well, wait a minute, what about Moore's law -- don't we get a doubling of processing
power every, whatever it is, 18 months, which is a dramatic improvement in processing
power over 20 years? And yet it still takes the same amount of time to do a climate run.
What happens is every time they get a faster machine they just increase the resolution.
They increase the resolution of the grid.
So the driving constraint over how long a climate simulation takes -- there's only two
constraints, really. It's how patient a scientist is -- how long they're willing to wait for a result -- and how
much time can they get on the local supercomputer to run their experiment. And that's it.
And the more time that they have available, well, they'll just up the resolution of the
model because resolution is everything to this community.
So this gives you some sense for some recent models of the size of the grid squares. So
HadCM3, which was the Met Office's main model about seven or eight years ago and the
one that went into the last round of IPCC reports, had grid squares of -- I've forgotten
what those are -- 270 kilometers on a side and 19 levels in the atmosphere.
HadGEM1, which is its replacement, has -- where are we -- 38 levels in the atmosphere
and grid squares 135 kilometers. There's a newer generation of models further than that
that goes up to 78 levels in the atmosphere and the grid squares are getting smaller and
smaller. But that's the kind of resolutions they're playing with.
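A rough back-of-the-envelope, using the grid spacings and level counts just quoted, shows why a faster machine simply disappears into resolution: halving the horizontal grid spacing roughly quadruples the number of columns and, via the usual stability constraint on explicit schemes, roughly halves the time step as well. This is only an illustrative sketch, not how the Met Office actually costs their runs.

    def relative_cost(dx_km, levels, ref_dx_km, ref_levels):
        # columns scale with 1/dx^2, cost per step with the number of levels,
        # and the time step shrinks roughly in proportion to dx
        columns_ratio = (ref_dx_km / dx_km) ** 2
        levels_ratio = levels / ref_levels
        timestep_ratio = ref_dx_km / dx_km
        return columns_ratio * levels_ratio * timestep_ratio

    # HadGEM1 (135 km, 38 levels) relative to HadCM3 (270 km, 19 levels)
    print(f"~{relative_cost(135, 38, 270, 19):.0f}x the compute per simulated year")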
So you take your climate model, you stick it on the supercomputers, and you set them
running. And it will take typically -- I've got some numbers here -- about 30 minutes of
CPU time to simulate one day of climate.
Which means if you multiply that up a century, so say you're interested in simulating the
entire 20th century, that's about 50 days that you have to run that simulation for on -- you
know -- some of the fastest machines in the world.
And here's what you get. So this is what I was showing at the beginning, and I'm just
going to quickly -- for those that weren't here at the beginning, just show this run.
Because I find it beautiful and fascinating.
This is a visualization of one month in August of climate showing basically precipitation.
So this just shows you where it was raining. So anywhere that's white is light rain and
where it's orange is heavy rain.
And the importance of sharing this is to understand that what falls out of these models -- I
said they're basically solving fluid flow -- you see real weather patterns emerging. You
see tropical cyclones in Japan. You see the North Atlantic current blowing the rain onto
the U.K. You see the Indian monsoon occurring.
And none of that is programmed into the model. This is all emergent properties of those
basic physical equations for how heat and energy and water and mass are transported
around the planet according to where the land masses are, what gravity does, and so on.
As soon as you see this, you say -- this is what convinces me that these climate models
are incredible scientific instruments, because they can simulate -- this isn't real
observational data, but it behaves in the same way that the real climate system behaves
such that when we play with the model we can believe that this is as good as it gets to
doing real experiments on a planet-wide scale, which, of course, is what we can't do.
So let me stop that and carry on.
So climate is a very complex system. There's all sorts of sources of uncertainty. There's
measurement error. The observational data that we're testing the models against has
errors in it. There's variability in the physical processes. So although we're studying, for
example, global warming, superimposed on that warming are all sorts of other cycles that
completely swamp the warming.
So the short-term cycles, the annual and decadal cycles are much stronger than the
warming signal. So you've got to kind of pick out the signal from the noise.
And then there are of course model imperfections. The models are imperfect. You
cannot simulate everything that you'd want to simulate and have it run in a reasonable
amount of time. So there's all sorts of imperfections in the models.
I'm just giving you an example of some of the tradeoffs they have to make. There are a
huge number of physical processes that we might want to put into a simulation. And this
graph shows you some of them, things that of course happen on different spatial scales.
So this is a logarithmic scale from millimeters up to -- where are we -- hundreds of
thousands of kilometers, and from microseconds up to tens of thousands of years.
So, for example, things like surface gravity waves, turbulent mixing, cloud formation and
so on -- we're interested in those at fine-grained scales.
Climate changes on a completely different scale. El Nino, seasonal cycles and so on.
So if you're going to do a particular run for a particular scientific experiment, you've got
to decide which of these things matter and therefore which physical processes you want
to put into the model and which you want to leave out. Because you can't ever put them
all in.
So they're continually making these engineering tradeoffs for each experiment. Which
things do I want in the model for this experiment and which don't I want. Knowing that
everything that you leave out effectively means the model is less perfect.
Okay. So that's the context. Let's look at how they build the software.
So the U.K. Met Office is one of the world's leading centers for climate modeling. There
are about 25 labs around the world that build these simulations. Each one has their own
model.
And, by the way, all the models are basically built at government labs. They're not built
in universities. They're built at government labs by government scientists. Because
universities just don't have the resources to do this.
So at the Met Office they have a shared code base with lots of different models. They
call it the unified model. It's a huge hunk of Fortran. It peaked last year at about a
million lines, a million lines of code. I've got some pictures later on to show you how it's
grown. And that unified code base is used to build both weather forecasting models --
NWP is numerical weather prediction.
So the Met Office in the U.K. is an operational weather forecasting center. In fact, they
provide weather forecasting services for about half the planet, for civil aviation, for
military operations, for the media, for all sorts of commercial outlets. They are an
operational weather forecasting center.
And out of the same code base they build the climate models. So a lot of the core
routines, a lot of the numerical routines are the same in a weather forecasting model and
the climate model. What's different is the scales at which they're looking at things.
They have this very -- I call it a hybrid development process. It's very much bottom up.
The scientists themselves decide what's important to work on and what they want to add
to the model. But they also have some top-down management priorities to say there are
some scientific priorities, there are from the operational forecasting side of the house
certain business goals they're trying to satisfy, and those are superimposed upon this
community of scientists who are sitting there saying here's what I want to do with the
code.
And nearly everything they build is in house except they're starting to engage a little bit
in the idea of a community model. So this is a community of scientists across the world,
or in some cases across a particular country, who are working together to build a
particular module.
So one example of that, for example, is the U.K. Atmospheric Chemistry Group who
have built a model of atmospheric chemistry, and they're trying to incorporate that into
the Met Office's model. So that's code from outside the lab being incorporated.
So the Met Office has about 50 different software development projects in house at the
moment. Of those, the unified model, which is their core simulation model, that's the
biggest project they have going. It's currently -- I said it peaked at about a million. It's
down to about 850,000 source lines of code.
Of that, nearly a third has changed in the last two years. So the amount of code churn
here is phenomenal. And on the climate side of the house, there's about 170 scientists; on
the weather prediction side of the house there's about 300 scientists, scientists with
background in meteorology, numerical analysis, climatology, and atmospheric chemistry,
oceanography, a few other things. And supported by small teams of IT specialists. And
I'll talk about their role in a few minutes.
All of the code is built by the scientists. So virtually every line of code originated from
one of those scientists.
With each release -- they do a new release of the model about every four months, each
release has about a hundred people who have contributed code to that particular release.
So here's my graph of the code growth. The green line is lines of code -- lines of Fortran
on this scale here. So you can see a million at the top there, and that's where it peaked
with version 6.6.
The blue line is the number of files on the right-hand scale, so about 3,000 modules
currently. And what's interesting about this curve here to me is -- I said I was looking for
things that surprise me. It's approximately linear over a 15-year period. I can't get
records back older than 15 years. I trawled through their archives. And so over a 15-year
period they've seen an approximately linear steady growth of that code base.
With -- there's two little perturbations here, and those are easily explained. You can ask
them what happened. Here they stripped out the dynamical core of the model -- the basic
core numerical methods were considered old fashioned -- and replaced it with a new core.
And of course that basically stopped all other work in the lab for nearly a year while they
got that working.
And in hindsight they say we tried to change too many things at once. And so when
we're up again for replacing the core, we don't do it all at once again. So that was that
glitch there. It took them till version 5.1 to get everything working again.
And this little glitch here, they stripped out the old ocean model. They were using an
ocean model from GFDL in Princeton. And they threw that away and used a new ocean
model from a group in Paris. And the new ocean model is a lot more compact. So you
get a drop in code there.
So apart from those two glitches, it's a linear growth over a long period of time.
What's driving that change -- I said it was very bottom up. Here's what happens. There
is -- the three colored blobs are basically the three drivers of change to this code base. So
one is physics research. So new insights, new data, new papers published about the
physical properties that they're trying to simulate in the model. So they're incorporating
new research into the model.
And the other two blobs are day after day after day after day running the models,
comparing them with observational datasets, comparing them with other people's models,
and feeding back improvements into the models. So both the weather prediction side of
the house and the climate modeling side of the house are doing this every day.
And, in fact, that's where most of the changes come from. They come from this
continually running the model, playing with it, and trying to improve it.
I've drawn these smaller. There's an occasional attempt to clean up the code. Actually,
it's not to scale. It should be a tiny little cloud. You probably wouldn't even see it on my
slide if I drew this to scale.
There is very little code cleanup that goes on. And there's some good reasons for that.
One is -- I talked about reproducibility, the ability to reproduce an experiment. If you
clean up the code and you mess around with exactly what phenomena emerged from the
model when you run it over a long period of time, you've broken the ability to run an old
experiment again.
They have this constraint -- and I'll talk about it a little bit more later -- known as bit
reproducibility. So every change that you make to the model should preserve bit
reproducibility. What that means is you run the old model, you run the new model, and
the outputs should be identical down to the least significant bit.
And of course these are all real-numbered variables we're playing with. So down to the
least significant bit in a double-precision real number. The simulations have to be
identical.
If you break that, that's a big thing. There's this list on their Web site of changes that
break bit reproducibility. And they'll spend ages tracking them down to say can we get
rid of that.
Now, there's two reasons why bit reproducibility matters. One is because you want to
rerun an experiment and get exactly the same results again. But the other is that bit
reproducibility gives them a free ride when it comes to testing.
So if you think -- if it takes a month to run a century's worth of climate simulation, that
means testing this thing is hell. You have to wait a month for your test.
So what they do instead is they'll run it for let's say one day of simulation or even one
hour of simulation. And if down at the very least significant bit it's identical, that's a
good indicator that if they let it run for the whole century, they'd get the same result. It's
not guaranteed, but it's a damn good proxy for letting it run for the whole month.
And so they can run a very large number of experiments -- overnight unit tests, for
example -- test bit reproducibility, and if none of the bit reproducibility tests are broken,
that's a good indication that nothing did get broken.
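Here is a minimal sketch of what that bit-reproducibility check amounts to, written in Python over NumPy arrays purely for illustration. The field shapes and stand-in data are assumptions; the real harness works on the model's own output files, but the idea is the same: compare the raw 64-bit patterns of a short control run and a short run with the change, and require them to be identical in every bit.

    import numpy as np

    def bit_identical(old_output: np.ndarray, new_output: np.ndarray) -> bool:
        # Compare the raw 64-bit patterns of double-precision values,
        # not a "close enough" floating-point tolerance.
        return np.array_equal(old_output.view(np.uint64),
                              new_output.view(np.uint64))

    # Stand-ins for a short control run and short runs with two kinds of change;
    # in reality these would be the model's output for, say, one simulated day.
    rng = np.random.default_rng(42)
    control = rng.normal(size=(38, 96, 72))    # invented field shape
    refactored = control.copy()                # e.g. a pure code restructuring
    retuned = control + 1e-15                  # e.g. a change that alters results

    print("refactoring bit-reproducible?", bit_identical(control, refactored))  # True
    print("retuning bit-reproducible?   ", bit_identical(control, retuned))     # False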
You have a question?
>>: Yes, if you don't mind.
>> Steve Easterbrook: Yeah.
>>: So when it comes to something like testing, when you've been saying "the model" --
but there isn't one model, right? There's a huge family of related models depending on
what parameters you use or whatever. So how do they know that even if they preserved
one they haven't broken others?
>> Steve Easterbrook: Okay. I'll answer that I think in the next few slides. And if I
don't, ask the question again. Because it's important.
So some of the requirements for this model conflict. For example, from the weather
forecasting side of the house, the simulations must be very fast and give accurate
forecasts, and in fact they have a very precise set of metrics for forecast accuracy, and a
business goal of improving accuracy by a certain percentage every year. So accurate
forecasts are the most important thing for those people.
For the climate scientists, scientific validity is more important. They will accept a run
that's less faithful to the observational dataset as long as it's better physics.
So of course what you get is the weather forecasting people are always tweaking the
model. They're always trying to tune it to give very, very good simulations of current
weather situations.
The climate modelers would prefer it didn't get tuned quite so much. They don't care so
much about faithfulness; they care about is the physics in the model defensible, is this our
understanding of what's happening.
There's also tensions between components in the model that have different origins. So
the stuff that's developed in house, which is most of the code, is very tightly controlled.
The stuff that's contributed from outside is a real problem for them. It can take a good six
months' worth of work by several people to take an external chunk of code and integrate
it with this model, get it to work on their in-house platform.
So naturalizing external code is a big problem for them. And they would prefer that
didn't happen, but that's the way it is.
So the basic toolset is everything is -- all the code is controlled through Subversion.
Everybody works on a branch. They use Trac as their bug tracker. They've got graphical
code diff that they use quite a lot for playing around with differences.
They've built a custom user interface to Subversion and Trac to greatly simplify things. They
don't have to know really much about how Subversion works. There's only two or three
Subversion commands they ever use, and they've simplified the interface to those. So the
scientists don't have to understand Subversion that well.
They've got a custom build and a custom code extract system so that when they want to build a
particular operational model it goes and gets all the physical schemes that have been
chosen for that model, does the build automatically, sends it to the supercomputer and
starts the run. So all of that is at a push of a button.
So coordinating these big teams is a big challenge. So they're all working on branches of
Subversion. So here's how it works essentially. Each of the operational models, and
there's about eight or nine operational models in house built from this code base, each is a
separate branch in the Subversion repository. Each has its own release -- no, sorry, I take
that back.
Yeah. Sorry. Each is on a branch in the Subversion repository and has its own release
schedule. And the core code base itself, the unified model, has its own release schedule.
You saw on my graph of code growth each of the dotted lines was an official release.
One of the big problems they have is how often should they update their branch from
other changes, how often should they incorporate changes that other people are doing into
the branch I'm working on. If you do it too often, then, you know, you have to stop what
you're doing and spend time getting everything working again. If you do it too rarely,
you get too big an increment and everything breaks.
And just being aware of what changes are happening elsewhere in the lab. Across a team
of several hundred scientists, it's impossible to know what other people are working on,
and yet they desperately need to know.
So they have a very heavy reliance on informal communications. So they solve most of
their problems just by knowing who to go and talk to. And the interesting thing is
these -- so there's several hundred scientists. They're essentially in one big, open-plan
office. They occupy the entire floor of the building. And you can get pretty much from
any desk to any other without passing through any door or staircase. It is a huge, big,
open-plan office.
And they all say that's important to them. About five years ago, six years ago, they
moved to a new site in a different town. And previously there were -- the numerical
weather prediction and the climate research were in different buildings with a car park in
between. And they always say, oh, it's much better now; we don't have to walk across the
car park. So that's clearly important to them.
Oh, yeah. So I mentioned that. They make extensive use of wikis and newsgroups within
the lab to keep informed of what's going on. Lots of e-mail.
The other thing they do a lot is they form temporary cross-functional teams. So they've
got a particular phenomenon going on in the model, and the one that I saw when I was
there was a big problem where the Indian monsoon was too dry.
So they put together a team made up of different scientists from different disciplines
across the lab who will get together regularly over a six-month period and figure out what
they need to do to fix this.
And this is one of the ways that they learn who else is in the lab and what they know. By
participating in these teams on a regular basis, they all get to know who else is there and
what other people know. So that's how they maintain the social network.
They also are organized a little bit like a typical open source project with -- you've seen
studies of open source, they'll talk about the onion model where you have a project leader
and then you have a core set of members who decide what goes into a release, and then
larger groups of code contributors who are less heavily engaged in the team, and then the
people who are just users, people who report bugs and so on. So there's the passive users.
So there's a core set of people here who control the project.
And you don't get to be in the core until you've proven yourself. It's a meritocracy. So
the core changes relatively slowly and people are only accepted into the core of an open
source community when they've proven that they can -- they've got what it takes.
Well, that's how these guys organize themselves. They've got -- at the core they have
those systems teams. And the systems teams look after the releases.
So they'll plan a release on about a four-month schedule. And they'll take code
contributions from scientists anywhere across the lab to say you can submit your change
to this upcoming release. But it has to be ready by a certain date and there's a cutoff
about a month before the expected release where they freeze the code.
Actually they talk about a frosting rather than freezing, because they still allow some
changes.
And from that point on the systems team here work like crazy taking no more than three
changes each day, incorporating them into the trunk, running the overnight test harness,
and if everything worked, then they move onto the next set of changes for the next day.
And then outside of them are a set of code owners. So there's about 20 senior scientists
across the lab, each one assigned a particular chunk of code that they are the expert on,
and they have to approve all changes to that chunk of code.
And they have two stages of review. Before the change is actually made, particularly for
the large change, they have to prove this is even a reasonable thing to be working on.
And then once it's ready, they have to approve that it's now okay to accept into an
upcoming release.
They also have this -- I've drawn it kind of slim. They have this set of configuration
managers, one per operational model. So this is typically a scientist who ends up
spending about 50 percent of their time not doing science but worrying about how
changes are affecting one of those operational models.
So this is the answer to your question. They have these designated people whose job it is
to look after a particular configuration of the model.
And then they have, as I said, in any particular release about a hundred people who are
contributing code to that release. A large number of scientists, end users. And I didn't
draw it. There's another ring here. Another group of people who are preparing changes
for future releases. So just playing around with stuff.
If you look at the repository, you can see exactly what I described here. These are the
top -- where are we -- however many changes to the trunk of the UM over some period of
time. I can't remember what the time period is.
So there's only 14 people that contributed code to the trunk. And here, notice this. If you
know what the usernames are, the top one, two, three, four, five, six, seven, all their
usernames start with FR. That's the systems team for the weather prediction side of the
house.
The four usernames starting here with the "had" are the systems team for the climate
modeling side of the house.
Basically the only people putting code into the trunk are the systems team. But they
didn't write that code. That code all came from scientists who were contributing code for
upcoming releases.
And notice that the weather prediction people completely dominate the climate change
people in terms of number of changes and amount of code that's being changed.
So how do they do V and V in this environment? Lots of informal desk checking. They
don't have any formal unit testing process. There's no requirement to do unit testing, and
most of them just don't bother.
There are these two stages of review that I mentioned. But here's where nearly all the V
and V comes. It's continuous testing, set up as science experiments. I talked about bit
reproducibility, and I talked about the automated overnight test harness on the main
trunk.
So this one here is the one I haven't talked about yet. And this is the one that I find
deeply fascinating and unusual.
So here's what you do if you're making a change to your branch. You set your change up
as a science experiment. You've got a hypothesis that you're testing. If I change this
little bit of code here, I think I can make an improvement to the model that will have the
following effect.
And then you test that hypothesis by running the new code, using an old run of the model
just before you changed it as your control. So now I've got a scientific experiment with
two treatments, the control and the new code, and your observational data as your
measurement of how well you did.
And you push a button. When you submit the code, you push a button and it generates all
these visualizations of the results.
So what you see here is for a different model parameter. And, by the way, this is just the
top portion of a wiki page that goes on. There's about 30 different parameters that they've
picked out here that they want to see visualizations of. And each one is a four-up display.
And I should explain what the four-up display is.
So you see four different visualizations of this one model variable. The first is -- so in
this case what they're doing is they're experimenting with a new polar filter over the
Antarctic. So something that's supposed to improve sea surface temperatures over the
Antarctic.
So this is PMSL, which is pressure at mean sea level, for DJF, December, January, and
February. So the winter in the northern hemisphere, winter quarter of the year.
That's the raw result from the new model, from the changed code. This is the difference.
This is the delta between the control, the old version of the code and the new version of
the code.
So this is now where in the world did we make a difference. And the differences are
where they're expected to be. They're around the Antarctic. The biggest difference is
there. But there's this whole band. The differences are where they're supposed to be.
That's good.
This one here is the control minus the observational dataset. So this is how well was the
old model simulating on observed data for this period. So these are anomalies. This is
where -- anywhere that's darkly shaded here, this is where the model was doing badly
before.
This is the new code minus the observational dataset. So this is how badly the new
model is doing. And there's less dark blue. And there's less dark blue where there's
supposed to be less dark blue. So this was a successful experiment.
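The four-up display itself is easy to describe in code. Here is a sketch of the four fields being compared, assuming the control run, the changed run and the observations have already been put on the same grid; the grid size and the random numbers standing in for real model output are invented for illustration.

    import numpy as np

    def four_up(control, changed, obs):
        return {
            "new model":         changed,            # raw result of the change
            "new minus control": changed - control,  # where did the change act?
            "control minus obs": control - obs,      # the old model's anomalies
            "new minus obs":     changed - obs,      # did the anomalies shrink?
        }

    # stand-in fields, e.g. DJF mean sea-level pressure on a 2-degree grid
    rng = np.random.default_rng(0)
    control = 1013.0 + rng.normal(0.0, 5.0, (90, 180))
    changed = control + rng.normal(0.0, 1.0, (90, 180))
    obs = 1013.0 + rng.normal(0.0, 5.0, (90, 180))

    for name, field in four_up(control, changed, obs).items():
        print(f"{name:18s} mean {field.mean():8.2f}   max|.| {np.abs(field).max():8.2f}")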
By the way, success here is judged by eyeballing these graphs. That's the criterion. So they
look at them and they stare at them and they pass them around -- I sat in these meetings where
they've walked in with their latest pictures like this and they sit down and there's not a
word of explanation; four or five people in the room are handed these graphs. This is the
first time they're seeing them. And they say, oh, yeah, you did it, you got what you were
looking for. And they just go straight in. Nobody has to explain what they're seeing.
Okay. Because these visualizations are their common currency. This is what they spend
all their time staring at.
They also -- some of them. This is not a universal practice across the lab. But some of
them use their wikis as electronic lab notebooks. So this is one of those configuration
managers who's keeping track of every experiment. So every five-letter acronym here is
an experiment. It was a change to the code where one of those experiments was run
comparing the old version to the new.
And in the middle here he's written down a brief, few-word summary of what the change
was. And notice that he's got several different models he's trying it in.
So N96L38 is a particular resolution of the model with 38 levels in the atmosphere, and I
forget what 96 means in terms of horizontal resolution. But a particular horizontal
resolution.
N96L70 is a higher resolution model. And N144L38 is another different resolution. So
some of his experiments are in one resolution model, some are in another. And one of
the things he'll do is he'll try it out in one model; if that works, he'll try it out in a different
model.
So, again, in answer to your question, he's going around systematically looking does the
change work, first of all, in one model, and then how does it affect models at different
resolutions with different physical properties.
I think everything in red here was an experiment that failed. I better go back and check
this. I think everything that's in black was a successful experiment; everything that was
in red it failed, it didn't do what it was supposed to do. And the one that's in green I
haven't figured out. I don't know what that is. Anyway...
And then every single one of those experiments -- of course the reason they're blue is that
they're hyperlinks in the wiki -- leads to a lab notebook for that experiment: who did it,
what it was, what it was supposed to show, links to the visualizations of the output, brief
description of the results and so on. So he's got an electronic record of every experiment.
Here's another visualization that they commonly use for getting a bigger picture of how
well they're doing as they're improving the model.
So this graph here summarizes -- there's about 30-odd -- more than 30 core indicators of
model skill, each one typically expressed as a root mean squared error over an
observational dataset. So when you get down to zero, you match the observational data
perfectly.
And what they've done is they've taken all of those variables and normalized them so that
for each variable one is where the old model was. So that line there is -- this is what the
old model did. The colored dots are what the new model did.
So you've got a one-shot visualization of where are we getting worse and where are we
getting better. If you're above the line, you're doing worse; if you're below the line,
you're doing better.
And the whiskers are the error range in the observational data. So if you're within the
whisker, you're now within your target of where you want to be.
So everything in the new model that's within the observational whisker is green. That's
good. We got where we want to be for those variables. Everything that's red is a variable
we're doing worse on than the old model. Everything that's amber is where we're doing
better than the old model but we still haven't met our target range against the
observational data.
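In code, that normalized skill summary is a very simple scoring rule. The sketch below assumes each indicator is an RMS error against observations, rescales it so the old model sits at 1.0, and classifies it green, amber or red against the observational error bar; the indicator names and numbers are invented.

    def classify(new_rmse, old_rmse, obs_error):
        score = new_rmse / old_rmse        # 1.0 means "same as the old model"
        whisker = obs_error / old_rmse     # the target band, on the same scale
        if score <= whisker:
            return score, "green: within observational error"
        if score < 1.0:
            return score, "amber: better than the old model, not yet in the target band"
        return score, "red: worse than the old model"

    indicators = {  # (new RMSE, old RMSE, observational error) -- invented numbers
        "sea level pressure DJF": (1.8, 2.0, 1.9),
        "precipitation JJA":      (1.1, 1.0, 0.6),
        "surface temperature":    (0.7, 0.9, 0.8),
    }
    for name, args in indicators.items():
        score, verdict = classify(*args)
        print(f"{name:24s} {score:4.2f}  {verdict}")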
They also spend a lot of time doing model intercomparisons. So comparing their models
on standard scenarios with other people around the world. And there's a huge amount of
effort that goes into this.
They'll also do model ensembles. They'll take large numbers of different models and run
them over and over and over again to do probabilistic forecasts. And that's how a lot of
the forecasting, when they do forecasting, that's how it's done.
So let me try and -- that's what goes on in this lab. Let me try and summarize some of
what's happening.
First of all, an observation that actually I found surprising after trawling through the
literature, this approximately linear growth in their code is unusual over such a sustained
period of time.
If you go and look in the literature of people who have done these long-term studies of
code growth, and the classic ones were all done by Lehman on commercial systems, he
points out that all the commercial systems he studied have this approximately inverse
square curve.
As the code grows -- as it grows, as it gets bigger, growth tails off because of the growth
of complexity. It just gets harder and harder to change the model as it gets bigger or
change the code as it gets bigger. So somehow these scientists have escaped from that
trap.
Studies of open source, this is from Mike Godfrey's study of the Linux kernel, do tend to
have this linear growth. In fact, he showed that the kernel itself is approximately linear
over some long period of time. And it's super linear if you take into account all the
device drivers that are being added. Well, let's leave out device drivers. That just
complicates the picture. It's approximately linear if you just look at the kernel itself.
Yeah.
>>: [inaudible] how many projects in the world would you say there are with more than
a million lines of code? Is this common now or are there like ten in the whole world that
actually are that big?
>> Steve Easterbrook: That's a great question. And I don't know. Anybody else know?
>>: I'm just trying to say, though, that this project you're talking about in the context of
[inaudible].
>> Steve Easterbrook: Yeah, okay. I know it's big, but I don't know how many other
projects it compares to. That's a very good question.
Right. So one observation here is that it appears to have escaped the trap that lots of
commercial code falls into, that growth tails off after a while because of the complexity.
And so now I have to explain that. If it's more like open source [inaudible] what's
common about open source projects -- some open source projects and this scientific code
that's different from commercial systems. And my best hypothesis for that right now is
the domain experts are writing the code. And it might just be as simple as that.
In most commercial systems, the domain [inaudible] building financial software. The
domain experts don't write code. They have to explain to the programmers what's
needed. And then there's this big communication gap.
So one hypothesis is that in a lot of open source and in a lot of the scientific code you
don't get that communication gap, so you escape the complexity trap.
I don't know. It's a hypothesis that needs testing. We have nowhere near enough data on
code growth in different types of projects. So there just aren't enough data points to be
sure even the phenomenon that Lehman pointed out is really true.
What about defect rates? I attempted to do a back-of-the-envelope calculation of the
defect density comparing to some stuff that's been described in the literature.
So NASA space shuttle is usually held up in the literature as the best ever and the most
expensive per line of code ever built in the world. And they report about 0.1 failures per
thousand lines of code post release.
Well, in the unified model, let's say I take the last six releases. Over the last six releases,
the average is about 24, and it actually is a very small variability. It's somewhere
between 20 and 30 bugs per release with an average of 24. And an average of 50,000
lines edited per release.
So what does that mean? That means about two defects per thousand lines of code are
making it through their release process undetected.
Or if I expand that to an expected defect density of the entire code base, I get a number
0.03 faults per thousand lines of code in that code base right now.
Now, of course, this depends on what you count as a bug. You know, how did I count
defects. Well, I counted what was reported as errors in their bug tracking system. Do
they record all errors in bug tracking systems the same way that NASA would or military
systems would or Microsoft would.
Okay. I've got my grad student doing a follow-up study on this to try and get better
numbers, because, first of all, we're not sure if this is believable. If it is believable, it's
remarkable.
So, first of all, we better check these numbers actually are even within orders of
magnitude anywhere near accurate. And they appear to be.
And so the next question is, well, how do I explain that, how do I explain this relatively
low defect density compared to other types of software.
And, you know, we played around with, well, why don't they seem to have many errors.
Well, in large-scale numerical simulations, most of the coding errors that you could make
will be instantly obvious for a certain number of reasons. First of all, the model just
won't run. Or, secondly, it does run but it crashes pretty soon because some variable's
just gone out of range and the simulation's just gone haywire. And you only have to run
it once to spot that. And of course they're running every change lots and lots of times.
And then they've got these bit comparison tests to make sure that you didn't break
anything.
So it's a very, very conservative change process that doesn't let many bugs into the
released code.
And then we spend a little bit of time probing some of the bugs that did make it into the
release code to find out what happened. And let me tell you one story which I found
fascinating.
There was an error that had been in the code for a couple of years in the released versions
of the UM before it was fixed. And it was an error in the soil hydrology module, so how
much moisture is there in the soil, which of course affects how much moisture is passed
into plants which affects the evaporation of moisture into the atmosphere which affects
where the water is in the system.
The reason the error was in the code was because the routine in this soil hydrology
module was taken from a published paper that had done a study of soil hydrology, and
they'd got some parameters from this published paper and a couple of equations from the
published paper in this code.
And they'd mistakenly taken a logarithm in the paper and assumed it was a natural
logarithm when it was actually a log base 10. So they were off by -- what's
the conversion factor -- 2 point something in this equation. And so the soil was -- I don't
even remember which way it is now. It's too dry.
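The arithmetic of that bug is worth seeing, because it is such an easy mistake to make. A tiny sketch, with an invented coefficient standing in for whatever the paper actually used: coding a published log-base-10 relationship as a natural logarithm inflates that term by a factor of ln(10), about 2.3, which matches the "2 point something" conversion factor.

    import math

    b = 7.12                            # hypothetical coefficient from a paper
    as_published = b * math.log10(2.0)  # what the paper meant: log base 10
    as_coded = b * math.log(2.0)        # what went into the Fortran: natural log
    print(f"inflation factor: {as_coded / as_published:.3f}")   # ln(10) ~ 2.303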
And they'd been aware that they were having problems with moisture in the model
around soil and plant evaporation for a period of time. So they kind of knew that
something wasn't right, and they just didn't have time to fix it. It wasn't causing a big
enough change in the moisture elsewhere in the model that it became urgent to fix.
So for two years they kind of knew something was wrong, but they didn't know what. And
they just tuned that out. They said, well, we'll just add some extra tuning into the model
to compensate for that problem. We know it's there. One day we'll get around to fixing
it. But we can run over all our science experiments without fixing it because we've got
all sorts of other inaccuracies in the model and so it doesn't matter.
And then one day a scientist in the lab had a bit of spare time. He said I'm going to track
that bug down, we're going to find out what happened.
And the way he tracked it down was he went and got five or six other models from other
labs, looked at their soil hydrology modules, ran their simulations and compared those to
the Met Office's and discovered that in five out of the six models that he found they all
agreed with the Met Office model, but one model was noticeably different.
And so he then tried to track down where the difference was, and he pinned it down to
this particular equation in the model. And lo and behold the one that was different had
got the right logarithm in the model and the other five had all got the wrong logarithm in
the model. And, you know, they'd all shared code and shared routines, and so the error
had propagated across all these different models.
So they found it by comparing. So the fact that there are a whole bunch of other people
around the world building the same software for essentially the same purpose gives them
a huge advantage.
It's not clear -- I chatted to this guy at length about how he found the bug. And it's not
clear he would've ever found it if he didn't have the ability to go and grab other people's
models and compare them.
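As a toy illustration of that compare-across-models strategy -- entirely a sketch, with made-up numbers and a crude outlier test, not the analysis the scientist actually ran -- the idea is to flag whichever model disagrees with the rest and start reading its equations there, remembering that, as in this story, the outlier can turn out to be the one that is right:

```python
# Toy sketch of finding the odd-one-out among several models' soil
# hydrology outputs. All numbers are invented for illustration.
import numpy as np

# Mean soil moisture from comparable runs of six models (made-up values).
soil_moisture = {
    "model_a": 0.212, "model_b": 0.208, "model_c": 0.215,
    "model_d": 0.210, "model_e": 0.209, "model_f": 0.287,
}

values = np.array(list(soil_moisture.values()))
median = np.median(values)
spread = np.median(np.abs(values - median))  # robust spread (MAD)

for name, value in soil_moisture.items():
    if abs(value - median) > 5 * spread:
        print(f"{name} disagrees with the others -- a place to start comparing equations")
```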
Okay. So that allows me to -- let me start to wrap up and draw some conclusions.
I -- having spent this time at this lab and looked at what they do, I characterize it as an
extremely successful software development outfit. They build very high-quality code.
They are very careful about which tools they use, but they know that the tools that they
use are essential and they're resistant to trying out other stuff just because it's fashionable.
And they've managed to, you know, have this linear growth in code. There's no tail-off in
the growth of the functionality of their code over a very long period of time.
So what matters. It matters that they have a highly tailored software development
process. They don't do what the textbooks say you should do; they do what works for
them.
And over a very large number of years, a large group of very smart -- they're all physics
Ph.D.s. They're smart people. And they spend time thinking about how to make
improvements to their process.
So over many, many years they've tinkered with the process and they've evolved
something that's highly adapted to their environment, their context; not by listening to the
literature, but by playing around with stuff. And if it works, they adopt it; and if it
doesn't work, they don't use it.
As I said, the code developers are domain experts. And that appears to be crucial.
They have this very strong sense of shared ownership. So they're like a small software
startup in many respects, like an agile software development team. They all own this
code. And they all feel a very strong sense of ownership and a very strong sense that
we're all responsible for making sure it works.
So they have also that idea from open source of many eyes validation. There's many
people looking at this code and worrying about it. So the chances are people will look at
the code and find the errors. So they also have this openness. You know, the code is -- it's not officially open source, it's not given away freely. You have to sign a license with
them if you want it. They'll give it to any other research outfit for free if they sign the
license.
But pretty much the only people that use it are in house. There are a small number of
communities outside the Met Office that run this model, but the biggest group of users
are in house.
They do a lot of benchmarking work, these model intercomparison projects, where they
spend a lot of time running simulations over everybody's different models and
comparing them.
And -- oh, one thing I didn't mention is they don't have a fixed release schedule. So
although I said they plan a four-month release cycle, they don't decide a release date until
they're ready.
So they have a target in mind. And the systems team are running their overnight tests
every day until they've got every change folded in. But they don't announce a release
date pretty much till the day of the release. They say today we're done. We've got all the
changes folded in, nothing broke, we can announce a release.
And they won't release until it is ready. So -- because there's no external customers using
this, they can do that. It doesn't matter when the release happens. The release matters for
stability, but it doesn't matter because there's no one waiting there to buy it or to upgrade
or anything. And people are relatively slow to upgrade to newer versions of the model.
So highly adapted processes. They use all sorts of bits of agile practice, or things that I
would call agile practice. They don't use this terminology at all. So the ones that are
checked in green here are things that I think they use. The ones that are in red are the
ones that they don't appear to do anything like this. And the ones in yellow are ones
where they couldn't decide. They kind of sort of do it but, you know, it's not clear.
So it's interesting. They've picked and chosen bits of agile practice that work for them.
But they certainly don't adhere to any standard agile model.
They have a very strong sense of a shared conceptual architecture. They all know their
way around the code. They all know which bit of the code corresponds to which physical
process. And they've all got pictures like this in their heads of what the main units are
and how they interact.
And then there are also three interesting comparisons with open source. This release
schedule that's not constrained by commercial pressures; developers being domain
experts, which is, again, typical of the best open source projects; a core group of code owners
who very tightly control what gets accepted into the trunk.
A community that operates as a meritocracy. So the people that are best able to code do
the coding; the people that are best able to look at the scientific direction of the model
look after the scientific direction of the model and so on. And those groups are relatively
stable and change very rarely.
Oh, and here's my favorite observation. None of the people that write the code for this
model think of themselves as programmers. They're not programmers. They all are
scientists doing scientific research. The only reason they ever build code is because they
need it for the research. So it's like an open source developer who has a day job but
happens to tinker occasionally with tools because they want the tools to do whatever they
need in their day job.
So they're not programmers, they're scientists who just occasionally have to write code.
And then the verification and validation is based on extensive use by the developers
themselves.
Challenges. Let me pick out a few things where they are having serious problems. I
mentioned coordination. They do have a big problem with coordinating changes across
all the different branches in the lab and just knowing what else is going on and knowing
who else is making a change that somebody ought to fold in with something that they're
doing.
They really want to get into multisite development. The multisite development thing is
important because these models are now getting so complex with so many different
scientific modules from different disciplines. As soon as you put in plant biology
and soil hydrology and oceanography, you can't have all that expertise in one lab
anymore. You used to be able to, but they're getting to the point where they just can't.
So they want to have multisite development where some of the modules are built
elsewhere where there's expertise and imported into this model. And every time they've
done that, they've got into huge difficulty.
There was a big fight going on in the lab when I was there last summer over the new
ocean model. So the new ocean model is built by a group in Paris. And one of the things
they did with the old ocean model, which came from Princeton about 15 years ago, was
basically when they took that old ocean model and put it into the UM, they did a code
fork. They basically had to start making their own changes to make that ocean model
work in their code.
And of course as soon as they did that, they forked from the old ocean model. Which
means they got a state-of-the-art ocean model, but they lost the connection with the
original team, so they couldn't get all the updates to it as it got steadily better. They
ended up with an ocean model that just wasn't keeping up with the science.
So when they went to NEMO, which is this new model from Paris, they said we're not
going to do that, we're not going to fork. We're going to write an agreement with the
Paris folks that all the changes that we need to make the ocean model work in our model
will be folded into the baseline in Paris and looked after by the team in Paris.
And so now they're going to avoid forking, but they're buying themselves a whole bunch
of other coordination problems, because the team in Paris is basically four people. And
they have very strong -- the folks in Paris have very strong notions about where they're
taking their model in the future and which changes they should accept and which they
shouldn't.
And one of their overriding design principles is portability. They want their ocean model
to work everywhere, on everyone's different hardware, with all sorts of different models.
The Hadley folks want it to work very well in the Hadley model. So they're pushing
changes on the group in Paris saying you've got to change this in your model to make it
work in our architecture. And the folks in Paris are saying we can't make that change
because that breaks portability with everybody else's models.
So now they're getting into all these fist -- not fist fights, but they're getting into these
fights over which changes need to be made to this ocean model and who's going to be
responsible for those changes without forking.
And that's a big problem for them.
I think I might wrap up and -- let me say one more thing about future challenges. And
this crops up in the literature in climate science quite often. There's this long-standing
desire in the community to build plug-and-play models, to have each of the different
physical modules be plug and play.
You know, I take the ocean model from Paris and I just plug it in and have it work. And
that depends upon, you know, having an appropriate shared architecture, having
well-defined interfaces, having couplers that couple the different physical routines and do
all the scaling across resolutions and all the boundaries, and all sorts of crazy things have
to happen to make these modules work together.
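To give a flavour of what a coupler has to do, here is a heavily simplified sketch under invented names -- real couplers also have to handle things like conservation constraints, parallel decomposition and time interpolation, none of which appears here:

```python
# Minimal sketch of the coupling idea: the atmosphere and ocean run on
# different grids, so every exchanged field has to be regridded at the
# interface. The names and the nearest-neighbour regridding are
# illustrative assumptions, not any real coupler's API.
import numpy as np

def regrid(field: np.ndarray, target_shape: tuple[int, int]) -> np.ndarray:
    """Nearest-neighbour regridding from one lat-lon grid to another."""
    src_rows, src_cols = field.shape
    rows = (np.arange(target_shape[0]) * src_rows) // target_shape[0]
    cols = (np.arange(target_shape[1]) * src_cols) // target_shape[1]
    return field[np.ix_(rows, cols)]

def couple_step(surface_fluxes: np.ndarray, sea_surface_temp: np.ndarray,
                atmos_shape: tuple[int, int], ocean_shape: tuple[int, int]):
    """One exchange: fluxes go from the atmosphere grid to the ocean grid,
    sea-surface temperature comes back the other way."""
    fluxes_on_ocean_grid = regrid(surface_fluxes, ocean_shape)
    sst_on_atmos_grid = regrid(sea_surface_temp, atmos_shape)
    return fluxes_on_ocean_grid, sst_on_atmos_grid
```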
And for 20 years they've been talking about this in the literature: We're working towards
plug and play, we're making progress on the whole plug-and-play thing.
And I've sat down with several lead climate scientists over the last year and said, you
know, this isn't working out. You're really not making any progress on this, are you.
What's going on.
And they've all said quite frankly it isn't going to happen and it isn't going to happen
because of the core complexity of the physics. The domain is inherently tightly coupled.
And it's tightly coupled in a way that it is impossible to separate out the different
modules, have different people build them and have them work together without a hell of
a lot of work.
So integrating somebody else's ocean model into your atmospheric model, for example,
requires lots of deep changes in both models to make them work. And they're now
saying you know what, it's finally time that we accepted that and said this notion of a nice
modular architecture just will not work in this domain. The physics of it just won't allow
it.
Now, I don't know if that's true or not, and I don't have a way of validating that as an observation.
Is it impossible for them to have the kind of modularity that we'd expect to see in good
software. Do the physics prevent this. We don't know, and I think that's a great research
project to undertake to find out if this is even possible.
Let me talk about where next. There are a couple of things that we're doing as immediate
follow-ups to this study. One, as I said, one of my grad students is doing a much more
detailed study of defect density. And I'll show you some of his preliminary results in a
second. We want to replicate this study at other modeling centers to see are these guys
unique or do other modeling centers do it differently.
There's one, CCSM, which is built at NCAR in Colorado, which is actually a community
effort. It's the only one in the world that isn't an in-house, one-site development team.
It's a community model.
And so I want to go there and find out, first of all, is it really a community model. And
what we suspect from our initial conversations with them is that it isn't at all. It's an
in-house model in which there's a core team spending a lot of time taking code
contributions from this open source community around North America and spending ages
reimplementing them to get them into this model. We think that's what's happening. But
I need to go and validate that.
And I want to compare the V and V that goes on with these models with other kinds of
simulation models. For example, economics models used in climate policy or other
environmental science policies. Because we also think that the core approach to V and V
that goes on here is kind of unique amongst scientific code.
And one of the reasons I think it's unique is they're leveraging off a huge effort by tens of
thousands of people around the world collecting meteorological data and validating that
data. And it's hard to think of any other scientific discipline that has the level of activity
going on in collecting and validating observational data against which to test the models.
It just doesn't happen in many scientific disciplines.
So that matters a lot to them. It matters that they've got huge volumes of observational
data and it matters that a very large number of people are using that data for all sorts of
things to validate it.
And it also matters that there's lots of teams around the world simultaneously trying to
solve the same problem that they can all compare against one another. And that, again, I
think is unusual in scientific disciplines.
So my grad student, John Pipitone, you can actually see his work. He runs a lovely blog
in which he describes his progress on this. He's taken three different models. One of
these is the Hadley model. I didn't check which one it is. I think it's the bottom one.
But, anyway, it's either B or C. Anyway, he's measured the defect density of these
models by trawling through their bug databases.
And the scale along the bottom here is defect density in defects per thousand lines of
code. The bands are from Norman Fenton's book about code defects, where he classifies anything
less than two as good, two to six as average, and anything above that as poor.
So by those standard code metrics in the literature, all three of these models are good
quality code from a software defect point of view.
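As a worked example of the metric (with made-up numbers, not figures from the study):

```python
# Worked example of the defect-density metric with hypothetical inputs.
def defect_density(defects: int, lines_of_code: int) -> float:
    """Defects per thousand lines of code (KLOC)."""
    return defects / (lines_of_code / 1000)

def fenton_band(density: float) -> str:
    """Fenton's rough bands: < 2 good, 2-6 average, above that poor."""
    if density < 2:
        return "good"
    if density <= 6:
        return "average"
    return "poor"

d = defect_density(defects=450, lines_of_code=830_000)   # invented model size
print(f"{d:.2f} defects/KLOC -> {fenton_band(d)}")       # 0.54 defects/KLOC -> good
```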
But he showed this to several software engineers to say, okay, is the code good quality,
and they'll say, well, you know what, these numbers don't actually tell me anything
because it still boils down to what you call a bug.
And because of this attitude these scientists have, which is appropriate for their domain --
that many things that might in other domains be thought of as defects in the code are just
acceptable imperfections -- maybe we're simply not counting enough things as defects.
He's doing another study with static analyzers, trawling through their Fortran to find
static analysis problems and see how that compares.
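As a trivial illustration of the kind of check a static pass can make over Fortran -- a toy example, not the analyzer used in his study -- one can flag source files that never declare IMPLICIT NONE, a classic source of silent typing mistakes:

```python
# Toy static check over Fortran source: flag files with no IMPLICIT NONE.
# Purely illustrative; the study's actual tooling is not described here.
import re
import sys
from pathlib import Path

def missing_implicit_none(source: str) -> bool:
    """True if the source text contains no IMPLICIT NONE statement."""
    return re.search(r"^\s*implicit\s+none\b", source,
                     flags=re.IGNORECASE | re.MULTILINE) is None

for path in Path(sys.argv[1] if len(sys.argv) > 1 else ".").rglob("*.f90"):
    if missing_implicit_none(path.read_text(errors="ignore")):
        print(f"{path}: no IMPLICIT NONE found")
```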
And he's also gone around and interviewed a whole bunch of climate scientists to talk
about this issue of bug versus feature, when does a bug become a problem or not. And
he's teased out lots of factors, factors to do with momentum. Stopping to fix this bug will
slow down the science, it will break repeatability, it will break a whole bunch of things.
We're going to have to go back and do a whole bunch of experiments again, so we just
accept the imperfection and call it a feature.
The design of the model and funding issues matter. And then operation. How the bug
affects various operational factors matters.
And these different factors come into play at different points in the process. Whether
we're in development stage or scientific experimentation stage matters. It matters when
the bug was found.
What time should I wrap up?
>> Rob DeLine: Anytime.
>> Steve Easterbrook: Let me say one more thing, and that is just to put this study into a
broader context. So where I started on this was this question of can we help climate
scientists. And the reason I was asking that question was because I wanted to ask a
broader question of what should computer scientists in general be doing in response to
this whole issue of climate change. What do we bring to the table that might help given
that this is a societal grand challenge.
And having studied this group with the follow-up studies that we're planning to do, we've
started to come to the conclusion that that's not where computer science as a
discipline can have the biggest impact.
And let me, then, just in two or three slides try to explain that, and then I'll stop and we'll
do questions.
So here's the system. Okay. We have emissions of greenhouse gases leading to changes
in concentration in the atmosphere which affect the climatology of the planet which have
downstream effects on a whole bunch of other things -- marine biology, agronomy,
ecology and so on -- which cause a number of impacts from climate change. There's all
sorts of change in the geochemistry which lead to other changes, and there's feedback
effects, and there's all sorts of physical things going on in a very, very complex coupled
system.
And all the scientists that I've talked to are -- you know, they're working in some of these
squares, they're working in climatology or one of the closely related disciplines.
This is very, very complex science. What they have to understand are very, very
complex physical processes. But nearly all of them say you know what, this science is
done. We understand the climate system well enough that this isn't the important
question that society has to face anymore.
The important question is the other part of the system, the part of the system where
impacts lead to changes in public opinion, to discussions in the media which feed into
policy, which is affected by industrial lobbyists, where policy affects what goes on in the
economy and what's happening in the economy drives the emissions and so on.
And all of these purple blobs are very poorly understood. It's like this part of the system
we know really well, this part of the system we really haven't a clue about, and this part
of the system down here is the stuff that matters. This is what we have to fix.
Because the argument is that humanity has stepped in and taken control of the planet
where for millions of years the planet had a whole bunch of natural control systems on
the climate and natural feedbacks that just kept the climate stable. Every so often it'd
flip, like from a glacial state to an interglacial state.
But pretty much in any of those periods it was stable because of a number of feedbacks
that kept this a stable system.
We've now perturbed this system so much that whether we like it or not humans are now
managing the planet and we don't know how to do it. And the reason we don't know how
to do it is because we're getting all this stuff down here wrong.
The discussions in the media are just ill informed. There are too many people injecting
misinformation in the media. There are too many people that just don't understand the
physics of what's going on, so they don't understand the urgency of the problem, the time
scales in which we have to do things and so on.
So our observation is actually computer science has a huge role to play down here in just
understanding these systems and building tools and visualizations to make people just
aware of what this system is and how it operates and what we know and how we know it
from the various different physical sciences.
So our observation is, you know, computer scientists have to step up to the plate, and we
as a discipline ought to be able to respond in a systematic way in saying here's what
computer science should do.
So there's these beautiful reports. I mean, this is an IPCC report on the physical science.
It's a huge, long report. And it summarizes everything we know about the physics of
climate change.
There's the Stern report which is a beautiful analysis from an economic point of view.
There is an APA report on the psychology of climate change, how is this affecting
people's ability to understand the future, to understand how they're going to adapt their
lives to behavioral changes that have to happen if we're going to change the way we live
our lives.
So there's a huge amount of psychology research that has to go on.
Sociologists have put together a report saying this is a sociological problem of how
people come to understand the social epistemology of climate change, how do people
come to understand what they understand about climate. And how can we fix that.
And so I said, well, where's the computer science version of this, where's computing as a
discipline stepping up to the plate and saying what should we be doing. And the closest
that I could find was this, which I thought was a very disappointing response.
So we're starting to map out what we think is an appropriate disciplinary response by
computer science as a whole to the challenge of climate change. And we've held a series
of workshops. There was an initial workshop at ICSE last year. We did a workshop at
OOPSLA in Orlando last month just trying to map out the agenda.
And this is my latest map of where computer science can make a difference. So there's green
IT, green software, energy-saving devices: making everything that's controlled by
software as energy efficient as possible and building that into the software.
The study I described today fits into here. I'm kind of calling it computer-supported
collaborative science. How can we help the scientists do what they do better using good
e-science tools. So e-science is the other label for that box.
And then this is the box that interests me the most. Maybe I should put some detail in
here to show you what I have in mind. Software for global collective decision-making.
This is the thing that humanity currently does disastrously badly right now. Global
collective decision-making. And that involves getting the information to the
decision-makers at the point that they need it in a form that's useful. And maybe I should
just leave that as that's it. That's the problem.
And that involves lots of interesting tools, visualizations, information systems, access to
datasets, open collaborative approaches to building decision support tools.
Anyway, you've got the idea. So I should stop there and take questions.
>>: How did [inaudible] Fortran [inaudible] 77 [inaudible] same level [inaudible].
>> Steve Easterbrook: Yep, yep. And there's still a lot of Fortran 77 code in their code
base. There's been a systematic attempt to bring it up to Fortran 95 which had modules.
And, yeah, this is a problem. They solved this the same way they solved all the other
problems, lots and lots of informal communication across the lab.
Yeah. I probably should have a more detailed answer for you. There's been a systematic
attempt to use code modules when they became available in -- I think it was -- what was
it? Fortran 89? Or was it 90? I don't remember which one it was when they brought in
code modules. So they're doing some of that now. And they want to be doing more.
>> Rob DeLine: Any more questions? All right. Let's thank Steve.
[applause]