Seminar 3: Creative use of archive
‘Digital Public Space prototype demonstration’ – Jake Berger, Programme Manager, Digital Public Space, BBC
Bill Thompson, Chair and Head of Partnership Development, BBC Archives: Now I’d like to ask Jake Berger from the BBC to come and talk to us about the digital public space data model and what we’ve been working on.
Jake Berger, Programme Manager, Digital Public Space, BBC: It’s, er... hopefully it’ll be a little bit more exciting than it sounds!
Bill Thompson: The digital public space data model is so exciting. You and I love meta-data.
Jake Berger: We do, yeah.
Bill Thompson: These people just don’t understand the sheer glory of triples...
Jake Berger: Yes.
Bill Thompson: And linked open data.
Jake Berger: And I promise I won’t mention the word meta-data, or triples, in this entire talk. But erm, it’s quite interesting. I’m the fourth man in a dark suit to stand up in front of you today, but you can tell we’re from the creative sector ‘cause none of us are actually wearing a tie. Right, that’s me, that’s what I do, that’s who I work for.
I’d like you to imagine that every museum, archive, gallery, library, theatre, and studio in the country could all be found next to each other. And that they each had every single item in their collections on display. And imagine if the smallest organisation’s archives and objects had exactly the same level of visibility and accessibility as the big nationals. And imagine that all of this material and information were linked together. Now, hold that thought for a moment.
This is a picture of the internet. Now, the web finds us stuff, it shows us loads of stuff, and it links to loads of other stuff. So it’s great at linking things together, but it’s not yet great at making real, meaningful connections between all of the things. That’s left up to the humans to do. It’s not very good at saying that this thing is like this other thing, or this thing is different to this other thing.
These are not the same things. Paris Hilton is not the same as the Paris Hilton. But if you try and find that picture by typing in ‘Paris Hilton’ or ‘Hilton in Paris’, you have to work your way, as I found a couple of days ago, through about a hundred thousand of these before you get that. Most people are probably looking for the one on the left, but I was looking for the one on the right. We need to do something about this. It shouldn’t just be what’s popular that is always first.
But all of this is possible. This is about as technical as I will get today; this is the vision of linked data and the semantic web. It’s a bit hard to see here but that says ‘door’. Now if we can tell the web what each thing is and what each thing isn’t... rule number one: never let Rene Magritte do the tagging. [Audience laughter]
And if we tell it what set (if you remember back to your Venn diagrams at school) or group of things it’s part of, and how all these sets relate to each other, then we can ask new kinds of questions and we can get new kinds of answers. Such as [shows audience picture] or [shows audience picture]. Probably get zero results for that, but it’d be worth a try. Or more simply [shows audience picture]. So you should get the idea.
So, how can we tell the web what these things are? Well, let’s call them entities. Now I think, and I’m very happy to be proved wrong, that we can understand every entity in the world as being either a person, a place, a collection, an event or a thing. Now, things are the kind of ‘get out of jail free’ card because it captures everything that doesn’t fit into any of the previous ones. These entities are often associated with a time, a moment or period in the past, the present, or the future, and with assertions that are made about these things by people and by machines. These asserters are all assigned various levels of authority and perspective, so a curator, an expert, a witness, a creator.
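As an illustration only, the five entity categories and the idea of attributed assertions described above could be sketched roughly as follows. This is not the BBC data model, which is not reproduced in this transcript. Every name here (`Kind`, `Entity`, `Assertion`, `find`, the `ex:` identifiers, `example-archive`) is hypothetical, and filing the hotel under ‘place’ is an assumption made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The five categories named in the talk."""
    PERSON = "person"
    PLACE = "place"
    COLLECTION = "collection"
    EVENT = "event"
    THING = "thing"  # catch-all for whatever fits none of the others


@dataclass(frozen=True)
class Entity:
    id: str      # hypothetical identifier scheme
    label: str   # human-readable name, not necessarily unique
    kind: Kind


@dataclass(frozen=True)
class Assertion:
    """A statement about an entity, attributed to whoever made it."""
    subject: Entity
    predicate: str
    value: str
    asserter: str  # a person or a machine
    role: str      # e.g. "curator", "expert", "witness", "creator"


def find(entities, label, kind):
    """Return entities with a matching label AND the requested kind."""
    return [
        e for e in entities
        if e.label.lower() == label.lower() and e.kind is kind
    ]


# Two entities that share a label but are not the same thing.
socialite = Entity("ex:1", "Paris Hilton", Kind.PERSON)
hotel = Entity("ex:2", "Paris Hilton", Kind.PLACE)  # assumption: a hotel is a place
catalogue = [socialite, hotel]

claim = Assertion(hotel, "located_in", "Paris, France",
                  asserter="example-archive", role="curator")

places = find(catalogue, "paris hilton", Kind.PLACE)
```

Because each entity carries its kind, a search for the label ‘Paris Hilton’ restricted to places returns only the hotel, however popular the person of the same name may be.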
I’m sure some of you will be used to making assertions, and I’m sure some people will actually believe them. But all entities have some sort of physical representation in the real world, whether that’s a statue, a recording, a video, some ones and zeros on a memory stick, or a server, or a flash card. And some of these entities are going to have emotional states associated with them. These can be very different depending on which character you play in the story. Each physical representation is held somewhere – that’s the Amazon warehouse by the way, if you wonder where all your stuff comes from – or is displayed somewhere. And all of these entities will sit somewhere on a spectrum of availability and affordability, somewhere between free and open, or closed and expensive.
So, how can we make this vision of connected availability happen? Starting with the material that we have in our own archives, in our collections, and the data: if we can classify or tag all of it, if we can digitise it, do this in a semantic web friendly manner, following some very basic, simple rules and approaches – there’s nothing more complex than the grammar you would learn in your first few years of secondary school – and make them available and open, then people can find our stuff. They can make their own assertions about it; they can relate it to other things. They can tell us things that we don’t know about our own material, which then adds to the find-ability, the interestingness and the usefulness of it. It’s a positive feedback loop, it’s a positive cycle.
You’re probably thinking this, and quite rightly so. So what are we doing? Well we, the BBC in conjunction with partners, many of whom are represented in this audience, are trying to create a framework that makes all of this feasible for any organisation, small or large. We’ve drafted an overarching data model in conjunction with a number of organisations – this lot at the moment, but we have many more who are interested.
The data model simply brings together a whole load of different catalogues, classifies and identifies them in a consistent way, picks out themes within them, types, sets and relationships, and maps out those connections.
Now this next slide. If you are of a nervous disposition I’d ask you to look away now, but I’ll only keep it up there for a couple of seconds. This is the data model which you can’t see there, thank you lights. [Audience laughter] Bill’s actually got it tattooed on his inner thigh if you’re interested, and I’m selling posters at the end at very reasonable prices, so come and see me.
So, turning this vision into something that’s usable and interesting: well, we’ve created a prototype system that aggregates all of these data sets, and translates them into the categories of people, places, collections, events and things, and starts to make connections between them, and will eventually enable all of the other things that I’ve talked about. But at the moment it’s relatively basic.
So I’d like to show you this system, but I’m afraid I can’t because my developer broke it last week while ingesting 10 million records from the National Archives. I’ll have something that I can show you soon. I can show you a slightly shaky version of it in the break, or come and talk to us afterwards. But actually the visible bit isn’t the important bit of this; the important bit is bringing all of these data sets together, being able to translate them. The really clever bit is a few algorithms that create and associate all of these different things in ways that a human being could do if they had, I don’t know, 10 million years at their disposal.
What I was going to show you is a couple of example interfaces that we’ve built over the last few months that demonstrate the kind of thing you can do on this platform. They would have looked like this, so here’s the view of the Royal Opera House; it shows a few things you can explore.
Don’t know why Southend Pier is up there, but there you go. This is a person page for Winston Churchill. You see it’s just pulling in information from other sources. A place. A thing. An event. And if you can see at the top, it’s beginning to group these things together so the event is part of tourism ceremonies, trade events, Royal Festival Hall. None of this has been hand created or curated; all of this is linked, structured data and algorithms that are saying, ‘this thing here is probably like that thing, and if that thing’s related to those things then this thing’s probably related to those other ones’. [Laughter] I’ll draw you a diagram.
We also wanted to have a bit more of a kind of video-friendly version of it, so here’s an interface which finds a whole load of videos related to Enid Blyton; when we’ve got this hooked up with the British Library’s collection it will show you books. This is a time view which lets you jump from millennium to century, to decade, to year, to month, to day, pulling bits of information from everyone’s collections that relate to that particular moment in time. Here are the results from the first database for Swan Lake, breaking them down into things, events, collections, places.
This has only got about probably half of 1% of the amount of material that it will eventually have. So if we can get some interest in connections across different people’s collections with the 1%, imagine what happens when we multiply that by a factor of one hundred. And then this just lets you kind of create your own view of it, or see what other people are interested in, in a kind of ‘my favourite things’ page.
But this can only work if it’s much, much bigger and broader than the BBC. All we’re really trying to do is create standards, frameworks and tools for other people to use. We can do this because we are funded by you and, you know, 60 million other people.
We should do this because we have engineers, we have archivists, we have producers, and they’re all generally pretty busy and there’ll be a few less of them today after today’s announcement, but we feel it’s a fundamental thing that the BBC should be doing, in the same way that it makes sure that your radio would work from the peak of the highest Scottish mountain to the lowest valley, maybe. It must work for everyone – for the smallest organisation or individual, you know, down, or up, up to the biggest behemoth.
So we want people to contribute data and media to make it available; we can help you understand easy ways to do that. We need people to play with what we’re creating, try and break it, tell us how to make it better. Tell us, ‘Ooh, if only it did this thing, suddenly that would fit my world’. And we want people to think about how they could use what we’re creating to supplement the stuff that they’re already creating. Everything that we would pull together here we would like to be usable by, you know, small websites, by small exhibitions, by school kids’ projects through to massive national projects.
If you’re still interested, then come and talk to us. I didn’t realise Tony was gonna be here, otherwise I wouldn’t have used that picture, but sorry Tony, it was the second one that came up on Google. If you’re really important, then you can talk to Roly. Thank you for listening. [Applause]