>> Helen Wang: Good morning. It's my great pleasure to introduce Doug
Crockford. Doug is from Yahoo, and he is the inventor of JSON, and he also
has discovered all the good parts in JavaScript. Can you believe that? Well, it's
great that Doug graciously offered to give two talks in this one-and-a-half-hour
slot, covering both the JSON saga and the web of confusion. Thank you, Doug.
>> Douglas Crockford: Thank you. Good morning. It's great to be here in
tropical Redmond. [laughter].
So as Helen said, I have two talks for you this morning: the JSON saga, which is
the history of the JSON data interchange format, and the web of confusion, which
is a security talk about what it is that the web got right that everybody else
continues to get wrong.
I'm going to start with the JSON thing. There's a lot to talk about, so let's get
started. But one word of warning: I am a heretic. So if you're offended by heresy,
you should leave now.
So I discovered JSON. I do not claim to have invented JSON. I think JSON
already existed in nature, I just found it and gave it a name and showed how to
exploit it. I don't claim to be the first to have discovered it. My discovery was in
2001. I know that there are people in 2000 who were using something very
much like JSON. The earliest use I found of someone using JavaScript to deliver
data was at Netscape in 1996. So this is an idea that's been around for a while.
I don't claim it to be original to me.
What I did was I gave it a specification and a little website. And everything else
happened literally by itself. I can't claim credit for very much in this saga.
The story for me starts in 2001. I formed a company with Chip Morningstar to
develop what today would be called Ajax and Comet application frameworks.
We weren't able to decide what the name of the company was, so we thought
temporarily we would call it Veil, and then later when we were ready to go public,
we would unveil the company and show people the true name.
So I went ahead and created this logo for Veil, knowing that it was going to be a
throwaway thing. And I really liked it, and I was kind of sad that we had to throw
it away, because I think it's a really nice logo design.
What we ended up being called was State Software. We hired an advertising
agency which gave us this pair of frisky paramecia [laughter], and the negative
space in between them forms sort of a letter S. So that was supposed to be iconic
of the company. We did that in 2002.
But the very first JSON message was transmitted in April of 2001 in Chip's
garage from his server, which was I think an old Sun box or something, and my
laptop. And you can see the JSON message there in green. Our application
platform was based on distributed objects, and so we would send messages to
objects across the network. In this case, we were sending the message to the
session object, and we were sending a test message. We weren't aware at the
time that we were making history, so we didn't do anything as momentous as
"What hath God wrought" or "Mr. Watson, come here, I need you." It's just hello,
world.
And in this particular -- the envelope that we used for this first message was an
HTML document which was sent in response to a form-submit post to a frame, a
subframe within the document. We did it that way because it worked on
Netscape 4, and there are a lot of web developers in the world today who hate
IE6, but at that time in history IE6 was far and away the best web browser the
world had ever seen. And Netscape 4 was not. It was a crime against humanity,
a really bad, bad web browser.
But there were a number of technologically backward companies that insisted all
of their employees use it, and some of those companies were companies we
were hoping to do business with, including Sun Microsystems and IBM. So we
felt it was important to support Netscape, and this is the way that we did it. So
that HTML document delivered a script.
First thing the script did was get around the same origin policy. And the second
thing it did was call the receive method on the session object in the parent frame
passing the JSON object as the data. And that worked really well. Or at least it
was supposed to have worked well. But that first message failed. And it took us
a while to figure out why.
The reason it failed was that do is a reserved word in JavaScript. And so this
generated a syntax error. It took some reading of the specification to figure out
why. It turns out that ECMAScript 3, which is the standard for JavaScript, has a
whack reserved-word policy: reserved words in the key position of an object
literal have to be quoted, even though there's no good reason for that requirement.
Later, when I got around to trying to specify JSON as a standard, I was trying to
do two things. One, I was trying to convince people that JavaScript was a
language you could use for application development, so I did not want to say,
and by the way, here's something really stupid about JavaScript which should
cause you to be really suspicious of it. So I didn't want to put the reserved word
list into the JSON spec.
So I decided instead, let's just quote the words, that way no one will ever have to
know.
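The rule this produced is easy to observe in any JSON parser today. A small sketch in JavaScript; note that ES5 later relaxed the object-literal restriction, so the original `{do: ...}` syntax error only bites in ES3-era engines, but the JSON rule is still visible through `JSON.parse`. The `parses` helper is my own illustration:

```javascript
// JSON sidesteps the ES3 reserved-word problem by requiring every key to
// be a quoted string, so a word like "do" can never trip up a parser.
function parses(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

const quoted = parses('{"do": true}');   // valid JSON: the key is a string
const unquoted = parses('{do: true}');   // not JSON: bare keys are rejected
```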
It had a side effect of significantly simplifying the JSON spec. It turns out that if
you allow unquoted names, you have to declare what a letter is. And in the
Unicode world, a letter is a really complicated idea. And by saying we don't have
letters, we just have strings, we avoided all of that. And that was a great
simplification.
Another benefit was it conformed more closely to a syntax that was already built
into Python. Python has things that look very much like object literals, but the
keys have to be quoted. And we were hoping to make friends in the Python
world. So that seemed a good justification, too.
The next problem we found was that if we had a string which contained
something that looked like HTML in this envelope, it wouldn't get delivered
properly. In this case, we've got something that looks like a closing script tag. It
isn't; it's a string literal within a JSON message within a piece of JavaScript. But
the browser thinks that ends the script block, and so that causes another syntax
error.
So then we amended the JSON grammar to tolerate a backslash in front of a
slash within a string, so that we could get things through this HTML envelope.
We decided to give this language a name, and the first name we gave it was
JSML, which rhymes with dismal and stood for JavaScript Message Language.
But it turned out there was already, in the Java world, a JSML standard: the Java
Speech Markup Language, something no one's ever heard of. But there was
already this thing, and we didn't want to have confusion around it, so we thought
about another name. And we finally came up with JSON: the JavaScript Object
Notation.
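The slash escape added for that HTML envelope problem is still part of the grammar. A small sketch of how it behaves; the `htmlSafeJson` helper is my own illustration, not a library API:

```javascript
// JSON's grammar tolerates "\/" as an escape for "/", so an encoder can
// write "<\/script>" and keep a literal "</script>" from terminating an
// inline <script> block.
const escaped = '"<\\/script>"';    // the JSON text: "<\/script>"
const value = JSON.parse(escaped);  // decodes to the plain string

function htmlSafeJson(v) {
  // JSON.stringify does not add the escape on its own, so an HTML-embedding
  // encoder has to add it by hand.
  return JSON.stringify(v).replace(/<\//g, '<\\/');
}
```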
We found that JSON was really useful. I mean, it was obviously useful for doing
the browser-server communication. And as you just saw, there's virtually no
work required on the client in order to use this language, and that's an extremely
desirable feature. We also found it was really effective in interserver
communication, in applications where JavaScript was not involved at all.
We had a highly scalable session architecture. We had lots of machines that
could talk to each other. And we kept them all in sync by sending JSON
messages. And it was really effective for that.
We also used JSON for persistence. We had a JSON database which was really
easy to implement, really easy to use. We were really happy about JSON, and
we wanted to tell all of our customers about it and recommend they get on it, too.
And they said things like, well, we hate it because we've never heard of it. And
some of them said I wish you talked to us six months ago because we just
committed to XML, and we just can't consider anything else right now. And there
were people we talked to who said it's not a standard. We said yes, it is, it's a
proper subset of ECMA 262. They said that's not a standard.
So I decided, okay, I'm going to have to declare it a standard. So I went into the
standards business. I bought JSON.org. I put up a one-page website that
described JSON, including the grammar specified three ways: as simplified BNF,
using a notation that McKeeman of Dartmouth recommended to me; as railroad
diagrams; and in informal English.
I also included a Java reference implementation just so that people could see
how easy it was to write one of these things.
And then I retired. It turned out we ran out of money. 2001 and 2002, right after
the Internet bubble popped and right after 9/11, was a brutally difficult time to be
trying to raise venture capital. So I decided to give up software for a while. I
went into -- I went back into consumer electronics. I was doing consulting on
HDTV and the digital TV transition while I waited for Silicon Valley to get its act
back together again.
That's all I did. So for the next couple of years, I did absolutely nothing to try to
promote JSON. I wasn't going to conferences, I wasn't blogging, I wasn't
Twittering, I wasn't speaking about it, I wasn't publishing about it. I just had this
little web page out there. I put a message format in a bottle and set it adrift in the
Internet. And people found it. People started coming to the site and saying,
yeah, that's exactly what I need and started building stuff with it.
And after a while, they started sending stuff back. Contributors started sending
me translations of the thing to run in Perl or Ruby or Python; lots of languages
were suddenly getting JSON support.
One of the advantages of having a specification as simple as the JSON
specification was that it was not much work for anybody to adapt it for another
language. And so over time we got support for all of these languages. This is a
pretty impressive list. I think virtually anything that anybody is writing today is on
this list.
So you can communicate data between any pair of programs written in any of
these languages and it will work. That's an amazing thing. And the reason it
works is that JSON is at the intersection of all programming languages.
Something all languages have in common is an understanding of a set of simple
values, generally numbers, strings, and booleans; some kind of sequence of
values, which is an array or a vector or a list. It's different in different languages,
but every language has some notion of one of those things. And some collection
of named values. And that might be called an object or a record or a struct or a
hash or a property list. Again, every language does this differently. But they all
have something like this. And JSON talks to that common bit of all of the
languages.
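From inside any one of those languages, the intersection is easy to see. A quick JavaScript sketch of the handful of universal shapes a JSON document decodes into:

```javascript
// Every JSON document decodes into the few shapes that essentially all
// languages share: a map of named values, a sequence, strings, numbers,
// booleans, and null.
const doc = JSON.parse(
  '{"name": "test", "count": 3, "ok": true, "tags": ["a", "b"], "extra": null}'
);

const shapes = [
  typeof doc === 'object' && !Array.isArray(doc),  // object / record / hash
  Array.isArray(doc.tags),                         // array / vector / list
  typeof doc.name === 'string',
  typeof doc.count === 'number',
  typeof doc.ok === 'boolean',
  doc.extra === null
];
```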
There have been other attempts at data interchange which try to be the union of
all the languages, and that turns out to be extremely complicated. But by going
for the intersection, JSON's actually really, really easy.
This is an example of -- or a snippet from -- an implementation of a JSON parser
using recursive descent, a really easy technique to parse JSON, very effective.
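As a rough illustration of the technique, here is a toy recursive-descent parser for a small subset of JSON (objects, arrays, escape-free strings, simple numbers). It is a sketch of the approach, not the reference implementation shown on the slide:

```javascript
// Toy recursive-descent parser for a JSON subset. Each grammar production
// (value, object, array, string, number) gets its own function, and they
// call each other recursively as the grammar nests.
function parseJson(text) {
  let i = 0;
  function ws() { while (' \n\r\t'.includes(text[i])) i += 1; }
  function expect(c) {
    if (text[i] !== c) throw new SyntaxError('expected ' + c + ' at ' + i);
    i += 1;
  }
  function value() {
    ws();
    const c = text[i];
    if (c === '{') return object();
    if (c === '[') return array();
    if (c === '"') return string();
    return number();
  }
  function object() {
    expect('{');
    const o = {};
    ws();
    if (text[i] === '}') { i += 1; return o; }
    for (;;) {
      ws();
      const k = string();
      ws();
      expect(':');
      o[k] = value();
      ws();
      if (text[i] === ',') { i += 1; continue; }
      expect('}');
      return o;
    }
  }
  function array() {
    expect('[');
    const a = [];
    ws();
    if (text[i] === ']') { i += 1; return a; }
    for (;;) {
      a.push(value());
      ws();
      if (text[i] === ',') { i += 1; continue; }
      expect(']');
      return a;
    }
  }
  function string() {
    expect('"');
    let s = '';
    while (i < text.length && text[i] !== '"') { s += text[i]; i += 1; }
    expect('"');
    return s;
  }
  function number() {
    let s = '';
    while (i < text.length && /[0-9eE.+\-]/.test(text[i])) { s += text[i]; i += 1; }
    const n = Number(s);
    if (s === '' || Number.isNaN(n)) throw new SyntaxError('bad number at ' + i);
    return n;
  }
  const result = value();
  ws();
  return result;
}
```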
This is an example using a state machine, a push-down finite automaton. Most of
the work is done in the statement that's in green, where we go to a table, look up
a function using the current state and the current token, execute that
function, which causes some action to occur, and transition to the next state.
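The table-driven style described above can be sketched like this: a toy machine whose transition table holds the action functions directly, here accepting the token stream of a flat array of numbers. The `makeMachine` shape is my own illustration, not the code on the slide:

```javascript
// A sketch of the table-driven style: functions stored directly in a
// state-transition table. table[state][tokenType] yields an action that
// performs side effects and returns the next state.
function makeMachine() {
  let state = 'start';
  const out = [];
  const table = {
    start: { '[': () => 'empty' },
    empty: {
      number: (t) => { out.push(t.value); return 'value'; },
      ']': () => 'done'
    },
    value: {
      ',': () => 'comma',
      ']': () => 'done'
    },
    comma: {
      number: (t) => { out.push(t.value); return 'value'; }
    }
  };
  return {
    step(token) {
      const action = table[state] && table[state][token.type];
      if (!action) throw new SyntaxError('unexpected ' + token.type + ' in state ' + state);
      state = action(token);
    },
    result() {
      if (state !== 'done') throw new SyntaxError('incomplete input');
      return out;
    }
  };
}
```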
JavaScript turns out to be a brilliant language for writing state machines, because
you can put your functions right in the state transition tables, so it's really, really
clean. The way most people use JSON in JavaScript today is through the json2
library, which uses eval, which gives you access to JavaScript's own compiler
to parse the JSON text. The problem with that is that if the JSON text isn't actually
JSON, then you're exposing yourself to a security dilemma.
So json2 first runs your text through four regular expressions in order to verify
that nothing bad is going to happen if it goes to eval. Originally it was one, and
then some stuff got through, and then it was two, and so on. Regular
expressions are a lousy medium to use for validating anything. But that works.
But fortunately, the good news is that the next edition of ECMAScript, the fifth
edition, the first new edition in ten years, will have built-in JSON support. So
you'll be able to parse natively; it will be really, really fast and much more
reliable. It's available today in the better browsers. And soon it will be
available in all of them.
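The practical difference is that native parsing treats the text purely as data: it either decodes valid JSON or throws, and nothing in the text is ever executed. A small sketch; the `safeParse` wrapper is my own, not a library function:

```javascript
// JSON.parse validates and decodes; it never runs its input as code,
// unlike eval-based parsing, which had to sanitize first.
function safeParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (e) {
    return { ok: false, error: e.message };
  }
}

const good = safeParse('{"user": "guest"}');
const bad = safeParse('{"user": "guest"}; doEvil()');  // rejected, never run
```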
One of the benefits of having the description of the standard be so simple is it's
not a lot of work for anybody to translate it. And I was very, very happy to have
wonderful people from all over the world submit translations of the page so that
description is now available to people in all these languages, which is just
wonderful. I love this.
If it turns out that you're fluent in a language which isn't on this list and you'd
like to help out, please, help out.
So the big thing that put JSON on the map was Ajax. Jesse James Garrett
discovered in 2005 what a lot of other people had discovered in 2000, which was
that you can use browsers for doing applications without using page replacement
as the primitive by which people interact with an application.
Nobody was interested in that in 2000, but in 2005 it was really, really hot. And a
lot of web developers discovered that it was a lot easier to do that using JSON
than using XML. Now, there were some cranks at the time who said you can't
use JSON, because Jesse James Garrett said the X in Ajax stands for XML,
and that's what you have to use. That didn't last very long. So the smarter web
developers got onto JSON pretty quickly.
When I saw what people were actually doing with it, I was a little bit alarmed. I
saw some things like using comments to send meta instructions to the parsers,
which meant that interoperability was not going to work because everybody
would be dependent on stuff which was not specified and uncontrollable. So I
changed the standard. I removed the comments to frustrate those dangerous
practices. It also turned out that comments produced a lot of unnecessary
complexity. Looking at some of the ports of the JSON parser to other languages,
about half of their work was just getting the comments working. And that seemed
completely unnecessary. So removing the comments made it even easier to port
to other languages. It also provided alignment with YAML, which is another data
interchange format, which stands for something funny. The YAML community
had coincidentally created a language which was a superset of JSON; it had
JSON as a subset, except that it didn't have C-style comments as JSON did. So
by removing the comments, JSON moved closer to YAML, so that it could be a
proper subset.
>>: What is YAML?
>> Douglas Crockford: I don't know. Yet another markup language, something
like that.
>>: [inaudible].
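The comment removal described above is observable in any conforming parser today. A small sketch; the `accepts` helper is my own illustration:

```javascript
// JSON has no comments, so a conforming parser rejects them uniformly;
// any metadata belongs in the data itself rather than in a comment channel.
function accepts(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (e) {
    return false;
  }
}

const plain = accepts('{"retries": 3}');
const blockComment = accepts('{"retries": 3 /* per spec */}');
const lineComment = accepts('// config\n{"retries": 3}');
```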
>> Douglas Crockford: Then the last thing I changed was adding scientific
notation to numbers. At State Software, we were doing business applications,
and it just didn't occur to us that there was a need for it. But as Ajax became
more popular, the need appeared, and so that was the last change I made. And
that sort of sealed the door to any further changes at that point, because I didn't
put any kind of version number on the thing. So now that it's deployed at large
scale, there's no way to say, and here is a syntactic variant, and have people
adapt to it. There's no way to do that. And that was intentional.
So as a consequence of that, JSON will not be changed. Because if there was a
version number on it, then, you know, if it was JSON 1.0, then you know there's
going to be a 1.1, and there's going to be a 2.0, and everything is crap until it's
3.0. [laughter]. I didn't want to go through that. I just say it's done. And maybe
some day it will be replaced by something better or bigger or more complete. But
it will never change. So if you're using JSON, you can be confident that there's
at least one layer of the stack that's not going to shift out from under you. And I
think that offers a much greater advantage than any change we could put into it.
One of the principal goals of the design of JSON was minimalism. Minimalism is
way undervalued in standards setting. And it turned out that the design of JSON
is so simple that it can fit on the back of a business card. That's literally true.
This is the card. This is the back. It's there. If you want a copy of the card,
come see me after. I'll be happy to give you one.
Now, I'm not saying that every standard should fit on the back of a business card.
But I think it's a nice thing for a standards body to at least contemplate, or maybe
be slightly desirous of. Because it's really easy to make a standard bigger. It's
really hard to make a standard better. And minimalism, I think, is one of the
paths to doing that.
Also, the less we have to agree on in order to interoperate, the more likely we're
going to be able to interoperate well. So minimalism is the way to get to the
simplicity that we need.
There are a lot of influences on the design of JSON. I'd like to go through some
of those. The most important, the granddaddy of them all, was Lisp, John
McCarthy's AI language from 1958 at MIT. Lisp was based on a graphic -- on a
textual notation for simple binary trees. And they used those binary trees to
represent data, and they also used those expressions to represent programs. It
was very clever, the way that they did the two at the same time. A very, very
powerful notation.
There are many people who said we should be using S-expressions for data
interchange, which would have been a good idea. Generally, the marketplace
rejected S-expressions for the same reason it rejected Lisp: it's just too wacky
looking. There's virtually no syntax there at all; it's just deeply nested sets of
parentheses. And the Lisp community says that's absolutely the right way to do
it, and everybody else says, get away from me with all those parentheses. The
world likes syntax. And so that didn't happen. Which is unfortunate, because the
world should be writing in Lisp or in Scheme. They're really good languages,
because there are really important ideas there.
And it turns out the most important of those ideas is lambda: functions as
first-class values. And the first language to go mainstream with that was
JavaScript. Just an amazing thing.
Another language that influenced JSON was Rebol, Carl Sassenrath's little
scripting language, which is also based on the idea of a data representation
language which is also executable. Rebol is brilliant. It has a much richer
syntax and an extremely rich set of types. A really nice little language. It
deserves a lot more attention than it's currently getting.
Obviously, JSON was influenced by JavaScript, because JSON is JavaScript. I
seem to have made a career out of mining good stuff out of JavaScript. JSON is
one of those things. I wrote this pamphlet, JavaScript: The Good Parts, which is
more of that. It's not accidental that there's all this good stuff in JavaScript. The
guy who designed the language, Brendan Eich, is a brilliant guy, and this stuff
was put in there intentionally. It's just that a lot of bad stuff got in there, too.
Coincidentally, the JSON notation, the idea of nested structures made up of curly
braces and colons and brackets, occurred simultaneously in JavaScript, Python,
and NewtonScript. All these languages were designed around the same time.
None of these designers were in communication with each other. This was just
spontaneous invention, the same notation showing up in three places.
And I think that indicates that there was something natural, maybe inevitable
about this notation. Looking at sort of the C style of how you do syntactic forms
and then applying that to data, this is the natural way to represent data in
programming languages.
And there were even earlier instances of this stuff. For example, at NeXT, the
OpenStep system had property lists, which were basically JSON structures. The
syntax was slightly different: they were using semicolons instead of commas,
and equal signs instead of colons. But they had this idea. They had it, and they
had it right. And that was in '93. So again, I present this as evidence that this is
a natural sort of thing that's been bouncing around in the cosmos for quite a
while now.
Then XML was a consideration. Not in terms of what it did, but about how it got
standardized. XML is a lousy document format. And the world rejected it back
when it was called SGML. When it was transformed into XML, they didn't repair
any of the things that were really crappy about SGML, they just changed some
other things and gave it a new name. And I'll offer as evidence that it's a crappy
document format the fact that XHTML is a total failure. If XML were the right way
to mark up documents, XHTML should have beaten HTML into the ground, and it
didn't. It just died.
So XML is not a good document format. Later I'll show you a better format. And
it's even worse as a data format. So what was interesting about XML is: how did
this turkey become so popular? What was the mechanism by which that
happened?
And I think for the answer to that, you have to look at what happened with HTML.
When HTML first emerged, there were a lot of A-list CTOs and technologists who
looked at it and said, well, this is so obviously deficient. This isn't going to go
mainstream, because it's got all of these problems. But there were enough
B-level and C-level guys who got really excited about it to get the avalanche
going that it went mainstream anyway.
So a lot of those A-list guys thought they were wrong. But they weren't wrong.
Everything that they identified as deficient in HTML is actually deficient, and
we're all struggling with that today. We've been struggling with it from the very
beginning, and it has not gotten any better since day one. But they thought they
were wrong because they asked the wrong question. They should have asked
not, is it good enough, but, is it going to be popular enough.
And so when a new system came out from the same people who brought you
HTML, which also had angle brackets, nobody was going to oppose it. And so it
went mainstream.
John Seely Brown was running Xerox PARC, one of the most brilliant research
operations in the history of the universe. They developed graphical user
interfaces, object-oriented programming, local area networking, and laser
printing; just about all the stuff that we do today came out of there. John Seely
Brown oversaw all that stuff. Brilliant guy.
I saw him talk at the CTO Forum in San Francisco in April of 2002. And he was
talking about the new world in which you have loosely coupled systems and XML
is the thing that would bind them all together. And he said of XML, maybe only
something this simple could work.
A couple months later, I was attending another conference, listening to a speaker
who was a little closer to the ground talking about XML, and he said, maybe only
something this complicated could work. And it really struck me: how did this go
from something so simple to something so complicated? How did that happen so
fast?
And I think the reason was that it just doesn't fit. Its model for how you do data is
just the wrong model, and so it introduces so much noise and so much difficulty
and complexity, that it took something that should have been simple and turned it
into something that was really hard.
There were a number of people at the time who recognized this. For example,
there are websites out there such as XMLsucks.org. This particular site's
premise was that XML is technologically terrible, but you have to use it anyway.
So basically, there were two schools of thought on XML. One was that it was
perfection: this is the thing we've always been waiting for, the inevitable result of
evolution. And the other school said that it wasn't. One thing that they could
both agree on was that it was the standard, so shut up. Shut up. That was the
response: shut up.
Not everybody shut up. There were a bunch of crackpots out there who
recognized that there were obviously things wrong with XML and proposed to fix
it. And so each of these crackpots came up with their own idea. And a guy
named Paul Tee collected them all on a page. And I'm sure there were lots of
others that he didn't collect. Paul was doing this because his was one of them,
and he was hoping that by calling attention to everybody's, he would get more
attention for his own.
So every one of these designers probably had a good idea. They all shared an
idea of what was wrong with the thing, and they all had their own idea about how
to fix it. It was unlikely that any one of them would have said oh, that other guy,
he did something as good as or better than mine, so I'll drop mine and embrace
his. None of them were going to do that. So it was just a lot of noise. And there
was no way for any one of them to rise to the top.
Except that one of them worked really well with Ajax. And that wasn't a
coincidence; it wasn't accidental, it was intentional. It was designed specifically
to work well with Ajax. So that caused JSON to float to the top. And JSON won
out of this set, and eventually went on to displace XML in a very large and
growing set of applications.
So the XML community was not real happy about that. They had entered as a
disruptive technology, and now they were being disrupted themselves, and they
didn't like it. And at first the response was disdain: you know, you can't do that,
it's the standard, shut up. Why aren't you shutting up? After that, they started
making threats.
And at first it was more like sputtering, like, you'll rue the day you ever
questioned the technical superiority of XML, that kind of thing. It was kind of
vague. You know, you would be doing that ruing some time in the future.
Then the threats got more specific. As Ajax started growing, they were forced to
admit that, okay, it works all right with your little web apps, but if you're doing big
manly applications, you need the complexity in XML. That complexity is there for
a reason. It's there to help you, and if you don't have it, you will fail.
Again, they couldn't articulate why it was going to fail, but they were pretty
emphatic that you had to watch out for this.
Well, since then, manly applications have been developed in JSON, and they
don't fail; they just get faster. So ultimately they were reduced to death threats.
Literally, death threats. For example, Dave Winer, a couple days before
Christmas in 2006, said: it's not XML. Who did this travesty? Let's find a tree
and string them up. Now. What an ugly thing to say. Fortunately, nobody listens
to Dave Winer.
James Clark, who was one of the principal architects of XML, said a couple
months later that any damn fool could produce a better data format than XML.
Which, it turns out, is true.
So somehow in the whole XML insanity we forgot the first rule of good
workmanship, which is use the right tool for the right job. Instead we got
obsessed with this one tool to rule them all. And any sense of good engineering
where you're trying to pick the best tools, the best methods, the best materials
went out the window.
So if the only benefit of JSON was that this idea became popular again, then
that's a really good thing. I'm not claiming that JSON is the only tool that you
should use for everything; what I am suggesting is that JSON is really good at a
bunch of things, and if you're doing one of those things, you'd do well to get
onto JSON.
So that caused me to wonder: where did the idea come from that data should be
represented by a document format? I mean, in retrospect, that seems like a
completely nutty idea. So where did it come from? So, going back in time --
yeah?
>>: [inaudible] description of the difference between data and the document?
>> Douglas Crockford: Yeah, data is stuff that programming languages like.
>>: So do you think that XML is good for.
>> Douglas Crockford: Probably. [laughter].
So let me run through this. Okay. So, going back in history, one of the very first
text processing programs was something called RUNOFF. It ran back on the old
mainframes. In some cases, each line of text would be on a punch card. In
some of the older versions, the text would have all been in upper case, with
special markup to indicate which characters should be set in lower case. And so
you can see that a line that starts with a letter is text and will get filled into a
paragraph, and the lines that start with a period are some kind of control.
In this case, SK means skip a line, and TB 4 means tab over four spaces. So it's
a kind of brutal, machine-oriented markup. Charles Goldfarb of IBM thought he
could do better, and so he came up with his Generalized Markup Language. And
this shows one stage of the evolution of that language. He now has a colon in
column 1, and by ending the tag or the command with a period, he's able to put
text on the same line as the commands.
Any of you who are familiar with HTML will recognize a lot of those tags. They're
eerily familiar, except for the EOL thing. You can probably guess what that
represents. So as he went through his evolution from GML to SGML, you can
see this progression. And he finally ended up with angle brackets. And in a
moment I'll show you where the angle brackets came from.
In 1980, Brian Reid, I think he was at Carnegie Mellon, published Scribe, which
was what he called a document compiler. And it -- this was the first instance of
getting the separation between document structure and presentation right. Reid
got this right in 1980. And he also had a really, really nice notation for describing
documents. He only had one reserved character in the language, and that was
the at sign. If you wanted a literal at sign, you just wrote two of them. If the at
sign was followed by a word, then that would indicate a tag of some sort, which
could have a beginning quote and an ending quote.
And then you have stuff inside of it, which is affected by that environment. And
he had six sets of characters that you could use for those quotings, so that if you
were doing stuff which was deeply nested, you could avoid the Lisp problem of
having so many parentheses that they become difficult to match.
This notation was extremely easy to get right, and it avoided the use of special
characters in cases where you wanted literal characters. You also had a long
form, where you had the word begin and then the name of a tag, and then
everything up until the matching end tag was included in that. Which was really
nice for doing very long things like chapters or tables, you know, very complex
structures. Really, really nice.
Goldfarb saw the angle brackets. Oh, yeah. He'd never thought to use those.
But clearly they made a very big impression on him. But I can tell you from
experience, this is much easier to write correctly than SGML or XML or HTML.
The reason XHTML failed was that web developers just can't write HTML and get
it right. So you need some resiliency in it. The market has no tolerance for a
system which totally fails if you don't get the markup right.
But getting back to: where did the data come from? One of the things that Reid
had in Scribe was a way to describe entries for the bibliography. Here is an
example of a tech report and a book. And it looks like JSON, doesn't it? I mean,
you've got name-value pairs separated by commas. This is in a document, and
it's describing a document, but it is data. Goldfarb picked up on some of this,
and this is where the attributes that we got in HTML came from. Unfortunately,
he didn't pull in the whole Scribe thing. And it's really unfortunate that Tim
Berners-Lee wasn't more aware of document formats. If he had picked Scribe
instead of SGML as the basis for his World Wide Web, the World Wide Web
would be a much better place today.
But this is where the idea first occurred of putting data in a document. And this
idea moved into the SGML community and eventually into the XML community.
When I published the reference implementation of the JSON library, I needed a
license to publish it under. And I looked at a lot of licenses. And I decided I liked
the MIT license because it was not restrictive. It just says leave this notice in the
source code and don't sue me. Otherwise you can do pretty much anything you
want. Which is really nice.
But this was just after 9/11, okay, when we were starting on the war on terror, and
I was worried well if I open this stuff up, what if Osama bin Laden were to use my
software [laughter] I'd feel really bad about that. Because the president said
we're going after the evil doers, and I thought that was a good thing. So I added
this line to the MIT license. The software shall be used for good, not evil.
And it's worked really well. So I now put this on everything I write. Some of the --
>>: [inaudible].
>> Douglas Crockford: Well, yeah. So about once a year I get a letter from a
crank who says how can I know if what I'm doing is good or evil? [laughter]. I
said well, you shouldn't be using my software. [laughter].
And it works. You know, I know that evil-doers say I will not use your software
until you fix the license. Great. It's working. About once a year I'll get a letter
from an attorney at a famous company. I don't want to embarrass them so I'll
just say their initials, IBM. [laughter]. Where they've got some project that wants
to use something that I wrote. And they need special permission, you know, that
we don't intend to do evil, but you know, we can't speak about our customers.
In fact, this literally happened two weeks ago. So I send them back this
message, I said I give permission to IBM, its customers, partners, and minions to
use JSLint for evil. [laughter].
And he wrote back thanks very much, Douglas.
So finally, let me conclude with the JSON logo. When I put up that web page, I
decided I need a logo on this page in order to make it look, you know, more
credible. And so I came up with this thing. And what is that? It's based on a
famous optical illusion called the Impossible Torus, which is closely related to
another famous illusion called the Ambihelical Hexnut. So what I did was I took
the Impossible Torus, and I rounded it out and realigned it and gave it some
interesting shading. But it's still topologically the same figure.
I like a lot of things about it. One was if you look at it two dimensionally, it is two
symmetrical pieces which just clip together to form this O-shaped thing. And so
that kind of suggested the two sides of the conversation going around and
around, you know, which is what JSON was invented for. Designed for.
Discovered for. It also seemed to have some letter forms in it. Like I could see a
letter J kind of in there. And the O obviously and the N. So it almost spelled out
the name of the thing that it was describing.
Any time you have two curves you can kind of pretend there's an S in it. But after
looking at this for a couple of years, I made an amazing discovery. It's not
impossible. You take a square and you extrude it in a circle, okay. And then as
it orbits have it do one revolution, and it makes this. So it's a square and a circle
with a twist. It's not impossible. It's actually a simple figure. I think this is a
wonderful metaphor, a wonderful symbol for JSON. I'm really happy about that.
So once I figured out what it was, it was easy to write a mathematical model of it
and do it in software. So this is a version written in JavaScript using a canvas
and some really extreme lighting that I did for a T-shirt design. So it's nice now
that web browsers have gotten good enough to generate their own logos.
This is one that I did for this business card trying to do something that looked like
it was around for a hundred years. You know, the data interchange format
mothers have learned to love over many generations. Then finally, this is the last
thing. [laughter]. This one was inspired by Shepard Fairey's Obama poster. I
called it data interchange we can believe in. [laughter].
All right. So should we do questions or move on to the next one?
>> Helen Wang: We can do some questions.
>> Douglas Crockford: Okay. Yes?
>>: So [inaudible] for another language, is there some kind of standard set of
regression test that you could run to show that [inaudible] actually works?
>> Douglas Crockford: No. There is a set of tests available on the JSON site
that you can do your thing on, but nothing really formal. Yes?
>>: So it turns out YAML is YAML ain't markup language.
>> Douglas Crockford: There you go. I knew it was funnier than yet another.
YAML ain't markup language.
>>: So given that XML is not a good document format, is not a good data format,
and yet [inaudible] do you think many people are still using it?
>> Douglas Crockford: So given that XML is not a good data format and not a
good document format, are there people still using it? Yes. There are a lot of
people still using it.
>>: So where do we go from here? So JSON is a good data format. So what's
your opinion about a good document format? Are we to wait? Do you think we
may be able to [inaudible] into words a better document format?
>> Douglas Crockford: How could we get a better document format? I think we
could do well. I would recommend anybody who might, I don't know, be doing
research here or something like that, to consider looking at Reid's work and
seeing a 21st Century version of Scribe, what would that be like? Could we
generalize those ideas and do something really powerful? Keeping the idea of
an extremely simple syntax, something that will not confuse people but which still
gives them enough expressive power to do the kind of things we want. SGML
turned out not to be that language. I think stepping back 30 years, I think we
have a better chance of getting it right.
>>: Do you mean that we cannot get it right anymore because we already have
this huge amount of --
>> Douglas Crockford: We'll never get SGML --
>>: HTML --
>> Douglas Crockford: We'll never get XML right. It is what it is. I can crank the
version numbers, but it's always going to be what it is. Yeah?
>>: So I've used XML to do parsing, looking forward and just expecting this big
piece of information. It's not necessarily the whole object, it's just an element
somewhere in the text, and I don't care where it is, [inaudible] looking for errors.
How would JSON be used there? Do you have to parse the data all the same
way, or what's the advantage in those kinds of situations?
>> Douglas Crockford: JSON's fast enough you just go and parse the whole
thing and get what you want out of it. I don't want to tell you how to do big things.
If it's a crazy thing that doesn't work in JSON, then, you know, maybe crazy
applications are what XML is for.
>>: That's what I'm trying to get at: with large documents and the sort of
operation that I described, can you still parse the whole document into an object
and then try and --
>> Douglas Crockford: I would go back to your assumptions: why do you have large
documents that you have to do this needle-in-the-haystack stuff on? Maybe
there's a smarter representation you should be using. Maybe you're approaching
the problem incorrectly. I don't know. I don't understand your problem, but it
doesn't make sense to me.
Okay. So let's move on to the web of confusion. So the cross-site scripting
attack was invented in 1995. And we have made no progress on the
fundamental problems that enabled that attack since then.
So that causes me to ask a question: will the web ever reach the threshold of
goodenoughness, where we're not constantly subject to these attacks? A
positive answer would be that as we discover vulnerabilities, that will lead to
corrections. And we have been discovering vulnerabilities for the last decade or
two. But that doesn't seem to be converging on anything. You would think that
as we correct vulnerabilities we will introduce new vulnerabilities, but
hopefully at a slower rate. So you would still hope eventually to
converge on something that's going to be good enough.
On the other hand, we're adding lots of new features to the browsers and adding
new features tends to introduce new vulnerabilities and unintended
consequences. And right now the HTML 5 thing is out of control, and we're
getting a lot of new stuff coming into the browsers, most of which has not been
thought through very well, none of which has been formally vetted. So that
might make people anxious that maybe we're making things even worse.
But I think the key answer is that if the fundamental assumptions are faulty,
incremental correction never converges on goodenoughness. And I think that's
where we are. That's why we've made no progress on fixing the browser since
its introduction, even though we've been patching it pretty aggressively ever
since.
What we are doing is creating an ever-growing corpus of hazards. So we now
have a very detailed list of things that we need to avoid doing in order to avoid
compromising our applications. But it's unreasonable to expect web developers
to understand all of this stuff. There's just way too much. Perfect knowledge is
not an option here. It's just unreasonable to expect developers to understand
enough of that corpus in order to be effective at protecting their applications.
So that leads to another question. Is the web too big to fail? That's a popular
meme these days. And there are some technologies that are hoping that it's not.
So we see Flash and Silverlight and JavaFX and other things that are hoping to
displace the web. You know, can they do that? Can they displace the web?
And the web is obviously deficient on many dimensions. It would appear to be
vulnerable.
But I can tell you that the web, despite its obvious failings, got closer to getting it
right than everything else did. It got it more right than everything else. And I hope
to demonstrate what that was.
But first, let's review what went wrong. So there are a lot of ways things go
wrong. The standard one is: we'll add security in 2.0. That doesn't work. And
you'd think people would know that by now, but they keep doing it over and over
again. You get architects and developers who think that the hard part is getting
the system cycling or getting the pixels on the screen or getting the bits across
the wire. And once we get that hard part done then we'll do the easy thing of
securing it. And that turns out not to be the hard part.
Security is part of the itty-bitty-ility committee, along with quality, modularity,
reliability, maintainability. These are things that you can't add. You can't add a
quality component to something. You can't -- to make it better quality. You can't
add a modularity module to something to make it modular. And it's the same for
security. You can't add a security box and make things secure. We've seen lots
of attempts at doing that, and they have all failed. The way we make something
secure is by removing the insecurity, which is a different thing.
We see confusion of cryptography and security. A few years ago I was attending
a conference called the digital living room in San Mateo where Hollywood meets
Silicon Valley. And there was a group there called DLNA that was trying to
promote an architecture in which you could have things in your house all talking
to each other. You know, like the TV could talk to the computer, could talk to the
VCR and everything else so that you could have one remote control and each of
the devices could route the signals around and make everything wonderful,
which was nice.
But I went to the CTO of one of the companies doing that stuff, and I told him
you've made it possible for an attacker to get into the computer and now take
over the home network. You know. So he can now change the channels. He
can turn the TV on and off. He can make you watch porn. He can make you
watch ads. He can track everything that you're watching.
You can have one device launch a denial of service attack against another
device. So everything that people have learned to hate about their computers
you're now making possible on the TV. And his answer was well, nobody would
do that. [laughter].
I'm really glad you laughed. So then he -- I convinced him that people would do
that. If only because you can't prevent it, there are people who will do it. But
unfortunately there are even worse motivations than that.
So he suggested, well, maybe we could have the devices authenticate themselves
to each other. You know, just adding some crypto to it will somehow make it
safe. And I explained to him why that wouldn't work. And he said, well, eventually
-- he was confident that they would figure it out. First they wanted to ship it.
We see a lot of confusion of identity and authority: if we know who wrote the
software, that's enough to know what the software ought to do. That clearly
doesn't work.
We see a lot of blame the victim stuff where you have a system which is unable
to make good decisions as to what it should allow and what it shouldn't, and so it
says, let's ask the user. But it always asks the user in a way that the user cannot
possibly answer correctly. This just doesn't work. But we still see an awful lot
of this.
But ultimately I think, the most fundamental thing is the confusion of interest. Let
me talk about what I mean by that. So if we go back to the beginning of
computing, go back to the '40s and '50s where computers were first coming into
existence, you had a box. You would put a program in the box. The program
would run. Whose interest is represented there? Well, generally the person who
is writing the programs is employed by the same guy who owns the machine, and
so we're all in it together. And so that's pretty easy.
But as the number of people who are using the machine increases and as things
like storage become available, which is persistent across user sessions, then it
becomes much more of a question. Early on it was discovered that the system
needed to protect itself from the programs that it was running. And so we had
the invention of user mode. And this turned out to be a really good idea and
continues to be a standard feature of CPUs.
So, you know, we don't trust this guy fully, we trust this guy fully, we don't want
that one to mess with that one. That turns out to be a good thing. But ultimately
not sufficient. And it became really obvious when we entered the time-sharing
era. We now have multiple users in the machine at the same time. We do not
want each of the users to be able to tamper with each other, and so the idea of
having a separate process and the processes are opaque to each other, so one
user can't mess with another. That turns out to be a good thing. Except that
sometimes the users want to cooperate. Maybe they have a collaborative
application, and in this process model there's no way for them to do that. But even
worse, there's confusion within each of these processes about whose interest is
being represented.
So if I'm a malicious user and I want to get at that guy's account, I can't get at it
directly. We have enough protection to prevent that. What I have to do is trick him
into running my program. Because when my program runs in his account, there's
a confusion of interest, and the system assumes that my program is representing
his interest. And so I get access to his files, and I can do bad things to him.
In the time-sharing era, they were just beginning to wrestle with that question and
realize, whoops, we've had this wrong from the very beginning. What are we
going to do about that? And we started to see some really interesting research
on that. And then it all fell apart. Because of the personal computer. The
personal computer destroyed the economics of software as a service, of
computing as a service. So we went all the way back to bare metal.
And so you've got a program which is indistinguishable from the machine. And
it works very well at first, but gets worse as we add
hard disks and floppy disks. Floppy disks become the medium of exchange for
propagating viruses, and when we add modems and networks to it, it gets much
worse. So we have basically all the problems that we had in the mainframe era
but none of the protection. So the system cannot distinguish between the
interests of the user and the interests of the program.
This kind of worked okay when you had a lot of friction involved in the process of
installing new software. So it generally required the user pay some money in
order to get a program. It failed when software came with viruses preloaded.
And that was a worry for a while back in the boxed-software era. But it changes
now in the network era.
So this is an important quote. It is not unusual for the purpose or use or scope of
software to change over its life. Rarely are the security properties of software
systems reexamined in the context of new or evolving missions. This leads to
insecure systems.
The person who said that was me. I just want to go on record as having said
that.
So practices that worked in another time don't work when you go forward and
change the circumstances around it.
So on the web we have casual, promiscuous, automatic, unintentional installation
of programs. And it works. Because the interests of the user and the program
are distinguished. So the web has gotten something right that nobody ever got
right before, and which other venues continue to get wrong. The web browser does
not confuse the interests of the user and the program that the user is running.
The browser successfully distinguishes those interests.
What it confuses is the interests of the multiple sites. This was something that
was not anticipated when the browser was put together. In fact, when Netscape 2
introduced frame sets and JavaScript at the same time, it was very quickly
discovered that, whoops, it was now possible for the sites to contaminate each other.
Yeah?
>>: [inaudible] pop-up blocker sort of act as a [inaudible].
>>: When it was running code, the code was doing something to my computer
that I didn't want, popping up a new window?
>> Douglas Crockford: Yeah. That was a case of excessive authority being
granted. And that's generally been corrected now. So it's been a long road to
correcting this. But still the browser comes closer to getting it right than anything
else. So where the browser is most wrong now is that within a page the interests
are confused. So if an ad or a widget or an Ajax library gets onto a page, it gets
exactly the same rights as the site's own scripts. Since they get all the
capabilities the site has, they can do anything
the site can do, including communicating back to the site, and the site cannot
determine whose interest is being represented in that request.
So one thing that complicates this is what I call the Turducken problem. This is
a map of the browser, or rather a concept map of the languages of the browser. A
Turducken is a turkey that's stuffed with a duck that's stuffed with a chicken. So
there's a lot of stuff being nested in there. And the web works the same way. So
you've got HTTP which holds HTML, which can hold URLs, which can hold
JavaScript which can hold styles, which can hold more URLs, more styles, more
script, so on. You can nest these things really deeply. And it gets really
complicated.
And so it's very difficult to analyze a piece of text and be confident that you are
not accidentally or unintentionally injecting a piece of script into a page. Which is
a problem, because if that script gets on the page, it gets all of the capabilities of
that page and can cause significant harm.
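The injection hazard is easy to sketch in JavaScript (the URL and function name below are invented for the example, not from the talk): when user-supplied text is concatenated straight into markup, a script tag smuggled into that text runs with all of the page's capabilities, and a minimal escaper closes that particular hole.

```javascript
// Attacker-controlled text containing a script tag.
var userText = '<script src="https://evil.example/x.js"></script>';

// Naive concatenation: the script tag survives intact and would
// execute with the full capabilities of the page.
var unsafe = "<div>" + userText + "</div>";

// A minimal escaper neutralizes the characters that HTML treats
// as markup. Ampersand must be escaped first to avoid
// double-escaping the entities it produces.
function escapeHTML(s) {
    return s.replace(/&/g, "&amp;")
            .replace(/</g, "&lt;")
            .replace(/>/g, "&gt;");
}

// The escaped version renders the attacker's text as inert data.
var safe = "<div>" + escapeHTML(userText) + "</div>";
```

Real pages need context-sensitive escaping (attributes, URLs, styles, and scripts each have different rules), which is exactly why the Turducken nesting makes this so hard to get right.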
This is not a Web 2.0 problem. This has been in the web since 1995. All of this
came in with Netscape 2 and has not gotten any better since then. Well,
slightly better. We have pop-up blockers now. A few things have improved. But
for the most part, we're back where we started.
Except now we're mashing things up. For a long time there's been interest in
being able to take components from different places, stick them together like
Lego, and build applications really quickly and easily, having the powerful
capabilities of these things combined. And it's incredibly surprising that the
browser is the place where this actually works. JavaScript is the language which
is enabling this. And so it's not just theoretically possible now, it is deliverable now.
You can send it to billions of people, which is a wonderful thing. Except that a
mashup is a self-inflicted cross-site scripting attack. Because each of the
components in the mashup now gets everything that's available to all of the
components. And what makes it even worse is that advertising is a mashup.
Advertising is the thing that's paying most of the freight now for the Internet. So
anytime an ad goes onto a page, that advertiser, in exchange for whatever
he's paying for his impressions, is also getting the right to attack the site, steal the
user's credentials, whatever he wants to do. There's nothing the site can do to
prevent him from doing whatever he wants to do. Which is a bad thing. So in my
view, we need to correct this before the house of cards falls apart.
So JavaScript is very much maligned in this stuff. We're constantly reading
reports about another thing that's wrong with JavaScript. But JavaScript actually
gets closer to getting this stuff right, I think, than virtually any other programming
language in operation right now. Because JavaScript got so close to getting
it right, a secure dialect is obtainable. And there are projects like ADsafe, Caja,
and the Web Sandbox that are leading the way in showing us how to do this. What all
these things have in common is a new security model. It's actually an old
security model: the object capability model. I highly recommend Mark Miller's
Robust Composition. He wrote it at -- where was he? Johns Hopkins. Really
good read.
So one of the things that we get from that model is cooperation under mutual
suspicion, which means that each of the interests can be kept separate but they
can still interact with each other, which is exactly the property that we need in
order to do mashups. And also in order to insulate ourselves from cross-site
scripting attacks.
So let me quickly go through what object capabilities are. So we'll start with
objects. Here A is an object. Like any object it has some state and behavior.
And we have a has-a relationship between object A and B. All that means is that
object A contains a reference to object B. Having that reference A can now
communicate with B. It can send it messages or call its methods, whatever the
language allows. And B has an interface that constrains what A can do to it. So
A cannot get at B's private state, A can only send it messages through its public
interface.
Here A does not have a reference to C, so A is fundamentally incapable of
communicating with C. It's almost as if C has a firewall around it. A simply
cannot communicate with it until it gets that reference. Nothing new here. This is
just the way object systems are supposed to work.
So an object capability system is produced by constraining the ways that
references are obtained. A reference cannot be obtained simply by knowing the
name of a global variable or a public class. In fact, there should be exactly three
ways to obtain a reference: by creation, by construction, and by introduction.
So by creation means that if a function or a method creates an object, it gets a
reference to that object. Wouldn't make sense otherwise.
Number 2 is by construction. An object may be endowed by its constructor with
references. And these could include references in the constructor's context and
inherited references that it wants the instance to get.
And three, the most important, is by introduction. So here we have A, which has
references to B and C. B has no references, so it can't communicate with
anybody. The same with C. A decides it's to its advantage for B to be able to
communicate with C, so it sends B a message. And that message contains a
reference to C. And once that message is delivered, B now has the capability to
communicate with C. That's why this is called a capability model.
If references can only be obtained by creation, construction, or introduction, then
you may have a safe system. And if references can be obtained in any other
way, you do not have a safe system.
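Introduction is easy to see in code. Here is a small JavaScript sketch, not from the talk, with invented object names: B starts with no reference to C and acquires the capability only when A passes the reference in a message.

```javascript
// C is just an ordinary object with some behavior.
function makeC() {
    return {
        greet: function () { return "hello from C"; }
    };
}

// B holds no references at birth; it cannot reach C at all.
function makeB() {
    var c = null;                  // no capability yet
    return {
        meet: function (ref) {     // A introduces C by calling this
            c = ref;
        },
        talkToC: function () {
            return c === null ? "no capability" : c.greet();
        }
    };
}

var c = makeC();
var b = makeB();
// Before introduction, B is fundamentally unable to talk to C.
// A decides it's to its advantage, and sends B the reference:
b.meet(c);
// Now B has the capability to communicate with C.
```

The point of the sketch is that B's reach is defined entirely by the references it holds; there is no ambient global path to C for B to grab.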
So there's some potential weaknesses to avoid. Arrogation, corruption,
confusion and collusion.
Arrogation means to take or claim for oneself without right. And this would
include global variables, public static variables, standard libraries that grant
powerful capabilities like access to the file system or the network or the operating
system to all programs. Any language that allows address generation obviously
tolerates arrogation.
JavaScript's global object gives powerful capabilities to every object. This is the
thing that JavaScript got wrong. And this is what Caja and ADsafe and the Web
Sandbox spend most of their energy trying to prevent. By removing the global object
from the programming model, a safe language is obtained.
Corruption: it should not be possible to tamper with or circumvent the system or
other objects.
Confusion: it should be possible to create -- oh, one thing on this. In ECMAScript
Fifth Edition, we're adding new facilities to JavaScript which allow us to make this
statement true for JavaScript. So we'll have object hardening techniques which
allow you to make an object which becomes impervious, so that you can then hand
it to suspect code and be confident that it cannot damage the object.
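`Object.freeze`, which shipped in ECMAScript 5, is one of those hardening facilities. A minimal sketch (the `api` object and its contents are invented for illustration):

```javascript
// Freeze an object before handing it to suspect code: its
// properties can no longer be replaced, added, or removed.
var api = Object.freeze({
    version: "1.0",
    add: function (a, b) { return a + b; }
});

// Suspect code tries to tamper with it. In non-strict mode these
// attempts fail silently; in strict mode they throw TypeError,
// so we guard with try/catch to cover both.
try {
    api.add = function () { return "evil"; };   // ignored or throws
    api.extra = true;                           // ignored or throws
    delete api.version;                         // ignored or throws
} catch (ignore) {
    // strict mode lands here; the object is untouched either way
}

// The object is impervious: api.add(2, 3) still returns 5,
// api.extra was never added, and api.version is still "1.0".
```

Freezing is shallow, so in practice a hardening routine has to walk the object graph and freeze each reachable object.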
>>: [inaudible].
>> Douglas Crockford: Ask me later. Confusion. It should be possible to create
objects that are not subject to confusion because a confused object can be
tricked into misusing its capabilities.
And finally collusion. It must be possible -- or it must not be possible for two
objects to communicate until they are formally introduced. If two independent
objects can collude, they might be able to pool their capabilities and cause harm.
One of the things we get from the capability model is rights attenuation. Some
capabilities are too dangerous to give to guest code. So we can instead give
those capabilities to an intermediate object that will constrain their power. So I'll
show you in a moment an example of how we do that.
Ultimately every object should be given exactly the capabilities that it needs to do
its work. Capabilities should be granted on a need-to-do basis. So where you
used to think about information hiding, you now think about capability hiding.
Intermediate objects or facets can be very lightweight. And class-free languages
like JavaScript are especially effective at implementing these.
So here we have a facet that's going to be limiting a guest object's access to a
dangerous object. So the guest doesn't have a reference to the dangerous
object, it has a reference to the facet. The facet sits in between them, can
monitor all the traffic going back and forth, and attenuate whatever powers should
be granted to the guest.
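A facet like that takes only a few lines in JavaScript. In this sketch (the file-like object and its method names are invented for illustration), the guest receives read access but has no path at all to the delete capability:

```javascript
// The dangerous object holds more power than any guest should get.
var dangerous = {
    read: function (name) { return "contents of " + name; },
    remove: function (name) { /* destroys the file: too dangerous */ }
};

// The facet forwards only the safe operation. Because the guest
// never sees the dangerous object itself, there is nothing to
// escalate: the facet simply has no remove method.
function makeReadOnlyFacet(target) {
    return {
        read: function (name) {
            return target.read(name);   // attenuated: read only
        }
    };
}

var facet = makeReadOnlyFacet(dangerous);
// The guest is given facet, never dangerous.
```

This is why class-free languages are so effective here: the facet is just a fresh object closing over the target, with no inheritance chain leaking extra authority.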
One of the complaints about the capability model is that references are not
revocable. Once I give a reference to an object, I can't ask it to forget that
reference. I could ask it, but I can't depend on it obeying. So I have to assume
that once I give out a capability, that capability is granted forever. And some
people think that is evidence that the capability model can't work. But it turns out,
using facets, that's easily gotten around.
So here I've got a guest which will make a request to an agency object for a
capability to interact with the dangerous object. And what it gets instead is a
facet, a reference to a facet, which will mediate as before. But the agency
retains a capability to the facet. And at the time that the agency decides it wants
to revoke the capability, it sends a message to the facet saying go inert. And the
facet does that by simply forgetting its reference. So the guest still has a
reference to the facet. We can't revoke that. But the facet is now not useful to it.
And so effectively we have revoked it.
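The revocation pattern can be sketched like this (the method names here are invented for illustration): the agency keeps the revoke function for itself, and the guest gets only the facet.

```javascript
// makeRevocable returns a facet for the guest and a revoke
// capability that the agency retains.
function makeRevocable(target) {
    var ref = target;                  // the facet's only link to target
    return {
        facet: {
            send: function (msg) {
                if (ref === null) {
                    return "inert";    // capability has been revoked
                }
                return ref.receive(msg);
            }
        },
        revoke: function () {          // held by the agency, not the guest
            ref = null;                // the facet forgets its reference
        }
    };
}

var dangerous = {
    receive: function (msg) { return "got " + msg; }
};
var pair = makeRevocable(dangerous);
// The guest is given pair.facet and uses it normally...
// ...until the agency decides to cut it off:
pair.revoke();
// The guest still holds the facet, but it is no longer useful.
```

We can't take the facet reference back from the guest, but by emptying the facet we have effectively revoked the capability.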
We can also have a facet mark requests so that a dangerous object can know
where the requests came from. That allows us to do tracking and accountability.
So facets are great. They are very expressive, they are easy to construct, they're
lightweight, they allow us to do power reduction or attenuation, they allow us to
have a form of revocation, we can do notification, delegation, quite a lot of
powerful patterns that come out of a really simple pattern.
It turns out the best object-oriented programming patterns are capability patterns.
We found that when we're designing programs in general and we're faced
with one of those choices where you could structure something this way or that
way, and we struggle over which is the right one, asking which of those makes the
most sense as a capability pattern always picks the right one. It's amazing.
So it turns out good object capability design is good object oriented design. The
DOM unfortunately got much less close to getting it right. But the Ajax libraries
are converging on a much better API, which is good. But ultimately I think we're
going to need to replace the DOM with something that's more portable, more
rational, more modular and safer. We need to replace the DOM with something
that's less complicated, less exceptional, and far less grotesque.
W3C, I think, is moving in the opposite direction. So I'm thinking that HTML5 needs to
be reset. Or I think W3C needs to be abolished and we need to figure out
another way to get this done.
So the advertising problem is serious. We found that it's difficult to go to the
advertisers because the threat is that they'll pull their money out and take it to
someone else. But everybody else they might give it to is equally under threat. So
we need to do this together. Everybody who's in this business needs to go
together to the advertising industry and say we need to fix this, we need to put
controls on ad quality in order to protect all of our interests, including theirs.
One technology which could help do that is something I've been working on
called ADsafe. It takes the capability model and takes a minimal approach to
applying it to the protection of advertising. ADsafe is a JavaScript subset that adds
capability discipline by deleting features that cause capability leakage, and it
does it statically, so there's no runtime performance penalty. The language is
very much constrained. For example, you can't use the this variable, and use of
some other things is greatly restricted. Probably the most annoying thing in the
language is that you can't use the subscript operator; you have to use methods
instead to pull things out of an array.
But ADsafe relies entirely on static validation. So it does not modify the code in
your widget, which means that debugging is much easier, because you're
debugging the original code, you're not debugging something that's been
transformed. There's no performance penalty. Validation can be done at every step
in the ad pipeline, from the creation of the creative all the way to post-consumer,
post-mortem analysis. Currently the validator is implemented in JSLint, which is
a code quality tool for JavaScript. So you get an extra bonus: the code will be
clean when you get it through as well.
One of the bad things about ADsafe is that it's extremely unlikely, I'd say virtually
impossible, that any existing code would run -- would be approved by ADsafe without
modification. For one thing, there are very dangerous and popular practices that
are not allowed, for example document.write, which is used abusively by the
advertising industry.
Also, ADsafe cannot protect the widget from the page, it can only protect the
page from the widget. So in the case where you've got evil website operators,
ADsafe by itself will not be a sufficient mechanism. Although perhaps
ADsafe with ad frames or other mechanisms might be.
ADsafe also provides a DOM interface, one that's lightweight and query-oriented,
and it scopes the queries to be strictly limited to the contents of the widget's div.
So guest code cannot get access to any individual DOM node. That's important
because in the DOM every node is linked to every other node, which is linked to
the document, which gives you the capability to go to the network. And once you
get to the network you can send data to any machine in the world, and you can
receive additional scripts from any computer in the world. So that's a dangerous
capability that we don't want to give to ads.
This is an example of a widget just to show what the pattern is. It's not a
complicated template, you just plug your code into it.
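The slide itself isn't reproduced in the transcript, but based on ADsafe's published documentation the widget pattern looks roughly like this (the id "WIDGET_" and the handler body are illustrative, and the dom method names are from the library's docs rather than from the talk):

```html
<!-- An ADsafe widget: a div whose id ends in an underscore, with guest
     code handed a dom object scoped strictly to this div. Illustrative
     sketch of the published pattern, not the slide from the talk. -->
<div id="WIDGET_">
    <script>
    ADSAFE.go("WIDGET_", function (dom, lib) {
        "use strict";
        // dom is confined to this div; queries cannot reach outside it,
        // so the guest never holds a raw DOM node.
        var greeting = dom.tag("p");
        greeting.append(dom.text("Hello from inside the widget."));
        dom.append(greeting);
    });
    </script>
</div>
```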
I think we've gone as far as we can on luck and good intentions. I think at long
last we need to get it right. We've not done much in the last 15 years to do
that, but I think now we can. So thank you and good night.
[applause].
>> Helen Wang: We have some time for questions.
>> Douglas Crockford: So we have some time for questions. Yes.
>>: So what are the sort of market [inaudible] that you think are going
to push us [inaudible]? I mean, I agree that, you know, JavaScript and HTML5,
they [inaudible] right, I can't sleep at night because of this kind of stuff. But I
think the problem is that JavaScript has a lot of ugly features -- it's very easy
to write stuff very quickly without a lot of discipline, right? Most web programmers
are not disciplined. And I think the HTML spec is exploding because people
think it's awesome that I can [inaudible] video, and that's what's driving this: it's
awesome, I want this feature. And I would argue that to a certain extent people
accept the fact that there will be security vulnerabilities [inaudible] this sort of
[inaudible] cycle, and people [inaudible] the market is willing to pay that price
for these new features. So I guess I'm wondering, you know, if you have this
more restrictive, you know, ADsafe language or whatnot, it seems like some of
the whiz-bang features that made Web 2.0 applications also are not going to be
as easy to deal with. So what is it in the market that's going to make us move
towards this [inaudible]?
>> Douglas Crockford: I completely agree with the points that you made. What
would I do? Well, that's why I'm here. The way in which we create web
standards, that whole process is totally broken. And I don't have a lot of
confidence that it's going to get fixed. So somehow we need to get above all of
that noise and work collectively somehow to find a better way forward. And I
think the place to start is with the advertising network because we all have a
huge investment in keeping that system alive. And I think it's teetering, and we
can't afford to have it fail. It's all that my company is, and it's becoming an
increasingly important part of your company and a lot of other companies.
And so right now the circus is being run by the browser makers, and we can't
tolerate that anymore. We need to take it over somehow. So I'm here to try to
create some consensus around the idea that we need to find some other path
forward because the current one certainly isn't working.
>>: [inaudible] meet the standardization or -
>> Douglas Crockford: Yeah, the W3C process has never been responsive to
our problems and is just getting worse. It's out of control.
>>: The browser [inaudible] guided by [inaudible].
>> Douglas Crockford: No. They are actually driving. It's the browser makers
who are -- W3C has lost control of its process.
>>: [inaudible] [laughter].
>> Douglas Crockford: There may be someone else somewhere [inaudible] yet.
Yes?
>>: So what is the adaptation story for [inaudible] we use it, [inaudible].
>> Douglas Crockford: Current -
>>: [inaudible].
>> Douglas Crockford: Currently there is one person in the world who is using
ADsafe. That is Tyler Close at HP Labs is doing some really interesting research
using ADsafe as a delivering mechanism. So, no, the advertisers are not using it
yet. No one is using it.
>>: [inaudible].
>> Douglas Crockford: Because they don't have to.
>>: One of the reasons why the -- excuse me. One of the reasons why market
forces aren't working here is that the economic incentives flow in the wrong
direction. So the advertisers many years ago, after a lot of struggling and a lot of
fraud and a lot of bad stuff, settled on a standard way of doing things. And then
it fossilized, and so for 10 years there's been no progress in the way that
ads get created. And so as our applications get increasingly sophisticated and
as we're getting more mashupy and all this stuff, assumptions that were made 10
years ago are no longer holding. It's like I said before, as things change, old
things are no longer as safe as they had been. And so you would like to be able
to say, hold it, Mr. Advertiser, I just can't accept your thing anymore. It says, I'll
give you some money. Oh, okay. So basically they're paying us to look
the other way. We would have this strong sense that we will never allow any
third party to put anything on our site which causes any compromise of the safety
or security of any of our customers or ourselves, unless someone can meet our
minimum CPMs.
It's not just us, it's everybody. Everybody is doing that. And because, you know,
it's all business and it's important that the money keep flowing, we need to
do this in a way which doesn't disrupt that. We're trying to do this in a way which
ensures that it can go forward and that's -- we're in this bad spiral, and we need
to redirect it somehow.
>>: Do you see any light in this case, or [inaudible] sacrifice -
>> Douglas Crockford: I wouldn't be talking about this if I didn't think we could
make it better. So while all this sounds kind of bad, my real message is that we
have the technology to fix this stuff. We have the theory to fix this stuff. I don't
think it's going to be all that difficult to apply it. We just need the will. And that's
the thing we're lacking at this point. If we can find the will to correct all of our
systems, to correct all of our sites, we can make things better for everybody. We
can reduce the fraud. We can reduce the uncertainty. We can make it better.
It's worth doing. Yeah?
>>: In one of your slides as part of the capability model for security you
suggested to remove argument that [inaudible] for similar functions that are
[inaudible] to access the stack or other things. What would you provide as a
substitute which allows us to be more flexible, have that functionality but also be
safe? Because it seems like it's more limiting the language that it may limit its
use.
>> Douglas Crockford: It's my feeling that a program has no business looking at
the stack that's not a capability it should be entitled to have.
>>: [inaudible].
>> Douglas Crockford: Debugger is a different thing. And there are different
affordances for that. I don't think that needs to be given to every application.
>>: [inaudible].
>>: Well, there's the big [inaudible] stack. So we could do bug tracking, you
know. The big problem with the [inaudible] web is if something fails. You know,
[inaudible] it's going to do a dump of the stack. At least in Windows Live it
removes all PII information before we put the information back in the server. And
that is where it's actually quite useful [inaudible] across the network by
[inaudible].
>> Douglas Crockford: And so one mechanism I'd like to see in the browsers
would be some kind of module container. Iframes come close to it, but they're
not good enough. But something where I can put some code in there and give
that code the capabilities it needs and I know that it cannot attack any other
frame or any other module. And such a thing should have the capability of
sending its own stack back to its own site. I can see that as being a reasonable
capability for it to have. But that's not something that I want any piece of code to
be able to do.
>>: [inaudible].
>> Douglas Crockford: Allow an application to send its own stack back to its
own server. I don't think that's dangerous.
>>: [inaudible] figure that out [inaudible].
>> Douglas Crockford: Yeah?
>>: I'm curious what your opinion is of Facebook's style of doing JavaScript. Is it
in a way solving the same problems as ADsafe, and likewise, is it just going
about it differently, or -
>> Douglas Crockford: Yeah. So you're asking about FBJS. So there's some
very good work done in FBJS. And it's attempting to solve a very similar
problem. Facebook has decided to abandon that. So they're not going forward
with that. They'll probably be adopting one of the others, I would expect.
>>: [inaudible].
>> Douglas Crockford: Probably Caja, since that seems to be the most popular
at the moment.
>>: [inaudible].
>> Douglas Crockford: Yeah?
>>: So if you were to ask [inaudible] one thing that [inaudible] do anyway, what
would -
>> Douglas Crockford: The module. You know, modules are so important in
software systems. And we've known that for, what, 30 years at least. Well, 50
years. They're essential. And the browser doesn't have any kind of module.
There's no good way of making a big thing out of a bunch of small things and
putting a membrane around them and protecting it. So if I could have only one
thing, it would be that.
The stuff that Google's been doing with Gears I think is a step in the right
direction. I like that they've got workers and other things which each exist in their
own process and which are then able to communicate. I like that. I'd like to see
us moving more in that kind of direction.
>>: [inaudible] those worker [inaudible].
>> Douglas Crockford: Right. They can't touch any DOM. So ultimately what
we want is to be able to give each of those things its own rectangle and say you
can draw in your rectangle, but you can't go anyplace else. Currently we don't
have any way to specify that.
>>: Actually [inaudible] proposal is very close to [inaudible].
>> Douglas Crockford: Yeah. I don't claim any originality in the stuff that I'm
saying, I'm just describing what I think the solutions are. Yeah?
>>: Have you dealt with anything as far as validating JSON structures
beyond just syntax [inaudible]? The only arguments I get for XML is we want to
be able to validate this config file, and somehow give you like a schema or
something to validate it, slightly user [inaudible] offset here, this character is
wrong.
>> Douglas Crockford: Yeah. When I started publishing the JSON stuff,
when Ajax became popular, I designed a schema system for JSON. And I found
there were sort of two classes of people. There were people in the XML world
who didn't want to use JSON who were looking for excuses, and then there were
people from the XML world who had made the transition to JSON who didn't
need it. And so I decided not to implement it. Other people have gone ahead
and done stuff; like Kris Zyp at Dojo has done some very good work
on schemas for JSON.
I'm -- I don't like the idea of using schema to do validation just because I think it
delegates too much responsibility for correctness to something which doesn't
have enough information to do it well. Ultimately every application needs to be
responsible for verifying its own inputs, because only it has the knowledge and
the context to understand what all the data means and how it interrelates. And I
think one of the bad confusions of the XML schema model was the idea that you
don't need to do any of that stuff, it's all done for you. And that turns out to be
fraudulent. And I don't think that really works.
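The point about applications verifying their own inputs can be illustrated with a plain hand-rolled check after parsing; no schema engine is involved, because only the application knows what the fields mean in context. The field names and rules here are invented for the example:

```javascript
// A minimal sketch of application-level input validation: parse the JSON,
// then let the application itself enforce what the data must look like.
// The "order" shape here is hypothetical, invented for illustration.
function parseOrder(text) {
    var data = JSON.parse(text);   // syntax check only
    if (typeof data.quantity !== "number" || data.quantity < 1 ||
            data.quantity !== Math.floor(data.quantity)) {
        throw new Error("quantity must be a positive integer");
    }
    if (typeof data.sku !== "string" || data.sku.length === 0) {
        throw new Error("sku must be a nonempty string");
    }
    return data;
}

var order = parseOrder('{"sku": "A-100", "quantity": 2}');
console.log(order.quantity);   // 2
// parseOrder('{"sku": "A-100", "quantity": -1}') would throw
```

Rules that depend on context, such as quantity being a whole number greater than zero, are exactly the kind of constraint the application can express directly but a generic schema checker handles poorly.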
I think we have time for one more. Yeah?
>>: So [inaudible] and there's been some pushback from [inaudible] for patterns
like that for performance. And you find that it has zero performance overhead. So
in general, like this [inaudible] program [inaudible] model, but it actually [inaudible]
threats, will you start using these runtime patterns, and what I'm wondering is
what are your thoughts [inaudible].
>> Douglas Crockford: Performance of these things is really good. We had tried
doing this stuff in Java, and we found that because of the way the class model
worked, basically we'd have a different class for every facet. And that turned out
to be horrendously expensive. But working in JavaScript, most facets turn out to
be a single function. And functions are really lightweight in JavaScript. And
adding an extra function into a call path to something is in the noise. So we
found it was a really good tradeoff. JavaScript is -
>>: [inaudible].
>> Douglas Crockford: Only for the ones that cross a trust boundary.
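The single-function facets he describes can be sketched like this: one function stands between untrusted code and a powerful object, forwarding only what is allowed and supporting revocation. The names below are illustrative, not taken from ADsafe or the talk:

```javascript
// Sketch of a facet at a trust boundary: a single function that forwards
// a limited operation to a powerful object. The holder of the facet can
// log but cannot reach the logger itself, and the creator can revoke it.
function makeLogFacet(logger) {
    var enabled = true;
    var facet = function (message) {
        if (enabled) {
            logger.write(String(message));
        }
    };
    return {
        facet: facet,                         // hand this across the boundary
        revoke: function () { enabled = false; }
    };
}

var lines = [];
var logger = { write: function (s) { lines.push(s); } };
var pair = makeLogFacet(logger);
pair.facet("hello");       // forwarded to the logger
pair.revoke();
pair.facet("ignored");     // dropped after revocation
console.log(lines);        // [ 'hello' ]
```

Because the facet is just one closure, the cost of interposing it is a single extra function call, which is the "in the noise" overhead mentioned above.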
>> Helen Wang: Okay. Great. Thank you for the great talks and the great
discussion, Doug.
>> Douglas Crockford: Okay. Thank you.
[applause]