>> Karin Strauss: Good morning everyone. My name is Karin Strauss and I'm a researcher here
at Microsoft Research. Today it's my pleasure to introduce Doctor Robert Grass. He is a senior
research scientist at ETH Zurich. He did his undergrad, PhD and postdoc at ETH Zurich as well.
His PhD work is pretty cool. It's on magnetic beads that ended up resulting in the creation of a
company that commercializes the technology, and today Doctor Robert Grass will tell us about
his pretty cool work on making DNA storage more practical by making the DNA itself more
stable over time. So with that, thank you very much for coming.
>> Robert Grass: Thank you very, very much for the kind introduction and also for hosting me
and having me and inviting me. It's a big pleasure to be here and to talk about our work. So, as
Karin already said, I started off working with magnetic nanoparticles and I'll tell you a little of
the story of how we got to the topic we are working on, because it starts with magnetic particles.
You probably know magnetic particles or magnetic materials are the basis of our current
storage devices; at least tape and hard disk are magnetic particles. We use magnetic particles
for something different: for separation. If you put particles in a liquid and you place a solid
magnet next to it, you can do separations in liquids. This technology was spun out of our university by the
company called TurboBeads, which, for example, distributes our materials via Sigma-Aldrich, one of
the large chemical suppliers.
I’ll get back to the topic of magnetic particles in just a moment. Let's say the vision we want to
talk about today is the idea of using DNA as a data storage device. More or less all of us use
hard disks or solid state disks to store our data. We all know, somehow we know, we try to
forget it as well as possible, that our data is not stable on our devices, at least as consumers.
Before I started with this project I never thought about the data stability of my own personal data.
The devices we use are made for 10, 20 years of storage. We accept that somehow; in my case all of
my private photographs and everything I have on hard disk, with the idea that they'll be there
forever, but that's not true because these technologies were not made for long-term
preservation of data, and so we are looking at other alternatives, and DNA is certainly one of the
alternatives, and we will discuss in a moment why it's really interesting to use this for long-term
data storage.
We didn't start with this problem. This problem is relatively difficult and we'll get closer and
closer to this problem throughout the talk. I started with a different problem. I started with
the problem of brand protection or anti-counterfeit or product barcoding or whatever you want
to call it. We started with this because you can also use magnetic particles for these purposes.
So if I have a liquid or a product you can add some magnetic particles in a certain pattern and
further downstream you can image that pattern or use some other way of identifying the product.
And the competing technology to what we did was DNA. And as chemists we’re extremely
fascinated by DNA because, as chemists, if you do analysis of some molecule, of some
species, you need enormous amounts of that molecule so that you can detect it, and usually you
can only detect the concentration of the molecule; you can't get more information about the
molecule.
But with DNA it's really different. With DNA, by biochemical methods, especially PCR, Polymerase Chain
Reaction, a biologist can detect one single molecule of DNA in a microliter of water. In chemistry
that's completely unthinkable. And besides being able to detect one single molecule, he can tell
the nature of the molecule in terms of its sequence by sequencing it, for
example, after he did the PCR. So DNA is really, from a chemical perspective, unique in the
amount of information that you can put into, let's say, one individual molecule and detect.
And on this idea of DNA barcoding, there's a company called Adonis[phonetic]. They sell you DNA and
you tag various products with it. You put some DNA on your computer and if somebody steals
it you can always retrieve the DNA back out from your computer and say that was my
computer. But there's a problem with that idea, and the problem is that DNA is
not as stable as we would like to think, or at least not as stable as the company would like
to think that DNA is. DNA is everything and it decays relatively quickly.
So the 2015 Nobel Prize was given to these three gentlemen; Lindahl was one of them, and we’ll
discuss his work in a moment. DNA is not stable. In our body DNA is under constant attack, so in
every cell, probably every second, there's some decay in DNA; some error is introduced, and our
body has very elaborate repair mechanisms with which DNA is constantly repaired. And these
three gentlemen received the Nobel Prize for identifying some of those pathways by which DNA
is repaired in nature using enzymes and very, very complicated systems.
In the environment we can say the stability in a lab or in an environment is at the absolute
maximum one year. So if we use it for data storage or any of these
barcoding technologies it’s not extremely useful, because I can tag my computer but in a year
the tag is destroyed and this is useless.
On the other side, if you go to the movies, I'm sure you're aware of this movie, Jurassic Park, where
they say you have this amber fossil in which there's a mosquito, and in the blood of the
mosquito there's DNA, and that's the DNA of some dinosaur, and we can, let's say, remake
that dinosaur from the DNA. Really interesting story; after working with DNA it's really
interesting to look at the movie. There's a long sequence where they explain how the DNA
ends up here, what DNA really is, because often it's very difficult to explain what DNA is, how it works,
how the sequence defines what we are. The movie is really well done. The movie makes some,
let's say, assumptions which are not true. One assumption is that there's really DNA in
the mosquito, which is wrong.
So amber fossils are usually, certainly, more than 10 million years old; they're usually
formed underground where it’s under pressure, usually a little warm, and DNA doesn't
survive this. So there's no DNA in mosquitoes in amber. But there are old DNA samples in
other fossils. Like this piece of a bone; that’s a piece of horse bone, a foot, 700,000 years old. It
was found somewhere not so far away from here I think, at least from my perspective, near
Alaska somewhere, where it's really cold underground. In the permafrost there's this 700,000
year old bone, and from that people were able to extract DNA of that age and more or less
sequence the whole genome of that horse, half a gigabyte of information after 700,000 years.
To my mind that’s the oldest information we really have access to. There's nothing similar to
that, and also that amount of information; it’s not just some cave drawings. It's really an
enormous amount of information that survived an enormously long amount of time. So for me
that's the interesting part. So in the lab DNA is not very stable, and if you put it on the table it's
also not very stable, but if I put it in this bone, somehow, if it's cold as well, that's also an
important property, it’s very, very stable. But there's also more modern work, from 2013, these
papers 14 and 15: in Spain, somewhere in a cave, there's human DNA more than 100,000 years
old, and that’s not in permafrost. So if it's correctly encapsulated in this bone,
somehow DNA is really stable.
If you look at it chemically, why does DNA decay? DNA decays mostly by reaction with water.
Mr. Lindahl had a look at that in the 70’s or so and described those reactions with water and
radicals which happen in water. So if I want to stabilize the DNA I have to get rid of water. One
idea would be amber. We even tried to synthesize amber around DNA. It's by far not very
[indiscernible] so we were not successful with that. The other idea is bone. If you have a piece of
bone lying in the ground, the biology decays with time, it gets eaten by bacteria, and depending
on the conditions under which that bone degrades, the calcium phosphate, so the inorganic part of
the bone, recrystallizes. During the recrystallization it might just encapsulate some DNA. By the
way, DNA is very strongly negatively charged. Calcium is a very strongly positively charged ion
which binds to the surface of the phosphate, and so you can imagine the DNA is
really captured within calcium phosphate crystals.
The problem is that calcium phosphate is not very stable in a liquid environment. So if you
have it in water, especially if you are making it more acidic or more basic, the calcium
phosphate will dissolve and your solution is useless. So we went and said we want to
encapsulate DNA in glass, which at the beginning might sound strange because glass you make
at very high temperatures, glassblowing at 500 degrees at least, probably 600 degrees, and DNA
will not survive that. But in the field where I come from, nanotechnology, there are methods to
synthesize glass in water. So I can have a water solution, I add glass precursors, which are
chemicals containing silica, and they slowly polymerize to form a glass, slowly meaning it takes
perhaps a few hours up to a few days until this reaction of these precursors to really form glass
works. So there we can build glass at a few, let's say, nanometers per day, and that's the method
we developed. To show you some chemistry, I'm sorry for that.
So what we start with is a glass particle which is positively charged. I've already told
you DNA is very strongly negatively charged so it will add [indiscernible] on the surface of our
glass particle. So we have glass and then DNA, and then we use this species you see here, which
has a positive charge on one side and, let's say, a silica glass precursor on the other side. So the
positive side will again bind to the DNA, leaving the silica side on the outside, and then we use
more of a second glass precursor, this one, which then grows solid glass around the material. So
this glass forming is more or less a polymerization of this molecule. It loses the organic parts
and you just get silica which is glass.
>>: [inaudible] stable putting in it. So what do you mean by instability? So how much damage
does it actually
>> Robert Grass: So we'll get to that in a moment. So usually we express damage as a half-life
of a DNA strand of a given length. We work with lengths of about 100 base pairs. So in the lab,
let's say we talk to a biochemist and say how stable is DNA? He will say, well, if you have it lying
around in a lab, weeks, months; even in the fridge perhaps half a year; and if you freeze it, it will
be stable perhaps for a few years.
>>: [inaudible] one base is wrong.
>> Robert Grass: Well, one base is wrong; so if you have DNA decay, let's take this example here:
usually you have a cut through here and then it breaks apart. So you have a single-strand break
and then you have only the single strand left, and then if you do amplification of the DNA you
lose that signal. So for all of the analytical purposes we need complete DNA sequences.
So from an information perspective it's an interesting point. The information is not lost; it’s
just much more difficult to analyze it. So you can also calculate the half-life of one base pair.
So the half-life of that is, I would say, in solution perhaps 100 times longer than what I
calculate for my hundred base pair sequence, so information is lost very slowly. Also if you
look at these bone fossils, the DNA there has already gone through decay after 700,000 years. Of
course it's decayed. It's not right that you have a perfect complete genome or chromosomes of
that horse. They have been cut down into very, very small pieces where you have all of these
individual decay patterns at the end of the pieces that have started to oxidize, and so they have more
errors at the ends, and so as we get better and better at reading information from shorter and
shorter pieces the information is not lost. But for practical constraints, if we go to how
we currently read DNA by sequencing, it’s most useful if you can keep the complete length of
the DNA.
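
The relation between the strand half-life and the single base pair half-life mentioned in this answer can be sketched in a few lines of Python. This is an illustration, not the speaker's own calculation: it assumes every position in a strand breaks independently with the same first-order rate, so a strand of n positions is intact only while all n positions are intact. The function names are hypothetical.

```python
def strand_half_life(base_half_life: float, n_bases: int) -> float:
    """Half-life of a fully intact strand, assuming each of its n_bases
    positions fails independently with the same first-order rate."""
    if n_bases < 1:
        raise ValueError("n_bases must be at least 1")
    # Survival of one position after time t: 0.5 ** (t / base_half_life).
    # Survival of the whole strand is that value to the power n_bases,
    # so the strand reaches 50 % intact n_bases times sooner.
    return base_half_life / n_bases


def intact_fraction(t: float, half_life: float) -> float:
    """Fraction still intact after time t for first-order decay."""
    return 0.5 ** (t / half_life)


if __name__ == "__main__":
    base = 100.0  # arbitrary time units, illustrative value only
    strand = strand_half_life(base, 100)
    print("strand half-life:", strand)
    print("intact after one strand half-life:", intact_fraction(strand, strand))
```

Under this assumption a 100 base pair strand has a half-life 100 times shorter than a single position, which is in line with the factor of about 100 quoted in the answer.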
Also because with all procedures we are using at the moment for reading DNA you have to amplify
the DNA before you read it, amplifying by PCR, and for PCR you have to have primers which bind
to both ends of the DNA, and for that it has to be complete. So if it's broken somewhere you’re
no longer able to amplify it and then it’s lost during your procedures. But from purely the
information perspective the information is not lost but it’s just much, much more difficult to
retrieve it.
And if you go to this you can have a look at the number of [indiscernible] in one of these
papers. That will give you an idea how much work it is to get that information out of there and
how much money goes into that, because sequencing was already cheap by that time
but they need a lot of sequencing to get that, because what we forget is that in these old samples
they’re only looking for the horse DNA. You have bacteria, viruses, everything that's in the
earth, everything that's been there over time; you’re also sequencing all of that and you have to
filter that out of the results. So these are very difficult tasks to get that information. But it's
there, the information, I think. That's the main information. And I think the second thing is that
we can improve things by far if we get rid of water in our systems, so if we get DNA separated
from water. So this is what this
>>: Can I ask a question?
>> Robert Grass: Yes, please. You can constantly
>>: So how much control do you have over the dimensions of these silica beads here?
>> Robert Grass: So there's different parts of control we have. We can make the starting
material in various sizes. So we usually control the size of the final particle and then we have
control on how thick we make the layer. It depends how much of the chemicals we add to
make it thicker or thinner. In all of the data we've collected so far it doesn't make a big difference
how thick or thin it is because as soon as water is gone the decay of this is unlikely, at least at
room temperature. And if you go to a higher temperature, if this is only 10 nanometers thick
or if it's 10 or 20 it just is [indiscernible].
So we are doing work at the moment on getting other layers around here. For example, one thing
we are looking at is that this is glass, this is transparent to light, so you can always also add
another layer of titania. Titania is in your sunscreen. Sunscreen is just titania. There’s
nothing else in there than titania. And we can put a titania layer around it which also
[indiscernible] absorb the light and then you get protection from light as well.
>>: And a follow-up question, [inaudible], you talked about this last night. You said that the
current overhead given the dimensions of the [inaudible] you have is about two orders of
magnitude. So it seems like the larger you can go here the better because [inaudible]
overhead
>> Robert Grass: Not quite right. The smaller you go the more you get in on the mass basis
because you get more surface area. So, per mass
>>: I just said that.
>> Robert Grass: So this is [indiscernible] the most scientific way to do it because we are going
for [indiscernible] and we want to show that we have this nice material we make, a nice
well-controlled material. You can also go and just take DNA, snip this away at DNA and that
chemical, until you get like 50-50 weight of DNA on
>>: You put the DNA inside it as well but you can't grow multiple layers in there.
>> Robert Grass: But you can get that DNA directly into silica without the need of the core.
And then from a mass basis [indiscernible] extremely interesting
>>: [inaudible].
>> Robert Grass: Yes, exactly.
>>: So what are you trading off when you do that?
>> Robert Grass: Nothing. It’s just the beauty of the image I get afterwards. You don’t trade off
anything by that. For us also, we go for cost. The chemicals, all of that is essentially free
compared to the cost of the DNA so we don't care about how much we add. Of course if you're
going to say DNA has the advantage of having the high storage density, we’re then trading that
away by just having this stupid core in the middle just for the nice picture.
>>: The core could just be DNA.
>> Robert Grass: The core can also be DNA. It's really having this layer of DNA sandwiched by
excluding water.
>>: Have you tried multiple layers?
>> Robert Grass: We've done exactly, as I said, silica, titania, and now we're looking at other
materials because
>>: Multiple layers of DNA-
>> Robert Grass: Of DNA, yes we have, but for different applications we've done that. So you
can do multiple layers. You can go pretty far with that, let's say, and that of course increases
density tremendously if you make additional layers around the individual [indiscernible].
>>: What sort of a physical form can you get when you mix DNA, skipping the glass beads? Is it kind
of flocculent?
>> Robert Grass: Yes, more or less you get the flocculence depending on the length of the DNA.
So these are relatively slow processes but you start with a solution of DNA and your metal
organic components, you wait, and at the end you have a wide [indiscernible] in the sample,
you [indiscernible] and it's like a flocculant of glass which you get. Any more chemistry
questions?
So that’s the material we get. You can see here this is the individual core. See the thin layer?
We have to be extremely careful if we do images like that. You usually get artifacts from
electron microscopy. We don't know if it's really the DNA there, but the sequence we get is
correct from what we have. That’s just a bunch of particles. You have millions. They're really
small. They're just 100 nanometers. So in a gram of particles I think you have 10 to the 17
individual particles.
The challenge however, if you look at the chemistry part, is not getting DNA into the glass
material. The real challenge is getting DNA out of the glass because glass, we as chemists we
love glass among everything because it doesn't react with anything. It's really not to
[indiscernible] all of chemistry we know. Temperature, radicals, you can throw really ugly stuff
at glass. There's only one exception, which is hydrogen fluoride or fluoride chemistry, which is
feared by chemists to a great extent because it's very toxic at high concentrations. But it’s used
in chip manufacturing. In chip manufacturing you work with silica and silicon and you
have to make planar surfaces and you always use fluoride solutions to etch off surfaces. So it's
existing technology; it’s just in the lab we are afraid of it, and it's an acid, so people usually are also
afraid that it would decay DNA.
The nice thing is, if you apply it correctly, meaning that you buffer the acid with some ammonia and
you do it in a low enough concentration so it's no longer toxic, you can work with DNA. So if we take
these glass particles, so this white powder, this precipitate, and we add some buffered oxide
etch in a concentration which has the same fluoride concentration as your toothpaste at home
or your mouthwash, it's nothing different, mouthwash is also a buffered solution of fluoride,
you decay the silica and you get out the DNA. So within seconds all of the white stuff
disappears and you get back to a transparent solution of DNA and you can then directly analyze
the DNA afterwards.
So more or less what we have is a procedure to take DNA, put it into something where it's
stable for some time, and then if you decide to analyze it we can more or less push a button or
add some chemicals, the DNA comes out of the stable format and you can go on with the pre-existing
analytical means. One important question at this point is, how do we actually know that DNA is
inside and not outside because we have these pictures but pictures always lie. So how do we
show that it's really inside? And we can only do that by testing the properties of the DNA. So
what we do is we take radicals. DNA is very quickly decayed by radicals so we more or less take
hydrogen peroxide and some copper. Yes?
>>: Why not label the DNA or use ion exchange to stick gold or platinum in there?
>> Robert Grass: To?
>>: To stick gold or platinum in the double helix when you stick it on the glass and can see that
with the SEM[phonetic].
>> Robert Grass: Yes. It would not be so trivial to do that but that's certainly a way of
>>: You could just use a labeled primer and have a big old thing hanging off the end. It's true
that it would make one of your pictures ugly by having bumps on your sphere, but you have
bumps on your sphere and that would [inaudible].
>> Robert Grass: I have to admit this is scanning electron microscopy. So your primer is
probably this size so you don't see a lot. It's really, really difficult because we're looking at a
single layer of DNA and a resolution close to a nanometer, so it would really need high-resolution TEM or
atomic force microscopy to resolve it, and it’s inside the sphere, and so if you do an image from
the top it’s very difficult to distinguish. So it's really difficult; we talked to many people about doing
that, so we have at the moment no direct proof that it's inside.
>>: I don't want to pick at this too much, but if you had a gold labeled primer where your gold
particles [inaudible] nanometers or something like that, then it would deform your sphere
because it would still be inside.
>> Robert Grass: But then it would deform my sphere. Yes, it would still be inside. Then I have
to guarantee that the gold also goes inside because I only get inside what's negatively charged.
So it's certainly possible. Personally I prefer the indirect proof where we take radicals which we
generate, we throw them at DNA; if the DNA were on the outside, like here, this is a DNA ladder,
we add our radical solution to free DNA, in a few minutes it's all gone. If we have the same DNA
in our particles the DNA survives. We then get rid of the radicals, open the particles, and you
see the intact ladder after times much, much longer than the time we stressed the free DNA.
>>: So I guess this is proving you have DNA inside. It's not proving you don't have DNA outside.
>> Robert Grass: I don't prove that I don't have DNA outside. That’s true. It also proves that the
glass I'm forming is nonporous because radicals are extremely small. They would get through
the smallest, tiniest nanometer hole and with this I can prove that it's really intact. We can also
use PCR to really quantify that. So that's the free DNA concentration which drops orders of
magnitude after this radical treatment, and if I have encapsulated it it's essentially stable. You
could think that the loss you have from here to here might be some DNA on the outside,
some that’s imperfectly covered, but it's a tremendous difference to what you have with the free
DNA. So, as I said
>>: Could you please go back one?
>> Robert Grass: Yes, of course.
>>: So it looks like most of this is not free. Is that right?
>> Robert Grass: Most of it’s not free. Yes, exactly. So most of it's really
>>: Just like one part in 10,000?
>> Robert Grass: So the concentration is difficult. We just take an arbitrary concentration at
the beginning, we add the radicals, and then more or less we lose about a factor of two of the DNA
after the treatment. The treatment is really the most aggressive extreme you could find and we
really compare it to the free DNA in solution where the concentration just disappears
completely after a few minutes. We have similar data for temperature, light.
>>: I see. This is radical.
>> Robert Grass: So this is really by radical treatment to show that it's inside.
>>: So am I interpreting it right that there's the same amount of DNA?
>> Robert Grass: The same amount of DNA we start off with, one’s free in solution and one’s in
my particles and then I add the same chemistry to both. The one in solution just disappears
very quickly and the one encapsulated is maintained at that very aggressive chemical condition
for a certain amount of time.
>>: So the one that’s contained after the radical treatment then you do the fluoride?
>> Robert Grass: Then I do the fluoride. So I first get rid of the radicals. I can do that by
complexing[phonetic] the copper and by [indiscernible] I can collect the particles and I get rid of
the radicals. I open the spheres and then I can measure the concentrations.
So, as I told you, we didn’t start with DNA data storage. We started with this idea of barcoding.
More or less with what I have here, I have a much better way of barcoding products because
the idea is you have these particles which have a synthetic DNA sequence inside; let’s say you have
100 base pairs, you get 200 bits of information in there, something like that, and you can
barcode with it. And you can put that onto any product you want and then later downstream
recover your particles from the product and verify that it was that product, and now they're
much more stable than if we just used the free DNA. The principle I think is pretty trivial. The
advantage of this tracing technology is that it's very cheap in implementation into the product
because we need very low amounts of DNA. Even if DNA is expensive the amounts we need are
so low that it's not expensive in, let's say, bulk. The disadvantage is that it's pretty time-consuming
and work-intensive on the analytical side. It's not like you have a cell phone which you put
to your device and it says yes or no; at the moment you more or less
need a [indiscernible] lab to do the analysis.
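
The figure of 200 bits in 100 base pairs follows from four possible bases per position, so two bits each. A minimal sketch of such a barcode, assuming a direct two-bits-per-base mapping; the mapping and the function names are hypothetical and are not taken from the talk, and a practical barcode would likely give up some positions to primer binding sites at both ends.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def capacity_bits(n_bases: int) -> int:
    """Upper bound on information in n_bases positions (2 bits per base)."""
    return 2 * n_bases


def bits_to_dna(bits: str) -> str:
    """Encode an even-length string of 0s and 1s as a base sequence."""
    if len(bits) % 2 != 0 or set(bits) - {"0", "1"}:
        raise ValueError("expected an even-length string of 0s and 1s")
    return "".join(BASES[int(bits[i:i + 2], 2)] for i in range(0, len(bits), 2))


def dna_to_bits(seq: str) -> str:
    """Decode a base sequence back into its bit string."""
    return "".join(format(BASES.index(base), "02b") for base in seq)


if __name__ == "__main__":
    barcode = bits_to_dna("0001101111100100")
    print(barcode, dna_to_bits(barcode), capacity_bits(100))
```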
Those exist; if you go to the doctor and say I'm sick, here, take some blood, he'll do
exactly the same work. He'll extract the DNA from viruses in your blood, do PCR and say okay,
you're infected with this or that virus, so it's exactly the same idea behind it. We've done a few,
let's say, small projects. We've labeled olive oil with our particles, by our Italian PhD student. So
she labeled some Italian olive oil and other oil products as well, gasoline; we can also show that
we can discriminate different concentrations of those tracers in the solution because PCR
can be quantitative. And it also shows the concentrations we are looking at, so one
microgram per liter of oil; that's a small heap of material you don't even see if you put it on, so a
microgram is essentially nothing, and it only contains one nanogram of DNA for one liter of oil.
One nanogram of DNA is perhaps not even a cent and that gives you an idea of how cheap that
technology can be just because it works at such low
>>: Is this taking human skin? If you were to spray it onto someone, how long would it last?
>> Robert Grass: If you sprayed it onto someone it will last for quite some time. There's ideas
in that, not from us, but using DNA as an intruder protection system like spray on DNA.
>>: [inaudible]?
>> Robert Grass: Well, we would have to play with the surface of the particles so that they stick
better because silica, once you wash yourself, you will get it out pretty quickly. Soap is pretty
good at getting rid of small particles. You could improve it by making the particles like we did in
olive oil; make them hydrophobic so they stick better to surfaces or [indiscernible] in the oil. So
you could get, I didn't test it so I can't give you a precise
>>: You could put a covalent linker on the end.
>> Robert Grass: You could also put a covalent linker which would really react just to skin with
some [indiscernible] in the skin or so that would also be a very nice idea indeed. There are a
few products we have been looking at actually for protection in bills, so in money which is a
challenge because the industry is going more and more to plastic money, not to credit card
money, but the bills are no longer fabric but polymer, and the bonding is much more difficult to
the plastic than to fabric. So those are things we are looking at at the moment. We've also
added it to milk, for example in Switzerland where they make cheese out of the milk, and we
can show that, let's say, the cheese can be identified with these things. If you look into that
more deeply, especially in food fraud where we see that there's a big need for finding a solution,
from a consumer perspective the problem is usually: what do you prefer, buying counterfeit
food or food with some strange particles and DNA in it?
>>: Could you sequence [inaudible]?
>> Robert Grass: I mean that's what's done, by the way. So in the big studies you can, for
example in tuna, tell from the sequencing of the DNA of the fish where it came from.
There are similar studies with wine and others. So that's also where part of the idea comes
from: you take DNA, all of the analytical means exist, it's just that you don't want to say
okay, my tuna is from the South Pacific; you want to say that tuna is made by me and not by him, so
that you can introduce new kinds of information which weren't there.
Another topic we've been working on is nano-toxicology, where the nano-particles go, for
example in a [inaudible] plant, what happens with particles. We can use these barcoded
tracers to follow the flow of water and to follow particles. We've done studies following
pesticides. So you add your particles to the pesticide and you can trace where your pesticide
goes in the field, if it goes to the neighbor or how far it travels, because you’re so sensitive in
detecting it.
I’ll get to the real thing you’re interested in in a moment, to the data storage. Just because I
like this a lot: we can go to detecting individual particles containing DNA by an analytical process
called digital PCR. In digital PCR you’re no longer adding DNA in an unknown concentration to a
well, amplifying it and detecting on which cycle of amplification you are able to detect it, but
you are diluting your DNA so far that in a well you either have one DNA molecule or no DNA
molecule. And you do that in many wells. In our case, let's say we have a plate, and
in every well we dilute our particles so low that in every well we either have a particle or there's
no particle. Then you dissolve the particles, you do PCR, and those wells which originally had a
particle will give a signal, those which had no particles give no signal, and so you can really go
and count individual particles one by one, whereas in PCR you get a logarithmic scale of your
signal. Here, more or less depending on your concentration range, you can get a linear scale
in concentration and also extremely low numbers. So we've done different sizes of particles,
130 nanometer to 70 nanometer particles, and can count them in a concentration of one or
two or three particles per microliter. And if you go to the field of nano-toxicology, being
able to count something really makes it easy. It's like counting sheep but now
you’re counting individual particles.
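
The counting step can be sketched as follows. The talk describes diluting until each well holds one particle or none; in practice some wells may receive more than one, and a commonly used correction in digital PCR assumes the particles are distributed over the wells by Poisson statistics. That correction, the function names, and the well volume parameter are assumptions for illustration and are not stated in the talk.

```python
import math


def particles_per_well(positive_wells: int, total_wells: int) -> float:
    """Mean particles per well from the fraction of wells giving a signal,
    assuming particles land in wells independently (Poisson statistics)."""
    if total_wells <= 0 or not 0 <= positive_wells < total_wells:
        raise ValueError("need 0 <= positive_wells < total_wells")
    # P(well is empty) = exp(-mean), so mean = -ln(fraction of empty wells).
    return -math.log(1.0 - positive_wells / total_wells)


def concentration_per_ul(positive_wells: int, total_wells: int,
                         well_volume_ul: float) -> float:
    """Particles per microliter of the diluted sample loaded on the plate."""
    return particles_per_well(positive_wells, total_wells) / well_volume_ul


if __name__ == "__main__":
    # Illustrative numbers only: 20 of 96 wells positive, 1 microliter per well.
    print(concentration_per_ul(20, 96, 1.0))
```

At very low occupancy the estimate approaches the plain fraction of positive wells, which is the one-particle-or-none picture described above; no calibration curve is involved, only the count of positive and empty wells.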
>>: So [inaudible] in the illustration of the [inaudible] you had positive, negative and blank.
>> Robert Grass: Yes. The blank we just do as a control. So we just have a blank which runs
straight over the plate as a negative control. The advantage of this is that with PCR otherwise you always
need calibration. This doesn't need any calibration. You just need a blank to be sure that
where there's nothing you really don't get any signal and there are no false positive results.
We'll play more games on this. I'll accelerate a little. With this we can detect particles, but we
think we can do more with particles, because if we can describe how DNA decays in a particle we
can use that to extract some other information from it. Let's say we know
something about the error mechanism of how DNA decays inside of a particle and we
extract some information from that. Let's say we take particles and we put in DNA and RNA. By the
way, we do that in several layers. And RNA is much more sensitive to temperature than DNA.
So we expose those particles to various temperatures, and from the difference in concentration of the
DNA and RNA we can say, okay, those particles have been at a certain temperature. It
doesn't give a real temperature. It gives the integrated thermal damage that those particles have seen.
We can do the same for light detection. So you put DNA which is caged. Caged means there's
some protection group which is sensitive to light on the DNA and then if you expose it to light
those cages break and you get the free DNA and we can detect your irradiation time of light as
a fraction of caged DNA. And because our sensor is so small we can go into cells. So we take a
Paramecium, a single cell which we feed with some particles that are taken up by the cells, we
illuminate the cells or we don't illuminate the cells with light, and we can then detect the sunlight
irradiation the cells have been exposed to for some time.
So you can see that with this idea of using DNA and its information content, and also the possibility
of really reading at these really, really, really low concentrations, you can do very powerful
things which otherwise are very difficult to imagine: doing a measurement off-line inside of a cell
and then afterwards extracting the data that's generated.
The most exotic project we are doing is human DNA preservation, and there's a need to
preserve DNA. In a project we generated jewelry in which you can put your own DNA
sample. So you have a hole which you close with a diamond, and under the diamond we put
some fossilized human DNA which is maintained there for a long time. We think it's a pretty
exotic product. We crowd-funded it, so we raised 20,000 dollars to pay for those things. So we
have some customers who are really interested in that, and we think we can continue to build on
that in some, let's say, very specific and exotic applications.
This brings me closer to what I really want to talk about, really storage. So here we are just
storing human DNA. My DNA I can store in these particles much, much longer than by other
means. The only other means is to put it into liquid nitrogen, and then you'll have a similar
storage. But with this encapsulation we can write around and we still have the DNA. So this is
taken from I think the Church paper.
>>: What are the biggest fragments you've stored?
>> Robert Grass: I have to admit we don't really look at it, because we take the DNA, we
encapsulate it, and in our analysis we don't really go for the overall length. We do gel
electrophoresis and we see it's still large, but I can't tell you exactly what the length of the DNA is.
>>: [inaudible] like up to KB or something.
>> Robert Grass: Yes, KB. If we do it with particles the DNA will go from one particle to the
next and they'll be drawn together, and if we go without the particles you have this large
precipitate which forms; we haven't looked at that in great detail. So the promise of DNA, and I
think that's what we are all really interested in, is as a digital storage medium; and the idea has
been around, they're not related but some of this [indiscernible] work is pretty old. There's this
wonderful paper on the [indiscernible], I don't know, really nice paper, on the idea of taking DNA to
store information. At the beginning really trivial information in very small amounts, then going
and implementing that information into living systems, and now more recently, by Church and
Goldman, being able to, let's say, really synthesize DNA in large amounts and by that storing
information at very, very, very high density and with long durability.
So nearly all of the work before precedes that, and we saw those papers and said, hey, we
think we have the best way of preserving DNA. Why don't we combine it with this and show
that we have the best storage device, or the most long-term durable storage device imaginable
[indiscernible], because if you look at those papers, that's from the Goldman paper, in those
works DNA was synthesized and transported for two days until it was analyzed, so not long-term
in any way. It was: we make it today, we read it tomorrow, and we don't look at the time frame in
between at all. As I told you, DNA does decay, and it depends a lot on how you store it.
So we were interested in really looking at this in-between. How long can we really store it and still
recover the information we put down, and what's the connection between those two worlds?
This is from Lindahl '72, stability of DNA in [indiscernible] solution. It depends on pH.
That's the half-life of about a 100 base pair sequence, and you can scale it linearly with the length. So
he took some data at high temperatures and extrapolated the kinetics of that decay process.
And you see here, if you go to 20 degrees you get a few years of stability in the water in a human
body. And if you have metals in solution obviously it gets much worse than that.
We had a look at what happens if we put the DNA into our particles. We can't
wait for a very long time to measure that data, so we also do accelerated aging. Accelerated
aging is nothing else than this: instead of measuring at 20 degrees, where it would take hundreds
of years, we go to 60 degrees, where decay will be much faster, and we extrapolate the data from
high temperatures down to low temperatures by the Arrhenius equation and we get an idea of the
stability. So in silica those are our experimental points. As you see, there's quite some distance
over which it extrapolates to lower temperatures. Yes?
>>: Where does the model come from for extrapolating so far out?
>> Robert Grass: So far out, that's a big challenge. It would have been my next sentence.
Arrhenius is accepted in all of chemistry: if the chemical reaction of decay stays the same, there's an
activation energy and a pre-exponential coefficient of the reaction, and with these three data
points we can measure those two factors and then we can use that to extrapolate.
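Editor's sketch: the extrapolation described here amounts to fitting a straight line to ln(k) against 1/T and reading it off at a lower temperature. The three rates below are invented for illustration and are not the measured values from the talk; the function names are hypothetical.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def fit_arrhenius(temps_c, rates):
    """Least-squares fit of ln(k) = ln(A) - Ea/(R*T); returns (A, Ea)."""
    xs = [1.0 / (t + 273.15) for t in temps_c]
    ys = [math.log(k) for k in rates]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope * R

def rate_at(temp_c, pre_factor, activation_energy):
    """Arrhenius rate k = A * exp(-Ea / (R*T)) at a temperature in Celsius."""
    return pre_factor * math.exp(-activation_energy / (R * (temp_c + 273.15)))

# Hypothetical decay rates (per day) at three oven temperatures; not measured values.
temps = [60.0, 65.0, 70.0]
rates = [0.010, 0.020, 0.039]
A, Ea = fit_arrhenius(temps, rates)
half_life_20c_days = math.log(2) / rate_at(20.0, A, Ea)
print(f"Ea = {Ea / 1000:.0f} kJ/mol, half-life at 20 C = {half_life_20c_days / 365:.0f} years")
```

With only three points the two fitted parameters are barely constrained, which is the danger the speaker goes on to acknowledge; the fit is only meaningful if the same decay reaction dominates at both ends of the temperature range.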
>>: So basically you're tying back into decades of work on it.
>> Robert Grass: Yes, exactly. Of course extrapolating in such a way is extremely
dangerous, especially if you start with three data points. There are other people who also have
similar data on dry DNA. There's a big paper from Bancroft on DNA stability in
a vacuum. They have very large data sets and get the same activation energies in the absence
of water. And for me the most important point is this one here. This is DNA in bone. That's the
paper with the 521 years for DNA of different lengths. You see we are more or less spot on:
the stability within the particles, extrapolated, matches what you get in natural bone. In this
extrapolation you assume that the reactions which happen at high temperature also happen
at low temperature. There's no other reaction taking over. That's the only real assumption
that's in there. All of our pharmaceutical industry and packaging industry completely rely on that.
If you do expiry dates for pharmaceuticals, nobody waits for five years to measure expiry. You
heat it up to 60 degrees, do a series of tests, and that's how expiry dates are established. So it's
pretty well established. We have, let's say, data for the rates. I think we have a good set that
is very close to what we would expect for DNA without water for 500 years. So that point is
500 years. I'm unlikely to wait so long for my experiment.
>>: Maybe not 500 years, but are you doing something [inaudible]?
>> Robert Grass: Yes, exactly. We're looking at more long-term. So I just had a PhD student
who started a very easy project. You do something at the beginning, you wait for three years
and then you analyze the data.
>>: [inaudible] another project.
>> Robert Grass: He has other projects as well. A very important point in this data is the effect of
humidity. You see, dry DNA at no humidity also decays slowly. That's at, I think, 65
degrees over four weeks. That's how long we tested. But when the humidity goes up it decays
very quickly. There are commercial products. One is the Whatman cards, where blood spots
[inaudible] are stored that way for forensic sampling. They work very well if it's dry, but if the
atmosphere gets humid they fail. And in silica it doesn't matter what the humidity is, because the
DNA is protected from the outside atmosphere.
>>: It's not as stable as it is.
>> Robert Grass: It's not quite as stable, that's correct, because it does have water in there.
That's our expectation with the charges we have on the surface. We are not perfectly matching
the charge of the DNA, and some water will be encapsulated with it as well. And in the project
we are doing at the moment we already have data showing that if we make the surface of the
particles more hydrophobic through the chemistry, we get less water and stability goes up
further.
>>: You could probably add stuff too.
>> Robert Grass: We could also add stuff. For example, this DNA stable is a commercial product
which is a sugar, [indiscernible], which is known from spores. Spores, when they dry out,
produce a lot of that sugar, and DNA is stable in the presence of the sugar as long as it's dried.
Sugar likes water, is soluble in water, so if you have some humidity it really draws the water
into the sample, and so there are things that can be done there. The question is how much
further do we want to go and what are really the challenges at hand? So if you go to data
storage, we'll discuss that at the end, I think there are other challenges, let's say shorter-term
challenges, that would have to be overcome to get DNA data storage. For the applications we are
looking at in forensics or in testing, every one of those applications has separate, let's say, cost and
stability issues which we have to look at, and those things we are looking at at the moment.
Yes?
>>: [inaudible] go up in some cases?
>> Robert Grass: Yes. It's all by PCR. So the reliability of, also the scale at which we are
measuring is enormous.
>>: [inaudible]?
>> Robert Grass: Difficult. Our data is not that consolidated.
>>: How easy is it to, the nice thing about these commercial products is that it's extremely easy
to use and you just dump your DNA [inaudible]. Your [inaudible] don’t look that complicated.
>> Robert Grass: No. They're similar in simplicity, I would say. You just add the chemical
and you wait until the reaction finishes. We also said, if you go to publish that and do all of the
material analysis, we purify them afterwards and wash off the non-bound DNA, but if you go
really for simplicity I don't think you have to do that. So you just go, you add [indiscernible];
we have a paper on increasing the stability of RNA, because RNA decays even faster than DNA. It's
really impossible to store it. So we take RNA samples, we add the chemicals, that's all we do.
We just leave it lying around, and then if you want to read it we again add chemicals and we
read it out.
For example, the Whatman card has other advantages: you just punch it, you don't even need liquid
handling. There are advantages in that as well. Let's say if we really go for the application of
bio banking, I don't know if we're competitive, because that's already established and it's
difficult to come in with a new technology whose advantage is in humidity, so that you can leave
out the drying agent in the storage system. That's really difficult to get into. But you see,
conceptually, for the tracing, for sure you have to have something within the material. For data
storage, you can debate what the advantage is in the end of having it in particles
instead of on the Whatman card.
One advantage I see of the silica is that it could be done planar. So if you have your DNA
synthesized on a chip, for example, you could just encapsulate the whole chip and store that,
because I think one interesting thing that comes at least to my mind if you go to DNA data
storage is: we are synthesizing DNA on a chip, we are reading it on a chip, so we might as well keep
it on the chip as long as we can. The problem is just that the technologies that have been developed
are not on the same chip: different manufacturers, different platforms and so on. But if you
really go conceptually, you would say I keep it on the chip, I keep the geometric information as the
index or whatever, then I don't have to index it, and I don't have, let's say, the statistics of
randomly pulling samples, which I have if I take it off the chip and then pull samples to read the
information. So the advantage of the silica is that it could be done on a planar surface as well.
>>: Can you do that with the DNA stable as well? Isn't that [inaudible]?
>> Robert Grass: Yes, you could [indiscernible]. I think the question is how thick it has to be to
get there. For me from a chemist’s point of view it's more showing difference between
inorganic material, glass and plastic, so that’s plastic, ours is really solid glass. They have
different properties in the end in terms of resilience to external chemistry. But yes, then you
can debate. Yeah?
>>: In the second stage of the coating, the coating that's just the glass, can it be done in an
organic solution?
>> Robert Grass: I think it has been done in an organic solution or very similar things have been
done in organic solution.
>>: [inaudible] dry?
>> Robert Grass: Exactly. The challenge is that it could be dead dry. The DNA will always find
water because you have this negative charge on the DNA and so you'll always have some
surrounding of water and getting rid of that will be extremely difficult. And DNA doesn't
dissolve in any organic solvents at all.
>>: At least get it as dry as dry DNA.
>> Robert Grass: Yes, you could get it as dry. I think you can get up there. That's certainly possible.
For sure it's possible to improve it to that.
>>: What do you envision the encapsulation process to be if you keep it on the wafer? Is it spun
on?
>> Robert Grass: It would be spun on or dipped in a solution in which the silica grows on the
surface. So the silica in this case grows directly on the surface. You would have it grow and then
you would pull it out and it's stable. You could also spin on the solution in a thin layer and get
the chemistry. As far as we got it at the moment it's still relatively slow. Our chemistry takes
four days and that's not very practical especially if you have such a system like that where you
want to finish it and put it into storage in a dry way. It can be tuned, it can be catalyzed, this
silica surface, this silica polymerization by acid and base. DNA doesn't like acid or base in the
extremes but you have to go to some extent and play with how much acid or base catalysis can
we allow to maintain the stability of the DNA and have the reaction faster. That's again
[indiscernible].
>>: So back to your comments on the release. You pointed out that the chemical etching
process is similar to silicon manufacturing. The modern planar silicon process uses CMP, Chemical
Mechanical Polishing?
>> Robert Grass: Yes.
>>: Have you considered using a CMP-like process for a release that uses more of a neutral type
of acid or base?
>> Robert Grass: So, no I don't think that's possible because in Chemical Mechanical Polishing,
as far as I understand CMP, it's a really complicated world because you have chemical
interactions between your grinding [indiscernible] for example and the silica which helps let's
say the [indiscernible], and that's not possible in this way, because the [indiscernible] would be
on a similar scale as the silica particles are. If you do it all planar you could think about
that. You could just let's say polish off the protecting layer. The challenge there is certainly in
our case the DNA is really embedded in the silica. So you have these very nice papers of other
people working on DNA and silica. You can even get helix of the DNA imaged in the silica if you
don't add enough silica. And if you add more and more silica you're adding a silica layer and
the helix disappears. But it's really tightly bound into the structure of the glass. As we said,
there’ll be some water molecules in there, but in the shape and in its overall thing it's really in
there and if you polished a layer off you just take the DNA away with it. So having something
that really just takes away all of the glass more or less at the push of a button like the hydrogen
fluoride does I think is more appealing in that.
The challenge if you go planar is that your surface may not be silica; it may be silicon, not silicon
oxide. If it's silicon you might get away with it, because hydrogen fluoride doesn't really etch
silicon: it first has to be oxidized and you need an oxidizing agent to do that. But that
would be a challenge. We use exclusively silica. We also did some studies with iron oxide. So we
have magnetic particles, iron oxide with barcode and silica, and we use iron oxide because it
is also dissolved by hydrogen fluoride. So you can magnetically move or collect your labels, you
put in the chemical and everything disappears; just the DNA stays in solution. So let's see how
far we can go with this fluoride chemistry. Yes?
>>: So planar [inaudible] couldn’t you just deposit it [inaudible]?
>> Robert Grass: Yes, I suppose so.
>>: Wouldn't it be quicker than the four day?
>> Robert Grass: For the deposition?
>>: Right. For the deposition of
>> Robert Grass: I would go with chemical means to just make it faster, to improve it by
catalysis, because it's really dirt cheap. The chemicals are nearly for free. They are 10 dollars per
liter or something like that and you can just grow
>>: Could you get it really dry?
>> Robert Grass: I suppose you can get it pretty dry.
>> Robert Grass: Yes. Errors. Because with this we can delay the accumulation of errors. Errors just
progress slower in our system, but we'll always have errors. With every decay, errors will always
happen; from the first moment we make the DNA, somewhere there will be an error. I don't
come from the IT world, but I was trained by my colleague in data storage. Errors are not
accepted at all. I expect my computer to work perfectly all the time. There's no error. This
world does not allow for errors. So we have to find a way to cope with the fact that errors
still do happen, and so if we want to go to this long-term storage we need two things. One is the
chemical world which I showed you, and the second world was introduced to me by Reinhard
Heckel.
He's a PhD student in the group of Bolcskei at ETH. I know him from sports and I told him about my
idea of comparing DNA storage with a hard drive. I said, what we'll do is we'll write a file, I'll store it
on DNA, you're going to store it on a hard drive, and then we're going to torture the two,
destroy them on the street, put them in a microwave, heat them up, who knows what, and he very
quickly said, well, that would be extremely unfair, because your hard drive, or my hard drive in that
case, can do error correction and your DNA errors just persist. He then very quickly came up with the
idea: why don't we integrate this error correction into DNA storage? Has it been done? What can
we do there? And so we looked together at how error correction can be implemented in DNA
storage, and I'll try, it's not my world, I apologize for any errors, so [indiscernible] I do, but more
or less between the two of us we had a look at how we can really implement it in DNA and what
the restrictions are.
I don't know how familiar you all are with DNA writing and reading. We used a custom
company which obviously is near here, Custom Array. They synthesize DNA on a silicon wafer
where they electrically direct the chemistry on these small patches, where they say, okay, if I have
electricity, a nucleotide reacts at that position, I wash away the nucleotide, on to another one, and
so I can generate a whole chip of different DNA sequences. We used a 12K chip, so 12,000
sequences, every sequence different, each about 150 nucleotides long.
Reading, more complicated to explain, but in the end it’s also on the chip and you're also
reading one layer at a time by optical image acquisition, Illumina in that case. There are
different technologies on the market at the moment. If you talk to a biologist and ask him
what’s the future of sequencing he'll say Illumina, Illumina, Illumina. Prices are coming down
tremendously. I like this slide from NIH. That's from the Human Genome Project. 100 million
dollars for the Human Genome Project for the sequencing. Amazing, how much money was
spent on that. Twice, actually, because it's been done twice. And then here's the onset of
next generation sequencing. Illumina sequencing really got in and you have this enormous
price drop in reading DNA. It's still expensive. A whole genome, so we are talking about
somewhere around a gigabyte of information, we're reading for 1000 dollars, compared to
a hard drive where the cost of reading a gigabyte is, I don't know, essentially free. But the direction
I think is the interesting thing: it's really going extremely rapidly, and even more rapidly than the law
which defines the industry here to some extent.
So the design constraints, we already said. We can't make sequences of endless length, because we
have an error which is introduced during the chemistry, and if we make them longer and longer
they are more and more error-prone, so we go for relatively short sequences. You have to index
them, because in the end they're in a random pool and we have to know which sequence is
which to put the information together at the end. And there's the idea that you should avoid long
repetitions of the same nucleotide for the reading. As far as I have learned in the meantime, for
Illumina sequencing that's not such a big problem, but we went and tried to avoid long
repetitions of the same nucleotide.
Expected errors. This is more or less by guessing at the beginning when we started the project.
For writing, we assumed we're introducing half a percent error per nucleotide. I assume you have much
better or more accurate numbers at the moment. The reading error we took as smaller than that. So
one sequence will have half an error on average. That was our guess for which we designed
the error correction. Then during storage we would have point errors similar to those, but we
would also have loss of complete sequences, where some sequences just disappear out of our
pool. So we had to have a way of correcting these, let's say, individual errors along a sequence,
and the errors where we just lose complete sequences, and we have to cope with both.
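Editor's sketch: the error budget quoted here can be turned into numbers with a simple binomial model. The half-percent rate is from the talk; the payload length of 100 nucleotides is an assumption chosen because it reproduces the quoted "half an error on average", and independence of errors between positions is also assumed. The limit of three errors per sequence is the figure given later in the talk.

```python
from math import comb

def prob_more_than(max_errors, length, per_nt_error):
    """P(number of errors > max_errors), assuming independent errors per nucleotide."""
    p_within_limit = sum(
        comb(length, k) * per_nt_error**k * (1 - per_nt_error) ** (length - k)
        for k in range(max_errors + 1)
    )
    return 1.0 - p_within_limit

per_nt_error = 0.005   # "half a percent per nucleotide" from the talk
length = 100           # assumed payload length, not stated at this point in the talk

print(length * per_nt_error)                     # expected errors per sequence
print(prob_more_than(0, length, per_nt_error))   # chance of at least one error
print(prob_more_than(3, length, per_nt_error))   # chance of exceeding three errors
```

Under these assumptions a sizeable share of sequences carries at least one error, while only a small share exceeds three, which is why a per-sequence code that corrects a few point errors leaves mostly whole-sequence losses for a second mechanism to handle.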
Reinhard said the way to do that is by Reed-Solomon coding. I don't know if that's familiar. I
won't explain it because I expect you know it much better than I do. I learned it comes from
communication originally, satellites. It's a standard in DVD storage technology. I understand that
it's powerful, and what we need is the same thing as in the DVD. If you scratch it you
just lose all of the information under the scratch. It's just completely lost, just as if we lose a
whole sequence. But at the same time you have point defects: by reading or writing on the
DVD you have individual bits which are not correct, and so the DVD has to be able to cope with the
two, and we have to do the same. We have to cope with individual defects and we have to cope
with, let's say, whole sequences or whole blocks of information which are lost. So I also
won't explain that.
So we had to find out how we translate from binary to DNA without allowing sequences of, let's
say, [indiscernible] polymers of, let's say, only A's. If we have a sequence of tons of zeros we don't
want to write only A's in the DNA we make. And we decided to map all the
possibilities of three nucleotides in series, so we have AAA, AAC, ACA; these are all
possibilities. We don't allow possibilities where the last two characters are the same. If we
have a look at that, we get 48 possibilities with these three nucleotides. We have
48 possibilities in these, let's say, triplets of nucleotides.
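Editor's sketch: the count of 48 can be checked by enumerating all 64 triplets and dropping those whose last two nucleotides match (4 x 4 x 3 = 48). The helper `longest_run` is a hypothetical name, added to show what the rule buys in terms of repeated nucleotides when triplets are placed next to each other.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# All 64 triplets, minus those whose last two nucleotides are equal.
triplets = ["".join(t) for t in product(NUCLEOTIDES, repeat=3) if t[1] != t[2]]
print(len(triplets))  # 4 * 4 * 3 = 48

def longest_run(sequence):
    """Length of the longest stretch of one repeated nucleotide."""
    best = current = 1
    for previous, nucleotide in zip(sequence, sequence[1:]):
        current = current + 1 if nucleotide == previous else 1
        best = max(best, current)
    return best

# Worst case over every pair of adjacent allowed triplets.
worst = max(longest_run(a + b) for a in triplets for b in triplets)
print(worst)
```

Because every triplet ends in two different nucleotides, a run can at most span the last nucleotide of one triplet and the first two of the next, so no concatenation of allowed triplets repeats a nucleotide more than three times in a row.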
In Reed-Solomon coding, I learned, you need a finite field. It has to work with a prime number.
The closest prime number to 48 is 47. So we took our, let's say, triplets and assigned every one of
them a number between 0 and 46. That's what we have here. So we have TTA is 11,
TTG is 35, and so all of these triplet possibilities are assigned to a number. This, by the way, is
expressed the same way as proteins are expressed in biology. So if you look at how proteins
GCA will begin [indiscernible] 9 is CGG [indiscernible] so that's how the translation of DNA to
protein happens in biology. And if a biologist says DNA is a redundant system, what he means is
that the last base of a triplet can usually be random. From an information
[indiscernible] perspective that is a very poor redundancy, because if one of the others is wrong it's
wrong, but in biology that's considered to be redundancy.
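Editor's sketch: a lookup table between the numbers 0 to 46 and triplets can be built as below. The numbering used here (alphabetical order, with the 48th triplet left unused) is hypothetical; the table on the speaker's slide, where TTA is 11 and TTG is 35, is different and is not reproduced here.

```python
import math
from itertools import product

FIELD_SIZE = 47  # the prime closest to 48, as stated in the talk

allowed = ["".join(t) for t in product("ACGT", repeat=3) if t[1] != t[2]]
# Hypothetical numbering: alphabetical order, dropping the last of the 48 triplets.
SYMBOL_TO_TRIPLET = allowed[:FIELD_SIZE]
TRIPLET_TO_SYMBOL = {triplet: symbol for symbol, triplet in enumerate(SYMBOL_TO_TRIPLET)}

def symbols_to_dna(symbols):
    """Write each field element (0..46) as its three-nucleotide triplet."""
    return "".join(SYMBOL_TO_TRIPLET[s] for s in symbols)

def dna_to_symbols(dna):
    """Read a DNA string back three nucleotides at a time."""
    return [TRIPLET_TO_SYMBOL[dna[i:i + 3]] for i in range(0, len(dna), 3)]

symbols = [11, 35, 0, 46]
dna = symbols_to_dna(symbols)
print(dna, dna_to_symbols(dna) == symbols)
print(math.log2(FIELD_SIZE) / 3)  # information carried per nucleotide, in bits
```

Each triplet carries log2(47) bits, a little under the 6 bits that three unconstrained nucleotides could hold; that gap is the price of the repetition rule and of rounding 48 down to a prime.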
So we have this translation of, let's say, digital information to these numbers in this finite field of
47, and then we have to arrange that information in some kind of blocks and implement the
redundancy. So what we did is we took the original information in a certain block size, which Reinhard
defined, and we'll see in a moment why it's that size. We have the original information. We
generate new information, redundancy A, by, as far as I know, extension fields of that
finite field, which is really new information we add. We add an index to all of the sequences.
The sequences will go in this direction.
So this information is essentially new sequences which we generate as the redundancy. And
then every sequence gets redundancy at the end. So that will be a sequence of DNA, and so I
have sequences which contain an index, original information, and some redundancy to correct
errors which happen in there; and if I lose a complete sequence I have these redundant
sequences at the end which I can use to account for losses of complete sequences.
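Editor's sketch: the idea of redundant sequences that make up for lost ones can be illustrated with a toy Reed-Solomon code in its polynomial-evaluation form over the field of 47 elements. This is a simplification under stated assumptions: it only repairs erasures (known losses), it works on single symbols rather than whole sequences, and it uses the prime field directly, whereas the talk mentions extension fields and a second, inner code for point errors. All function names are hypothetical.

```python
P = 47  # prime field size mentioned in the talk

def interpolate_at(points, x):
    """Value at x of the unique polynomial through `points`, with arithmetic mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        numerator, denominator = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                numerator = numerator * (x - xj) % P
                denominator = denominator * (xi - xj) % P
        # Division mod a prime: multiply by the inverse, a^(P-2) mod P (Fermat).
        total = (total + yi * numerator * pow(denominator, P - 2, P)) % P
    return total

def encode(message, n):
    """Systematic code: positions 0..k-1 hold the message, k..n-1 hold redundancy."""
    if n > P:
        raise ValueError("codeword length cannot exceed the field size")
    points = list(enumerate(message))
    return list(message) + [interpolate_at(points, x) for x in range(len(message), n)]

def recover(received, k):
    """Rebuild the k message symbols from any k surviving positions (None = lost)."""
    survivors = [(x, y) for x, y in enumerate(received) if y is not None]
    if len(survivors) < k:
        raise ValueError("too many symbols lost")
    return [interpolate_at(survivors[:k], x) for x in range(k)]

message = [11, 35, 0, 46, 7]            # five symbols in 0..46
codeword = encode(message, 8)            # three redundant symbols appended
damaged = list(codeword)
damaged[0] = damaged[3] = damaged[6] = None   # lose any three positions
print(recover(damaged, 5) == message)
```

With three redundant positions, any three losses can be repaired regardless of which positions they hit, which is the property that makes the scheme indifferent to which particular sequences go missing.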
So I have this finite field, so these sequences I translate according to the circle I just
showed to lengths of 170 nucleotides, I add adapters so I can interface with DNA
sequencing technology, and this is then synthesized by Custom Array. So I sent them a file with
all of those sequences and I got back 80 microliters of water
>>: Two thousand dollars.
>> Robert Grass: Two thousand dollars. Exactly. We have a scheme where we have two stages
of introducing the whole adapter. So those are really the details of how you then get this onto the
sequencing platform, because for us it was important that we can index our runs. So in one
sequencing experiment we did different temperatures. So we could use Illumina indexing
by introducing the primers into the adapter.
So if you go to reading and decoding, we have the solution of DNA. We more or less pull
individual strands randomly. We read them with Illumina; somebody else reads them for us, also for a
few thousand dollars. We then get the information, we first decode the inner code, so we write
down the information, we get correct sequences; those which have more than the three
errors which we allowed in a sequence are thrown away. We sort them by the indexes and we
get this information, the redundancy A, and, say, some gaps in the information, and we then use
this redundancy on the side to get back to our perfect information. At least that was the idea
of these, let's say, two Reed-Solomon codes nested in each other, one for errors within
sequences, and one for completely lost sequences. And you can, let's say, tune the two
depending on the error level you would expect for a certain experiment.
>>: [inaudible] you should be getting sometimes multiple copies of the same thing. Do you use
this or just
>> Robert Grass: We didn't use that. We even get forward and backward reads. We had a
look. It didn't make a big difference, because what we get here at the end is so much more
powerful than just averaging at the beginning that we only use the original information without
any gains with that and [indiscernible] to the decoder.
>>: So essentially you dip like one of each of the strands you synthesize?
>> Robert Grass: I have to ask exactly how he, I’m sorry that I don't, I don’t think he averaged;
he first decoded. So he certainly first decoded the inner code and I'm not sure on which point
he averaged and then he went into the outer decoder. But we can ask him very easily.
Just a comparison with what had been done before our work. So the first paper was by Church;
actually, according to Goldman his was first, at least Goldman's paper was submitted first, but Church's
paper was published long before the Goldman paper. The maximum entropy
is two because we have four options. They just said we have zero going for
A or T and one going for C or G. And the entropy is not quite one because we have primers we have
to introduce into that. It's non-tunable and redundancy is just by copies of sequences.
Experimentally they didn't get complete recovery. They had to tweak a little here and there. I
don't know the numbers exactly. I think they had 10 or 15 errors in a megabyte file. It's
unfair to compare on that because our file size is much smaller.
Goldman went for redundancy by overlap of sequences. Both of them actually are vulnerable to
bad sequences, because if one sequence by chance binds to itself or is not readable, you
have a hole in your information. The entropy is really low. Entropy in the end is cost of
synthesis. They have to synthesize at least twice the DNA that Church has to synthesize, and
experimentally they also had a few inconsistencies on some of these bad sequences which they
then had to correct manually. In our case we [indiscernible]. We say it's not vulnerable to bad
sequences, because if you have bad sequences it's just corrected by the outer redundancy.
Entropy is still relatively far from two because we allow quite a lot of errors. So we allow, I think,
20 percent of external redundancy and three errors per sequence. So that's relatively
expensive, and then we still have primers in our system and indexes, which all go against the
entropy of the final information you have, so it's still pretty far from one. It's better than the
previous cases. In our case we get error-free recovery for a smaller file size, but on the
other side after extreme heat treatment.
So we have experimental data for the original information we synthesized, and for that information
heat treated at 65 degrees for a week and for two weeks. That's equivalent to about two and four
half-lives of decay, and you can still get the information back completely perfectly.
>>: But the entropy here [inaudible] only tells part of the story, right? Because not only do you
have high entropy, but look what else, you probably need fewer copies
>> Robert Grass: Yes.
>>: [inaudible].
>> Robert Grass: Yes. That's completely true. The challenge for that, I would put it, this kind of
entropy is directly cost of synthesis. Copying in the end is free. If you go to space
>>: You need enough of one of those things that after you do those things [inaudible], right?
>> Robert Grass: So what's the density at which you
>>: [inaudible] provision copies [inaudible].
>> Robert Grass: Exactly. And one advantage we also have in reading we don't have to read
every last sequence because we don't need the last, I mean in the other cases you have to read
until you statistically pull out the very last sequence. In our case you don't have to go quite as
far which is statistically also an advantage.
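Editor's sketch: the statistical advantage mentioned here is the classic coupon-collector effect. The model below assumes every read picks a sequence uniformly at random from the pool, and the 10 percent of sequences allowed to stay unread is an illustrative figure, not a number from the talk; the pool size of 5000 is the sequence count quoted elsewhere in the talk.

```python
def expected_reads(distinct_needed, pool_size):
    """Expected number of uniform random reads (with replacement) needed to
    observe `distinct_needed` different sequences out of `pool_size`."""
    return sum(pool_size / (pool_size - seen) for seen in range(distinct_needed))

pool = 5000                                     # sequence count quoted in the talk
all_of_them = expected_reads(pool, pool)        # must catch the very last sequence
most_of_them = expected_reads(pool * 9 // 10, pool)   # assumed: 10% may stay unread
print(round(all_of_them), round(most_of_them))
```

The last few unseen sequences dominate the cost of reading everything, so a code that tolerates a modest fraction of missing sequences cuts the expected number of reads several-fold.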
If you do some work like that, you have to decide on some data which you want to store. The Swiss
Federal Charter: none of us are Swiss, but we all work in Switzerland. It's like the Magna Carta in the
UK, which is better known. It's from the foundation of the Swiss state, an important document. And I
love this book, the Archimedes Palimpsest. I'll just tell you a little about it. It was written by
Archimedes of Syracuse and it contains some of his groundbreaking work on, let's say, the first ideas of
integration. It's the single copy of that book. It was copied in the 10th century and
overwritten in the 13th century. So someone had all of this mathematical stuff, completely useless, we
need some more songs or other [indiscernible] text.
So the original text is in this direction and it was overwritten in this direction. The book was
then found in the beginning of the 20th century in 1906, discovered or at least scholars
studying Archimedes said the text is from Archimedes. It disappeared interestingly enough
again during the wars, and was bought by [indiscernible] for 2 million dollars a few years ago. You can look
at it in Washington I think in the museum. And for me it tells two stories. One is how important
information is and how difficult information travels through time even if it would be stable on
the format, the paper. Papyrus has probably certainly more than 1000 years of half-life. Still it
can go very non-obvious ways which have no technical let’s say reason for going that way so
that's why I like this book. Also because of course it's a scientific mathematical work at the
beginning. So for us that was 80 kilobytes these two text files which go to 5000 sequences in
the entropy level we have.
So experiment I think is non-trivial. We translate first text to binary, then binary to DNA,
synthesize to DNA and encapsulate it, we stress it for a few weeks in the oven, we release the
DNA, we sequence this and decode it and if everything goes right we have the same level the
same information back.
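The first and last steps of that workflow (text to binary to DNA, and back) can be sketched in a few lines. This is a minimal illustration only, assuming a naive mapping of two bits per base (A=00, C=01, G=10, T=11). The mapping and the function names `text_to_dna` and `dna_to_text` are hypothetical and are not the coding scheme used in the work presented; error correction, synthesis, encapsulation, and sequencing are left out entirely.

```python
# Illustrative sketch: naive 2-bits-per-base mapping (an assumption,
# not the encoding described in the talk). No error correction.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def text_to_dna(text: str) -> str:
    """Translate text to bytes, then each byte to four bases."""
    bases = []
    for byte in text.encode("utf-8"):
        # Walk the byte two bits at a time, most significant pair first.
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_text(dna: str) -> str:
    """Invert text_to_dna: four bases back to one byte, bytes to text."""
    if len(dna) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        data.append(byte)
    return data.decode("utf-8")
```

A round trip such as `dna_to_text(text_to_dna("Archimedes"))` returns the original string, which corresponds to the "if everything goes right" case above where no errors were introduced in between.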
Error levels. So this is an initial sequence error that's without averaging, by the way. Just if you
take the raw reads, what's the probability of an error in the read? It's pretty high, 50 percent.
The errors of reads you could average, and you get it down. We didn't do that. Then after inner
decoding, so after getting rid of the errors we have within a sequence we get down to a few
percent of error. Then the outer code does errors and erasures and in the end we get to
perfect recovery of the file of 100 kilobytes and we get exactly the same just at higher error
levels. You see if we heat treat it that’s about not quite a week at 60 degrees, 65. You get high
initial rates, you get much more losses of complete sequences, and then if you go to longer
times and four half-lives of decay things get worse, but at the end we have to say we are really
lucky that we experimentally found a window where we can still read the data.
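The idea that an outer code can repair the loss of a complete sequence (an erasure) can be illustrated with the simplest possible erasure code. The sketch below assumes a single XOR parity block over equal-length blocks, which can restore exactly one missing block. It is a stand-in chosen for brevity, not the outer code used in the work presented, and the names `add_parity` and `recover` are hypothetical.

```python
# Illustrative sketch: single-parity erasure code (an assumption,
# far weaker than a real outer code; it repairs one lost block only).
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(blocks: List[bytes]) -> List[bytes]:
    """Append one parity block that is the XOR of all data blocks."""
    return list(blocks) + [reduce(xor_bytes, blocks)]


def recover(received: List[Optional[bytes]]) -> List[bytes]:
    """Return the data blocks, rebuilding at most one erased block (None)."""
    missing = [i for i, block in enumerate(received) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one erasure")
    repaired = list(received)
    if missing:
        present = [block for block in received if block is not None]
        # XOR of all surviving blocks equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired[:-1]  # drop the parity block
```

With three data blocks plus parity, setting any one entry to `None` before calling `recover` still returns the three original blocks. A code that tolerates many lost sequences, as in the heat-treated samples, needs more redundancy than this single parity block provides.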
So let's say after four half times we can still read the information, four half times at room
temperature, a few hundred years times four is let's say the storage time we can get. You see
here if we get cold things get much better. At minus 18 you get a few million years. So we did a
few short experiments. I like this global seed vault in Spitsbergen. It's freezing there in
permafrost and our society stores all the seeds of the world. For me what's so interesting is
that we have a place where we store all seeds of the world but not information. It's really
strange that we go for, let's say, the natural diversity. If you want to store, let's say, our
intellectual diversity, I don't know of anything. There are libraries which also go for, let's say,
maintaining stability for a certain amount of time and fire protection and things like that, but
not to the extreme that we do in seeds.
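The storage-time estimate mentioned above (a half-life multiplied by the four half-lives after which the data could still be read) is simple arithmetic, sketched here. The sketch assumes first-order decay, so the intact fraction halves with every half-life. The 300-year figure in the usage note is a hypothetical placeholder for "a few hundred years", not a measured value, and the function names are illustrative.

```python
# Illustrative sketch: half-life arithmetic under an assumed first-order decay.

def intact_fraction(half_lives: float) -> float:
    """Fraction of sequences still intact after the given number of half-lives."""
    return 0.5 ** half_lives


def storage_time_years(half_life_years: float,
                       readable_half_lives: float = 4.0) -> float:
    """Storage time if data stays readable for `readable_half_lives` half-lives."""
    return half_life_years * readable_half_lives
```

For example, `intact_fraction(4)` gives 0.0625, so after four half-lives about 6 percent of the sequences remain intact, and `storage_time_years(300.0)` gives 1200.0 years for the placeholder half-life. The same multiplication applies to the much longer half-lives at lower temperatures.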
By the way, it’s even cooled so it’s not minus 18 there. It’s about just below freezing and they
artificially cool it to get to minus 18. Coming from Switzerland we would think of Jungfraujoch.
You can travel there all year long and it's colder than zero degrees. You will get 100,000 years
of storage data and you could think of let's say this library would be our vision because in
Switzerland we have many old bunkers from the second world war which would be nice and
cold. [indiscernible] Spitsbergen would store the samples in there somewhere. We only know
that it’s stable. Exactly how you would do the storage and let's say we read it at a practicability.
So I think what you are really looking at we haven’t looked any further into that.
So I think it's still a dream of going there. I certainly believe that the two promises in space and
in stability are there, they are real; it’s just a question of how do we get there to get it into
reality? For me, important things I still see that are open is to better understand the
degradation. We are still extremely crude there. Also, people working with fossils extremely
crude. They have little understanding on how the DNA has decayed in the past. Also, what
physical and chemical triggers add to that. For the new sequencing technologies coming up, which
are promised to be much smaller and cheaper, we have to know what the limits are for that
technology; and ideally if we do our coding and we can account for those two things we can be
much more efficient in the amount of space and time we need for that. And I think the most
important factor in all of this is cost of synthesis.
Sequencing cost is going down tremendously. New devices are coming out. There's really strong
competition in that. As far as I know, the companies that can do DNA synthesis useful for this are four,
perhaps, I don't know if you know more. Competition is not very strong. The market for that
from the biological side is not very strong. So if we want to really push into DNA data storage
the cost of DNA synthesis is the real push we have to do. And the challenge there is that these
technologies exist for biology but they are made to be perfect. So they are designed to make
perfect DNA strands because they want to make genes out of it and if you make a gene if just
one base is wrong you're lost. It doesn't encode properly in biology. But if you go for data
storage, I mean our hard disk, you don't care. The manufacturers of hard disks they do care but
they have a good idea on how many errors happen and so they get rid of the errors by coding
and putting in the right amount of redundancy and that's where we can play if we can get let's
say a more dirty way of DNA synthesis, that’s I think what’s needed and more cost efficient.
Yes, I'm done.
So I hope I showed you some chemistry on let’s say encapsulating DNA in glass and that it’s very
stable. It can be made even more stable if the continued work is done. If we combine that with
error correction we get to really long-term stable data storage, at least we can extrapolate to
that, and in what I showed you before we can also play with the errors to measure other things,
not just to store information. With that I would like to acknowledge that's part of the group. I
work in a group of Wendelin Stark. We work on many other things than magnetic particles and
DNA storage. That's why there are so many people on the photograph. I would like to thank
you very, very much for your attention, and I'd be happy to discuss these things in more detail if
time is still here.
>>: We're out of time so [inaudible] offline.
>> Robert Grass: I’m sorry for that.