Smart Data and Wicked Problems

Paul L. Borrill, Founder, REPLICUS Software

Abstract

Doug Lenat says we are plagued by the problem of our data not being smart enough. In this presentation, we first explore why we want smarter data, and what that means. We look behind the scenes at the frustrations that knowledge warriors experience in their digital lives: problems that are easy to fall victim to, such as information overload, the constant unfolding of additional tasks that get in the way of getting real work done (Shaving the Yak), and the seemingly endless toll on our time and attention required to manage our digital lives. We illuminate these problems with insights gained from design considerations for a 100PB distributed repository, and peel the onion on these problems to find that they go much, much deeper than we imagined, connecting to "wicked problems" in mathematics, physics, and philosophy: What is persistence? Why are time and space not real? Why is the notion of causality so profoundly puzzling? And why is it impossible to solve certain problems from a God's Eye View? Finally, we propose a prime directive comprising three laws, and six principles for design, so that if our data becomes smart, it does so in ways that truly serve us: simple, secure, resilient, accessible, and quietly obedient.

1. Introduction

1.1 Why make data smart?

"The ultimate goal of machine production – from which, it is true, we are as yet far removed – is a system in which everything uninteresting is done by machines and human beings are reserved for the work involving variety and initiative" ~ Bertrand Russell

"What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it" ~ Herbert Simon

As our commercial operations, intellectual assets, and professional and personal context progressively migrate to the digital realm, the need to simply, reliably and securely manage our data becomes paramount. However, managing data in enterprises, businesses, communities and even in our homes has become intolerably complex. This complexity has the potential to become the single most pervasive destroyer of productivity in our post-industrial society. Addressing the mechanisms driving this trend, and developing systems and infrastructures that solve these issues, creates an unprecedented opportunity for scientists, engineers, investors and entrepreneurs to make a difference.

Human attention is, at least to our species, the ultimate scarce resource. Attention represents both a quantitative and a qualitative aspect of life, which is all we have. The more moments we are robbed of by the voracious appetites of our systems demanding to be tended, the less life we have available to live, whatever form our particular pursuit of happiness may take: serving others, designing products, creating works of art, scientific discovery, intellectual achievement, saving the earth, building valuable enterprises, or simply making a living.

In enterprises, massive budgets are consumed by the people hired to grapple with this complexity, yet the battle is still being lost. In small and medium businesses, it is so difficult to hire personnel with the necessary expertise to manage these chores that many functions essential to the continuation of the business, such as disaster recovery, simply go unimplemented.
Most consumers don't even back up their data, and even for those who should know better, the answer to "why not?" is that it is just too cumbersome, difficult and error-prone.

Why we want to make data smart is clear: so that our data can, as far as possible, allow us to find and freely use it without us having to constantly tend to its needs. Our systems should quietly manage themselves and become our slaves, instead of us becoming slaves to them. What this problem needs is a cure; not more fractured and fragmented products, or an endless overlay of palliatives that mask the baggage of the storage industry's failed architectural theories, which in turn rob human beings of the time and attention needed to manage the current mess of fragility and incompatibility called data storage systems.

1.2 Three Laws of Smart Data

"Men have become tools of their tools" ~ Henry David Thoreau

Now that we recognize we are living inside an attention economy, we might ask what other resources we can bring to bear on this problem. It doesn't take much to realize that there are rich technological resources at our disposal that are rather more abundant: CPU cycles, memory, network bandwidth and storage capacity. We propose the following laws for Smart Data:

1. Smart Data shall not consume the attention of a human being, or through inaction allow a human being's attention to be consumed, without that human being's freely given concurrence that the cause is just and fair.
2. Smart Data shall obey and faithfully execute all requests of a human being, except where such requests would conflict with the first law.
3. Smart Data shall protect its own existence as long as such protection does not conflict with the first or second law.

1.3 Wicked problems

Rittel and Webber1 suggest the following criteria for recognizing a wicked problem:

- There is no definitive formulation of a wicked problem.
- Wicked problems have no stopping rule.
- Solutions to wicked problems are not true-or-false, but good-or-bad.
- There is no immediate and no ultimate test of a solution to a wicked problem.
- Every solution to a wicked problem is a "one-shot operation"; because there is no opportunity to learn by trial-and-error, every attempt counts significantly.
- Wicked problems do not have an enumerable (or exhaustively describable) set of solutions.
- Every wicked problem is essentially unique.
- Every wicked problem can be considered to be a symptom of another problem.
- Discrepancies in representing a wicked problem can be explained in numerous ways. The choice of explanation determines the nature of the problem's resolution.
- The designer has no right to be wrong.

Note that a wicked problem2 is not the same as an intractable problem. Related concepts include:

Yak shaving: any seemingly pointless activity which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you're working on. MIT's AI Lab named the concept, which has been popularized by Seth Godin and others.

Gordian knots: some problems only appear wicked until someone solves them. The Gordian Knot is a legend associated with Alexander the Great, used frequently as a metaphor for an intractable problem solved by a bold stroke ("cutting the Gordian knot").

Wicked problems can be divergent or convergent, depending upon whether they get worse or better as we recursively explore the next level of problem to be solved.
1.4 Knowledge Warriors

If we apply our intelligence and creativity, we can conserve scarce resources by leveraging more abundant resources. Many of us devise personal strategies to counter this trend toward incessant Yak shaving, to keep our data systems clean and to conserve our productivity. This is the zone of the Knowledge Warrior.

We begin each section with the daily activities and reasonable expectations of knowledge warriors as they interact with their data, and go on to explore the connection to the deep issues related to the design of smart data. While we hope to extract useful principles for designers of smart systems to follow, we cannot hope in such a small space to provide sufficient evidence or proofs for these assertions. Therefore, connections to key references in the literature are sprinkled throughout the document, and those reading it on their computers are encouraged to explore the hyperlinks.

I make no apology for a sometimes-controversial tone, the breadth of different disciplines brought into play, the cognitive dislocations between each section, or the variability in depth and quality of the references. It is my belief that the necessary insights for making progress on this problem of data management complexity cannot be obtained by looking through the lens of a single discipline, and that the technology already exists for us to do radically better than the systems currently available on the market today. Section 5 contains this paper's central contribution.

1 Rittel, H. J., and M. M. Webber (1984). "Planning problems are wicked problems".
2 When I began writing this paper and chose the concept of "wicked problems", I thought I was being original. Google dissolved my hubris when I discovered that Rittel & Webber had defined a similar concept in 1984 (the year of Big Brother) in the context of social planning. They described how, in solving a wicked problem, the solution of one aspect may reveal another, more complex problem.

2. 100PB Distributed Repository

2.1 True Costs

The arithmetic for a 100PB distributed repository is rather straightforward: 12 disks per vertical sled3, 8 sleds per panel, and 6 panels per 19" rack yields >500 disks per rack. At 2008 capacities this yields >0.5PB per rack, so five datacenters containing 40 racks each are required for ~100PB of raw capacity. Alternatively, mobile data centers built from 20-foot shipping containers4 (8 racks per container) yield ~5PB per container, or 10PB in 40-foot containers. Thus 10 x 40-foot or 20 x 20-foot containers are required for a 2008 100PB deployment. It is not difficult to imagine a government-scale project contemplating a 100-container deployment yielding 1EB, even in 2008. Half this many containers will be needed in 2012, and a quarter (25 x 40-foot containers = 1EB) in 2016, just 8 years from now.

Table 1: Anticipated Disk Drive Capacities

                 RPM     2008     2012     2016
  Capacity       7200    1TB      4TB      8TB
  Performance    10K     400GB    800GB    1.6TB
  High-perf.     15K     300GB    600GB    1.2TB
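The packing arithmetic above is easy to check. A minimal sketch (Python), using only the figures quoted in this section; the text's ~200 racks and ~20 containers come from rounding rack capacity down to ~0.5PB and container capacity up to ~5PB:

    # Back-of-the-envelope check of the rack/container arithmetic above.
    # Packing assumptions are the ones quoted in this section, not vendor specs.
    disks_per_rack = 12 * 8 * 6          # 12 disks/sled, 8 sleds/panel, 6 panels/rack = 576
    tb_per_disk_2008 = 1                 # 7200rpm capacity drive, 2008 (Table 1)
    pb_per_rack = disks_per_rack * tb_per_disk_2008 / 1000.0   # ~0.58 PB (text rounds to >0.5 PB)

    target_tb = 100 * 1000               # 100 PB expressed in TB
    racks = target_tb / (disks_per_rack * tb_per_disk_2008)    # ~174 (text rounds to 5 x 40 = 200)
    containers_20ft = racks / 8.0        # ~22 (text rounds to 20, using ~5 PB per container)

    print(f"{disks_per_rack} disks/rack, {pb_per_rack:.2f} PB/rack")
    print(f"100 PB needs ~{racks:.0f} racks or ~{containers_20ft:.0f} twenty-foot containers")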
What may surprise us is when we consider the costs of the disk drives alone in this arithmetic exercise: normalizing for a mix of performance and capacity 3-1/2" drives, and assuming an average of $200 per disk – constant over time – yields, for a 100PB deployment, approximately $26M in 2008, $13M in 2012, and $6.5M in 2016. Table 2 below projects costs for Government (1EB, 100PB), Enterprise (10PB, 1PB), Small & Medium Business (SMB, 100TB), and Personal/Home (10TB) deployments from 2008 to 2016.

Table 2: Anticipated Disk Drive Cost

              Capacity   2008     2012     2016
  Lg Gov      1EB        $260M    $130M    $65M
  Sm Gov      100PB      $26M     $13M     $6.5M
  Lg Ent.     10PB       $2.6M    $1.3M    $650K
  Sm Ent.     1PB        $260K    $130K    $65K
  SMB         100TB      $26K     $13K     $6.5K
  Personal    10TB       $2K      $1K      $500

Given the history of data growth and the voracious appetite of governments, industry and consumers for data storage, it is reasonable to assume that scenarios such as the above are not just possible, but inevitable in the years to come.

The quantitative picture above may be accurate in disk drive costs, but anyone with experience in the procurement and operational management of digital storage will recognize it as a fantasy; it is not an accurate picture of the cost of stored data. While disk procurement costs are in the 20¢/GB range, the costs of fully configured, fully protected and disaster-recoverable data can be a staggering two or more orders of magnitude higher than this. For example, one unnamed but very large Internet company considers its class 1 (RAID 10, fully protected) storage costs to be in the range of $35-$45/GB per year. In such a scenario, if the disk drive manufacturers gave their disks away for free (disk costs = $0), it would make hardly a dent in the total cost of managing storage.

Some of this cost comes, understandably, from the packaging (racks), power supplies and computers associated with the disks to manage access to the data: a simple example would be Network Attached Storage (NAS) controllers, which range from one per disk to one per 48 disks. Another factor is the redundancy overhead of parity and mirroring. At the volume level, RAID represents a 30% to 60% overhead on the usable capacity. This is doubled for systems with a single remote replication site. Disk space for D2D backups of the primary data consumes 2-30 times the size of the RAID set (daily/weekly backups done by block rather than by file5), and with volume utilizations as low as 25% on average, we must multiply the size of the RAID set by a factor of 4 to get to the true ratio of single-instance data to raw installed capacity. All of this can be calculated, and the tradeoffs in reliability, performance and power dissipation can be an art form. However, even in a worst-case scenario, the cost of all the hardware (and the software to manage it) still leaves us a factor of five or more away from the actual total cost of storage. If all hardware and software vendors were to give their products away for free, it might reduce the CIO's data storage budget by about 20%; and as bad as this ratio is, it continues to get worse, year after year, with no end in sight6.

In order to satisfy Wall Street's obsession with monotonically increasing quarterly returns, digital storage vendors are forced to ignore (and try to hide) the externalities their systems create: primarily, the cost of human capital, in the form of administrative effort to install and manage data storage. This is not even counting the wasted attention costs for the knowledge warriors using those systems.

3 High-density 3-1/2" drive packaging + NAS controllers in a vertical sled arrangement (Verari): 576-720 disks/rack.
4 Project Black Box (Sun).
5 Zachary Kurmas and Ann L. Chervenak. "Evaluating Backup Algorithms". IEEE Mass Storage 2000, pp 235-242.
6 Andreas Kluth. "Make it Simple: Information Technology Survey". The Economist, October 28th 2004.
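To make the overhead arithmetic of section 2.1 concrete, here is a minimal sketch (Python). The specific values are mid-range choices from the ranges quoted above, purely for illustration; it deliberately ignores the administrative (human) costs that, as the section argues, dominate the real total:

    # Illustrative ratio of raw installed capacity to single-instance user data,
    # using mid-range values from the ranges quoted in section 2.1 (assumptions).
    raid_overhead      = 0.45   # RAID parity/mirroring: 30%..60% of usable capacity
    remote_replication = 2.0    # one remote replication site doubles the footprint
    backup_multiplier  = 5.0    # D2D backup space: 2x..30x the size of the RAID set
    utilization        = 0.25   # volumes average ~25% full -> 4x multiplier

    single_instance_tb = 1.0    # one TB of "real" user data
    raid_set_tb   = (single_instance_tb / utilization) * (1 + raid_overhead)
    replicated_tb = raid_set_tb * remote_replication
    backup_tb     = raid_set_tb * backup_multiplier
    raw_installed = replicated_tb + backup_tb

    print(f"~{raw_installed:.0f} TB of raw capacity per TB of single-instance data")

With these assumptions the multiplier comes out at roughly 40x, which is why giving the disks away for free barely moves the total.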
3. Identity & Individuality

"Those great principles of sufficient reason and of the identity of indiscernibles change the state of metaphysics. That science becomes real and demonstrative by means of these principles, whereas before it did generally consist in empty words." ~ Gottfried Leibniz

How smart our data "appears" to us is affected by how easy it is to identify and manipulate it. But what is "it"? And how do we get a handle on "it" without affecting "that"? Here the wickedness begins to reveal itself.

3.1 Getting a handle on "it"

The first question to consider is how we identify what "it" is, where its boundaries are, and what its handles might be. Knowledge warriors prefer to identify data by namespace and filename. Administrators prefer to identify data by volume(s), paired with the array(s) and backup sets they manage. Storage system architects prefer not to identify data at all, but to aggregate block containers into pools that can be sliced, diced and allocated as fungible commodities. Each optimizes the problem to make their own life easier. Unfortunately, the knowledge warrior has the least power in this hierarchy, and ends up with the leftover problems that the designers and administrators sweep under the rug.

When it comes to managing "changes" to the data (discussed in detail in the next section), the situation begins to degenerate: knowledge warriors prefer to conceptualize change as versioning triggered by file closes. This creates several problems:

1. Administrators have difficulty distinguishing changes to a volume by individual users, and have no "event" (other than a periodic schedule) to trigger backups, so whole volumes must be replicated at once.
2. As the ratio between the size of the volume and the size of the changed data grows, increasing quantities of static data are copied needlessly, until the whole thing becomes intolerably inefficient (a toy calculation appears a few paragraphs below).
3. As data sets grow, so does the time to do the backup. This forces administrators to favor more efficient streaming backups on a block basis to other disks, or worse still, to tapes.
4. Users experience vulnerability windows, where changes are lost between the recovery time objective (RTO) and recovery point objective (RPO) imposed on them by the system administrators.

Palliatives are available for each of these problems: diffs instead of full backups, more frequent replication points, continuous data protection (CDP), de-duplication, etc. Each of these imposes complexity that translates into increased time and expertise required of already over-burdened administrators.

The traditional method of managing change is to identify a master copy of the data, of which all others are derivatives. Complexity creeps in when we consider what must be done when we lose the master, or when multiple users wish to share the data from different places. Trying to solve this problem in a distributed (purely peer-to-peer) fashion has its share of wickedness. But trying to solve it by extending the concept of a master copy, while seductively easier in the beginning, leads rapidly to problems which are not merely wicked but truly intractable: entangled failure models, bottlenecks and single points of failure, which lead to overall brittleness of the system.
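A toy calculation for point 2 of the list above (Python); the volume size and daily change rate are hypothetical numbers chosen only to show the shape of the problem:

    # Point 2 above, numerically: block-level protection copies the whole volume
    # even when only a tiny fraction of it changed (illustrative numbers).
    volume_tb       = 10.0    # size of the volume being protected (hypothetical)
    daily_change_gb = 20.0    # data actually modified per day (hypothetical)

    copied_gb = volume_tb * 1000                 # a full block copy moves everything
    useful    = daily_change_gb / copied_gb
    print(f"{copied_gb:.0f} GB copied to protect {daily_change_gb:.0f} GB of change "
          f"({useful:.2%} useful)")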
Storage designers find the disk to be a familiar and comforting concept: it defines an entity that you can look at and hold in your hand. Identifying the internal structure of the disk, and the operations we can perform on it, is as simple as a linear sequence of blocks, from 0 to N, with "block" sizes of 512 bytes to 4KB, where N gets larger with each generation of disk technology and the operations are defined by a simple set of rules called SCSI commands. Disk drive designers, storage area network (SAN) engineers and computer interface programmers work hard to make this abstraction reliable. They almost succeed...

Give the disk abstraction to application programmers, however, and we soon see its warts: no matter how big disks get, they have a fixed size7, limited performance, and they fail (often unpredictably). These "abrupt" constraints make systems brittle: disks fail miserably, and often at the most inopportune time. We get "file system full" messages that stop us in our tracks, massive slowdowns when the number of users accessing a particular disk goes beyond some threshold, or our data becomes lost or corrupted by bitrot8.

Fear not: our trusty storage system designers go one level higher in their bottom-up design process and invent "volumes" for us. In principle, volumes are resilient to individual disk failures (RAID), as fast as we would like (striping), and as large as we want (concatenation). We can even make them "growable" by extending their size while they are on-line (although to take advantage of this requires a compatible file system technology).

7 Similar and sometimes more vexing problems occur with artificial constraints on file size. For example, Outlook pst files, which fail when they grow beyond 1GB.
8 Hidden data corruption on disks, often caused by poorly designed RAID controllers.

The invention of abstract volumes for data was a powerful tool to help application programmers in particular, and their users, focus on the problem they wished to solve, by providing them with a substrate they could rely on. A perfect example of how an abstraction should work – at least for application programmers and users. Under the hood, storage arrays, SANs, switches and volume managers employ highly complex mechanisms to direct, redirect and interleave commands from servers to disk arrays, in order to provide the illusion of a reliable storage service. Unfortunately, although this abstraction works well for application programmers, it comes at the cost of substantially higher complexity for administrators, partly because of conflicts between the abstractions users prefer when identifying their data and the abstractions designers felt comfortable with.

3.2 Hiding "it"

A rather similar problem9 was solved back in the early days of computer systems: the problem known as virtual memory, where each program (and thus each programmer) is given the illusion of a reliable memory which spans from page 0 to page N (with "page" sizes of 32K or higher). When we add new memory DIMMs to our computers, we simply turn them on and the additional memory becomes instantly available to the operating system and application programs. This is the way it should be. Unfortunately, storage designers were not as successful in architecting things as were the designers of virtual memory systems.
When we add new disks to a computer or a RAID array, we have a whole series of administrative actions to take, and if we get any of them wrong, we end up with an unusable system or, worse still, the corruption of data on existing disks. A whole Yak-shaving industry has been built up around storage management because of the frailty of this block (and volume) view: "human beings" were inadvertently designed into the system, and now have to be specially trained and certified in Yak shaving. There are tools available for administrators to count the hairs on their yaks, and to style special tufts to go behind their ears and on their posteriors. Yak-shaving certification courses are available, and an entire industry of yak razors and Yak-shaving creams has been invented to help them do their job.

These administration tasks go by various names: disk administration, LUN masking, storage provisioning, backup, restore, archiving, monitoring, capacity planning, firmware updates, multipathing, performance engineering, availability analysis, installation planning, etc. Why are all these functions necessary for managing disks, but not necessary for managing memory? If you think I am jesting, take a look at a typical storage administrator's job function as defined by IBM, Sun/VERITAS, or Oracle.

3.3 Users vs Administrators

In conventional storage systems, administrators identify data by device and volume, and users identify data by filename. Both the notion of what a data object is, and the operations performed on it, are vastly different between administrators and users. Manipulations or "operations" on the data mean, for administrators, things like backup, restore and remote replication; for a user, they mean edit, copy, rename, etc. These different views of what constitutes the "identity" of data create fundamental conflicts in mental models.

Data corruption in a file object abstraction provides isolation, so that only one file at a time appears corrupted. Data corruption in a volume abstraction can mean that every file within the volume is inaccessible. By forcing volume-level constraints and behaviors on the system, we make our data more fragile overall. Volumes must be "mountable", otherwise their data is inaccessible to the filesystem or database, and unavailable to the user. The larger volumes get, the more data is affected by these kinds of failures. But we are destined to have to deal with this problem in some respects anyway, as disk drive storage capacities increase over time.

Conventional restore from a backup can be costly and time consuming because whole volumes must be restored at once, whereas users often need only a specific file to be restored. Users want their files to be protected the instant they close or save them. Administrators prefer to wait until the system can be quiesced before initiating a more efficient streaming block backup from one device to another. A significant contribution to the user-view of data identity has been achieved by Apple with their recent introduction of Time Machine.

3.4 Zero Administration

Eliminating administrator chores means everything must be made either automatic or commandable by the user. If we are not to overburden the user with the requirement for the kinds of skills and training that current administrators require, then we must present a simple, consistent framework to the user within which s/he can relate to their data without the unnecessary conceptual or cognitive baggage designed in by the designer for the convenience of the designer.

9 Example courtesy of Jeff Bonwick, Sun Microsystems.
These problems are not relevant only for making the life of the corporate knowledge warrior easier. Eliminating the need for administrators is essential for applications where there are no administrators, such as high- or low-conflict military zones, small and medium businesses, and consumers at home, where a great deal of storage is being sold10. In order to do this, we need a theory for our understanding of objects as individuals with well-defined identity conditions. Users are interested in files and directories, not disks, volumes, RAID sets, backup sets or fragile containers such as pst files.

3.5 Indiscernibility

The biggest simplifying assumption for a user is a single universal namespace, represented perhaps by a single portal through which all the files a user has ever created, shared, copied or deleted can be viewed and manipulated without cognitive effort, and unconstrained by system considerations. This single namespace portal must be accessible from anywhere and behave as if the data is always local, i.e. it provides the user with a local-access-latency experience, and the file is consistently available even when the network is down.

The natural way to do this is to replicate the data objects. A degenerate, but perfectly valid, case (in a world of cheap disk capacity) is to replicate everything to everywhere that the user may try to access their data. A more economical method is to replicate only the most frequently used data everywhere, and pull other data when a user requests it (but this is an optimization, orthogonal to the issue of identity and individuality).

In our canonical distributed data repository, the user can access a file from anywhere they may find themselves, using a local PC/client: just point and click, drag and drop within the portal, or type some simple command containing an operator and a target file specification. As far as possible, the user needs the illusion of a single file; the fact that replication is going on behind the scenes is irrelevant to them. It shouldn't matter whether there are three replicas of the data or three thousand; the user doesn't need or want to know. For the user, replicas should be indiscernible.

In order for a system to successfully provide this illusion, it must be able to treat all the replicas as substitutable, so that it can retrieve any one of the replicas as a valid instance of the file. For the system, replicas should be substitutable.

Thus, what constitutes the 'principle' of individuality is a wicked problem: the users need replicas to be indiscernible, and the system needs them to be substitutable. But wait, there's more!

10 Many of us have become the family CTO/CIO at home. Even corporate CIOs who understand the issues and use state-of-the-art solutions in their work environments would prefer not to have to deal with the complexity of those solutions when they go home to their families.

3.6 Substitutability

Making replicas substitutable solves many of the primary issues related to system scalability and brittleness, and to the illusion of reliability and simplicity for the user. Recovery in the wake of failures and disasters now becomes trivial, because any replica can be used for recovery, and not just some fragile master copy or primary backup.
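A minimal sketch of what substitutability buys the system (Python). The hash-based validity check and the function name are illustrative assumptions, not a description of any particular product: any replica whose content matches the expected digest can satisfy a read, so the caller never needs to know which copy it received.

    import hashlib

    def read_any_replica(replica_blobs, expected_sha256):
        """Return the first replica whose content matches the expected digest.

        Because replicas are substitutable, *which* replica is returned is
        irrelevant to the caller; a corrupted or stale copy is simply skipped.
        """
        for blob in replica_blobs:
            if hashlib.sha256(blob).hexdigest() == expected_sha256:
                return blob
        raise IOError("no valid replica reachable")

    # Usage: three byte-identical replicas, one silently corrupted ("bitrot").
    data = b"annual-report-2008.pdf contents"
    digest = hashlib.sha256(data).hexdigest()
    replicas = [b"corrupted" + data[9:], data, data]
    assert read_any_replica(replicas, digest) == data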
The statistics of file usage are rather interesting with respect to this notion of substitutability: 90% of all files are not used after initial creation; those that are used are normally short-lived; if a file is not used in some manner the day after it is created, it will probably never be used; and only approximately 1% of all files are used daily11. This suggests that the notion of substitutability is highly exploitable in simplifying the design of Smart Data systems. For example, the replicas of write-once-read-many files such as .jpg, .mp3 and most .pdf files are static, which simplifies the design considerably because we never need to distinguish one replica from another: these kinds of files naturally exhibit the property we call substitutability.

However, a small fraction of files are modifiable and evolve over time: for example, by being edited by a user or updated by an application at different times, or worse, by multiple users or applications at the same time, in which case some mechanism is required to resynchronize the replicas so they can once again become substitutable. A constant cycle of specialization, to distinguish replicas, followed by resynchronization, to make them substitutable (or indiscernible) once again, represents a core mechanism in our distributed repository which, if we can guarantee its correctness, enables our data to appear smarter. This is discussed in more detail in the next section on persistence and change.

The issue is that users wish, as far as possible, for all replicas to be indiscernible, because that allows them to focus their attention on only a single entity (a singular file), treating all its replicas as one, thereby reducing the toll on their attention and cognitive energy when dealing with data. Databases present yet another identity perspective.

11 Tim Gibson, Ethan Miller, Darrell Long. "Long-term File Activity and Inter-Reference Patterns". Proc. 24th Intl. Conf. on Technology Management and Performance Evaluation of Enterprise-Wide Information Systems, Anaheim, CA, December 1998, pp 976-987.

3.7 Implementation

We can take this one step further and, from the perspective of our local computer, treat the replica of the file that exists on this machine as the proxy, or representative, of all the replicas there are in the entire system, no matter how many there may be: if we operate on (change) any one of them, the system takes care of making sure that all of them are automatically updated, without having to disturb the user.

From the perspective of the designer trying to achieve this objective while also trying to optimize system scalability, it is useful to design local data structures that maintain knowledge only of how many remote replicas exist in the system, rather than a data structure with separate pointers to each, which would require unnecessary maintenance traffic as the system creates, deletes and migrates replicas around the system.

Being able to identify and consistently refer to an entity is an essential capability for the system designers who design these systems and the administrators who manage them, but these needs are at odds with those of knowledge warriors, who associate names with the things they wish to manipulate, and need multiple namespaces only to disambiguate the same name used for different objects (usually in different work contexts). Identifying entities in different ways, and at different levels in a hierarchy, may make a product easier to design, but its subsequent operation may become more difficult.
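A sketch of the local-proxy idea in section 3.7 (Python). The field and method names are hypothetical; the point is that the local record keeps only a count of remote replicas, so replicas can be created, deleted or migrated elsewhere without generating maintenance traffic to this node:

    from dataclasses import dataclass

    @dataclass
    class LocalReplicaProxy:
        """Local stand-in for *all* replicas of one file.

        Only a count of remote replicas is kept; there are no per-replica
        pointers, so replicas can move around the system without this
        record ever needing to be updated with their locations.
        """
        file_id: str
        local_path: str
        remote_replica_count: int   # how many other copies exist, not where

        def replica_created_elsewhere(self):
            self.remote_replica_count += 1

        def replica_lost_elsewhere(self):
            self.remote_replica_count = max(0, self.remote_replica_count - 1)

        def durable_enough(self, minimum=3):
            # 1 local copy + N remote copies vs. a target replication level
            return 1 + self.remote_replica_count >= minimum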
Indiscernibility is to the user what information hiding [Parnas] is to the programmer.

3.8 Philosophy

Philosophy teaches us that this problem is already wicked, and is related to the Identity of Indiscernibles12. This principle states that no two distinct objects or entities can exactly resemble each other (Leibniz's Law), and is commonly understood to mean that no two objects have exactly the same properties. The Identity of Indiscernibles is of interest because it raises questions about the factors that individuate qualitatively identical objects. This problem applies to the identity of data, and what it means to have many substitutable replicas, as much as it does to Quantum Mechanics13.

This issue is of great importance to the complexity of humans interacting with their data, because it is unnecessary for a human to expend attention (or cognition) on any more than a single entity, no matter how many copies exist, as long as s/he can assume that all the copies will eventually be made identical by the system. Thus the (attention-conserving) information has gone up and the (attention-consuming) entropy has gone down.

A related but independent principle, the "principle of indiscernibles", states that if x is not identical to y, then there is some non-relational property P such that P holds of x and does not hold of y, or P holds of y and does not hold of x. Equivalently, if x and y share all their non-relational properties, then x is identical to y. In contrast, the "principle of the indiscernibility of identicals" (Leibniz's Law) asserts that if x is identical to y, then every non-relational property of x is a property of y, and vice versa. In its widest and weakest form, the properties concerned include relational properties, such as spatiotemporal ones, and self-identity. A stronger version limits the properties to non-relational properties (i.e., qualities), and would therefore imply that there could not be, for example, two ball bearings which are exactly similar. This is resolved in our distributed repository by allowing the system to maintain metadata that is not accessible to the user, enabling the system to manage properties that may usefully differ between locations during the cycles of specialization and resynchronization.

3.9 Properties of an Individual

In philosophy there is a so-called 'bundle' view of individuality, according to which an individual is nothing but a bundle of properties. If all the replicas were truly indistinguishable, this would lead to a violation of Leibniz's principle of the Identity of Indiscernibles, which, expressed somewhat crudely, insists that two things which are indiscernible must in fact be identical. Electronically, it is possible to create many replicas that have exactly the same set of bits, and are therefore indistinguishable by looking only at the object itself through its different replicas. Thus, distinguishability and individuality are conceptually distinct from the perspective of the system, but remain indiscernible to the user, who prefers files to be viewable and accessible as a singular entity.

There is one category of common error in which the user may wish to distinguish versions of files, so as to be able to select one of them to be "undeleted". Versions of files represent a different (time) dimension of discernibility from replicas separated in space.
Each version may have a multitude of replicas throughout the system, which should still remain indiscernible, even though the versions may need to be temporarily distinguishable. As we can see, this problem is now starting to show signs of wickedness.

12 Gottfried Leibniz, Discourse on Metaphysics. Stanford Encyclopedia of Philosophy.
13 Identity and Individuality in Quantum Mechanics.

3.10 Indistinguishability

Can particles in quantum mechanics be regarded as individuals, just like tables, chairs and people? And what about their electromagnetic fields: are they individuals too? According to the conventional quantum-mechanical view of physicists, they cannot: quantum particles, unlike their classical counterparts, must be regarded as indistinguishable 'non-individuals'. However, recent work has indicated that this may not be the whole story, and that some theories are consistent with the position that such particles can be taken to be individuals. If quantum particles are regarded as individuals, then Leibniz's Identity of Indiscernibles is violated. For a detailed treatment of the problem of identity and individuality in physics, see the recent book by Steven French and Décio Krause14.

The metaphysical wickedness of the problem of identity and individuality applies to identity in quantum mechanics and to the identity of data in similar ways. In both cases, the history of the object must be taken into account in order to stand any chance of understanding the complete set of properties which define an individual. Just as in quantum mechanics, where history is erased when the wave function collapses and all that exists is a single state which persists until the next event, the same is true of a file if the system doesn't have versioning built in at a fundamental level. But keeping an indefinite number of versions of a file uses up a great deal of storage, in ways that create a great deal of unnecessary duplication. This explains the current fad for de-duplication.

Users ordinarily prefer to see a file only once, no matter how many times or where the system has it stored. However, if a file is stored by the user in multiple places (say, in different organizational structures of some directory or namespace), they may prefer it to behave as a single file (a kind of cross-reference): whichever location we pick the file up from, we can perform operations on it15. This distinction is, however, orthogonal to making replicas in different locations indiscernible to the user. It may be necessary to seek guidance from the user as to whether such an object is merely a copy of an existing file or an entirely new file. A similar semantic conflict can be seen when we drag and drop a file on a modern operating system: if the drop is to another place on the same disk, it is considered a "move"; if it is to another disk, it is considered a "copy". Having a single portal through which all of a user's files may be accessed, independent of any concept of the structure of the disks behind it, eliminates this semantic conflict for the user, and enables a user to organize their logical namespace without regard to the organization of the physical storage devices. What if a user created a crude "backup" by copying a folder to another disk? Would we want that backup copy to show up in the search results?

4. Persistence & Change

Consider a static file that can be changed (e.g. edited) and once again become static. The sequence of events, from the perspective of a single user on a single computer, is:

1. A duration before any activity, where the file F is closed (static)
2. An event which opens the file
3. The duration between when the file is opened, modified and closed (dynamic)
4. An event which closes the file
5. The file has now become F′ and enters another duration of inactivity (static)

Consider a single computer: the file is stored once on the main disk of the computer. A simple editing program opens the file, allows the user to make changes, and then re-writes that portion of the file back to the disk. The operation appears to the user to be irreversible: any data within the file that was deleted appears to be lost forever.

Now consider two computers connected by a network: computer A and computer B. Assume all modifications are made on computer A. A replication program copies the file from A to B each time the file is modified and closed. Computer B can now be considered the "backup" of computer A. However, there may be a delay between when the file is closed on computer A and when the update is complete on computer B. This delay may be arbitrarily long if the network between the two computers is down.

4.1 How do things persist?

Are digital objects, like material objects, spread out through time just as they are spread out through space? Or is temporal persistence quite different from spatial extension? These questions lie at the heart of metaphysical exploration of the material world. Is the ship of Theseus the same ship if all the planks were replaced as they decayed over hundreds of years? Is George Washington's axe the same axe if the shaft was replaced five times and the head twice?

14 Steven French and Décio Krause. Identity in Physics: A Historical, Philosophical, and Formal Analysis. Oxford University Press, 2006.
15 Filesystem hard-links and soft-links provide functionality for "connecting" references to the same file together.

4.2 Distributed Updates

The conventional view of change propagation is to take "the" primary copy of data and replicate it to a secondary copy. In a local environment this is called backup; over distance, to an alternative geographic site, it is called remote replication. Traditionally, there are two ways of doing this: taking a snapshot of a data set (e.g. a volume) in a single indivisible action (often assisted by some copy-on-write (COW) scheme to minimize the time for which the primary data must be quiesced), or on a continuous basis, similar to the process of mirroring, but with a remote target. Maintaining consistency of data in such an environment is easy, providing we maintain in-order (FIFO) delivery of the updates and there are no alternate paths where packets related to some operation on an object can overtake the others. A single source to a single destination is straightforward, providing the stream of updates can be partitioned into disjoint atomic operations, which is trivial for a single source and a single destination, where the source has unique write access privileges and the link is 100% reliable.

The problem shows signs of wickedness when we combine our notion of substitutable replicas with a reliable method of change propagation. If the locking semantics of the file require it, only one user may have the file open for write access, and all other replicas of the (closed) file represent destinations. Many multicast schemes have been designed to tackle this problem.

However, if the file semantics allow multiple writers, then any replica may be open for write access and be a source of updates, with all others destinations. Now updates to the file can be created in any order, so the atomicity of the updates is critical. This problem is often thought to be wicked, but it is merely a Gordian knot: the core issue is to guarantee a FIFO path between all updaters (no overtaking!) and to build a mechanism that guarantees transactions will be atomic even through failure and healing of the system. The ordering problem can be expressed mathematically using lattice theory, and the healing with the help of some old, forgotten algorithms in network theory16.

4.3 Metaphysical Considerations

How do objects persist through change? In what sense is the file being edited now the same file as the one created yesterday, forming an earlier draft of this paper, even though it has been edited five times, copied twice, and there are three backups on other media? The issues of identity and persistence in the definition of data objects may be even more complex than those in the material world: what happens when a data object has multiple digital (as opposed to material) replicas, each of which is bit-for-bit identical looking at the binary data in the files, but the physical blocks for each replica reside on different media, and in different logical blocks in the file systems which manage those media?

According to Katherine Hawley17, there are three theories that attempt to account for the persistence of material objects: perdurance theory, endurance theory, and stage theory.

Perdurance theory suggests that objects persist through time in much the same way as they extend through space. Today's file is the same as yesterday's file because the file exists on the surface of a disk as a magnetically encoded entity through time, part of which existed yesterday and was 1000 words, and another part of which exists today, which is the same 1000 words with 5 of the original words modified and 250 more added.

By contrast, endurance theory suggests that the way objects persist through time is very different from the way they extend through space. Objects are three-dimensional, and they persist through time by being 'wholly present' at each time at which they exist. Today's file is the same as yesterday's shorter file because it is named the same, and appears through the same namespace by which it is accessed. The file is 'wholly present' at each of the times at which it exists, and has different properties at some of those times.

The stage theory of persistence, proposed by Theodore Sider18, combines elements of the other two theories. It shares perdurance theory's dependence on a four-dimensional framework, but denies that theory's account of predication in favor of something like the endurance theory view of predication. It is not four-dimensional (three of space and one of time) objects which satisfy identity predicates like 'is a file' and which change by having parts that are the same as yesterday and other parts that are different today. Instead, it is the stable intermediate stages that make up the four-dimensional objects that are files with only 1000 words vs. files with 1250.
Stage theory could be a valuable model for a versioning file system that manages versions of a file as they change (like the VAX/OpenVMS operating system), with each version representing a stable "stage" of the file which remains forever static; each time the file is edited, another "static" stage of the file is created, and all the stages are saved, representing the history of that file.

Stage theory's perspective on the problem may help us design simpler, more reliable systems that preserve consistency and reduce the cognitive overhead for users of data as it changes. The issues of persistence are somewhat orthogonal to the issues of identity. There may be an arbitrary number of replicas of each version. Indeed, we may deliberately preserve more replicas of more recent versions than of older ones, which are less likely to be needed. This hints at the need for a dynamic mechanism for managing replicas, which enables competition with other dimensions of importance in an "information lifecycle" for Smart Data.

However, which theory is applicable may depend on who the "observer" is. An administrator may prefer an endurance theory view, whereas a user may prefer the perdurance view. More interestingly, how do we manage the users' desire for indiscernibility when it comes to versions? Under normal circumstances, all versions should be hidden behind one object: the file. However, when a user needs access to an earlier history of a file, to recover from, for example, the accidental deletion of the file or some portion of it, how do we temporarily switch back to an endurance view so that recovery can take place? Maybe stage theory can provide an answer. The whole concept of an "observer" is a major wicked problem that delves into the very heart of Quantum Mechanics and the theory of causality, but we don't have time to deal with that here19.

4.4 Real Problems

In the modern digital age, we address this problem somewhat less ontologically, but the complexities of managing change tend to exhibit wickedness when we try to solve real problems. A synchronization program (for example rsync) is able to synchronize directories on two machines in only one direction. Andrew Tridgell's rsync20 algorithm shows a masterful perspective on the notion of identity in the context of Smart Data. Unison21 uses rsync and does the best it can to synchronize in both directions, but creates false positives which require administrative or user intervention to resolve. Harmony is an attempt to overcome these human attention overheads by using ideas from programming languages to predefine how data can be merged automatically.

16 Gafni & Bertsekas. Link reversal protocol.
17 Katherine Hawley. How Things Persist.
18 Theodore Sider. Four-Dimensionalism. Oxford University Press, 2001.
19 Judea Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
20 Andrew Tridgell. "Efficient Algorithms for Sorting and Synchronization". Ph.D. Thesis, Australian National University, February 1999.
21 Unison is a file-synchronization tool which allows a collection of files and directories stored on different hosts to be modified separately, and then brought up to date by synchronizing them at a later time. http://www.stanford.edu/~pgbovine/unison_guide.htm
There is no universal solution to the synchronization problem (the issue is what to do if a replica is modified differently on two sides of a partitioned network, and the results are not mergeable). However, there are solutions that can be associated with specific file types and applications. In general, applications have their own file structure semantics, and these may be proprietary to the company which owns the application. Harmony provides an automated way of merging XML data, so that the need for human attention is diminished. This leads the way to a "plug-in" architecture that addresses the synchronization needs of each application/data type independently. This may overcome some of the Berlin walls of proprietary ownership, or encourage vendors to adopt public standards to enable interoperability. Either way, a plug-in architecture will help.

Synchronizing replicas post facto represents only one method of propagating changes among replicas. In a distributed shared memory system, at the processor/cache level, every single memory transaction is interleaved and propagated in such a way that real-time interleaving of memory locations can be achieved. This can be done for files also, but it requires a more sophisticated technique than post facto synchronization; it can be considered analogous to continuous replication, as opposed to snapshot backups. Although we don't have time to go into this here, the solutions are similar in principle, and simply require finer-grained atomic operations that can still be guaranteed in the presence of failures and healing operations on the system's topology. This is also the basis on which distributed databases address their issues of data identity and persistence, through transactions and ACID properties, which brings us to the wicked problem of "time and causality", the subject of the next section.

Identity, persistence, and substitutability (or substitutivity) are three wicked problems which have begun to appear in data storage, but which have a distinguished philosophical history22.

22 Cartwright, R. "Identity and Substitutivity", in M. K. Munitz (ed), Identity and Individuation (1971), New York University Press.
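Section 4.4 argues for per-application merge plug-ins as the practical way out of the synchronization problem. A minimal sketch of that plug-in architecture (Python); the registry, the plug-in signature and the fall-back behaviour are assumptions for illustration, not Harmony's or Unison's actual API:

    # Type-specific merge plug-ins: the system never needs to understand a
    # file format itself; it only dispatches to whichever plug-in claims it.
    merge_plugins = {}

    def register_merge(suffix):
        def wrap(fn):
            merge_plugins[suffix] = fn
            return fn
        return wrap

    @register_merge(".txt")
    def merge_line_sets(base, mine, theirs):
        # Naive union merge for line-oriented text; a real plug-in would be
        # format-aware (e.g. XML merging in the style of Harmony) and use 'base'.
        return "\n".join(sorted(set(mine.splitlines()) | set(theirs.splitlines())))

    def reconcile(path, base, mine, theirs):
        """Merge two divergent replicas, or surface the conflict if no plug-in fits."""
        for suffix, merge in merge_plugins.items():
            if path.endswith(suffix):
                return merge(base, mine, theirs)
        # No plug-in: keep both versions rather than consume the user's attention now.
        return None   # caller stores 'mine' and 'theirs' as sibling versions

    print(reconcile("notes.txt", "", "alpha\nbeta", "beta\ngamma"))

When no plug-in claims the file type, the sketch keeps both divergent versions rather than demanding the user's attention immediately, in the spirit of the first law of Smart Data.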
5. Time & Causality

5.1 What is Time?

"A measure of change" ~ Aristotle

"A persistently stubborn illusion" ~ Albert Einstein

"It is impossible to meditate on time without an overwhelming emotion at the limitations of human intelligence" ~ Alfred North Whitehead

"Beyond all day-to-day problems in physics, in the profound issues of principle that confront us today, no difficulties are more central than those associated with the concept of time" ~ John Archibald Wheeler

A relationship with time is intrinsic to everything we do in creating, modifying and moving our data; yet the understanding of the concept of time among computer scientists appears far behind that of the physicists and philosophers. This state of affairs is of concern because, if fundamental flaws exist in the time assumptions underlying the algorithms that govern access to and evolution of our data, then our systems will fail in unpredictable ways, and any number of undesirable characteristics may follow.

For the majority of knowledge warriors going about their daily lives, failing to understand the subtleties of time is forgivable: our culture and language are very deeply biased toward temporal concepts23. Almost all sentences we utter encode time in tense. Our everyday experiences incessantly affirm a temporal ordering, implying both a singular direction of temporal processes and a seductive distinction between "before" and "after". It is not surprising, therefore, that Leslie Lamport's seminal paper24,25 defined the "happened before" relation, and through that, a system of logical timestamps that many computer scientists use as a crutch to sweep their issues with time under the rug. Unfortunately, the notion of "happened before" is utterly meaningless unless it is intimately associated with "happened where".

Computer scientists and programmers frequently base their designs for distributed algorithms on an implicit assumption of absolute (Archimedean) time. This, plus other implicit assumptions, including the concepts of continuous time and the flow of time, merits closer examination; because if our data were to be elevated to a true level of "smartness", then we should allow no excuse for designers to get it wrong. Time is the most critical area in which to ask hard questions of the computer scientists and programmers whose algorithms govern our relationship with our evolving data.

According to Bill Newton-Smith26, time may be defined as a system of "temporal items", where by "temporal items" are understood things like instants, moments and durations. This he describes as unenlightening since, even granting its truth, we are accounting for the obscure notion of time in terms of equally if not more obscure notions of instants, moments and durations. Wicked!

Julian Barbour27 describes an informal poll he took at the 1991 international Workshop on Time Asymmetry. The question posed to each of the 42 participants was as follows: "Do you believe that time is a truly basic concept that must appear in the foundations of any theory of the world, or is it an effective concept that can be derived from more primitive notions in the same way that a notion of temperature can be recovered in statistical mechanics?"

The results: 20 participants said that there was no time at a fundamental level, 12 declared themselves undecided or abstained, and 10 believed time did exist at the most basic level. However, among the 12 in the undecided/abstain category, 5 were sympathetic to, or inclined toward, the belief that time should not appear at the most basic level of theory.

Since then, many books, scientific papers and popular articles have appeared with a similar theme: that time is an emergent (or derived) property of the relationships between entities, rather than a fundamental aspect of reality, a concept referred to as the background-independent assumption of the universe (for both space and time)28,29. Some of the most compelling arguments for a reappraisal of time are set forth by Huw Price30, a philosopher from the University of Sydney, whose penetrating clarity and ruthless logic put many of the world's best physicists to shame. In October 2007, Brian Greene hosted a conference at Columbia University to discuss this mystery of time. Telescope observations and new thinking about quantum gravity convinced the participants that it is time to re-examine time31.

23 David Ruelle. "The Obsessions of Time". Comm. Mathematical Physics, Volume 85, Number 1 (1982).
24 Leslie Lamport. "Time, Clocks and the Ordering of Events in a Distributed System". Communications of the ACM, Vol. 27, No. 7, pp. 558-565, July 1978.
25 Rob R. Hoogerwoord. "Leslie Lamport's Logical Clocks: a Tutorial". 29-Jan-2002.
26 Bill Newton-Smith. The Structure of Time. (Google Books)
27 Julian Barbour. The End of Time. Oxford University Press, 1999.
28 Carlo Rovelli. Quantum Gravity. Cambridge University Press, 2005.
29 Lee Smolin. Three Roads to Quantum Gravity. London, 2001.
30 Huw Price. Time's Arrow & Archimedes' Point. 1997.
31 Scott Dodd. "Making Space for Time". Scientific American, January 2008, pp 26-29.

One conference attendee, MIT's Max Tegmark, said "we've answered classic questions about time by replacing them with other hard questions". Wicked!

As far as computers and our data objects are concerned, the ordering of events and the atomicity of operations on those objects are critical to the correctness of our algorithms, and thus to the behavior and reliability we can expect from our data. Distributed algorithms are difficult enough to comprehend, but if they are based on fundamental misconceptions about the nature of time, and those misconceptions can undermine, for example, the ordering of events that the algorithm depends upon, then we can no longer trust these algorithms to guard the safety or consistency of our data. Let us now consider some common misconceptions.

5.2 Simultaneity is a Myth

We have heard that the notion of an absolute frame of reference in space or time is fundamentally inconsistent with the way the Universe works. In 1905 Einstein showed us that the concept of "now" is meaningless except for events occurring "here". By the time they have been through graduate school, most physicists have the advantage of a visceral understanding, from Special Relativity (SR) and General Relativity (GR), that time necessarily flows at different rates for different observers. Those trained primarily in other disciplines, despite a passing acquaintance with SR, may not be so fortunate. Those with only a vague recollection of the Lorentz transformations or Minkowski space may find that insufficient to render a correct intuition when thinking about the relativity of simultaneity32.

Simplistically, what this means for processes which cause changes to our data is that the same events observed from different places will be seen in a different order, or that "sets" of events originating from different sources can be arbitrarily interleaved at potential destinations (preserving individual source stream order if no packets are lost)33. What is frequently not understood, however, is that the very nature of simultaneity, as an instant in time in one place, has no meaning whatsoever at another place in space34. Not only is our universe devoid of any notion of a global flow of time, but we cannot even synchronize our clocks, because the notion of an inertial frame, which might cause us to believe our clocks can theoretically run at the same rate, is infeasible: our computers reside on the surface of a rotating sphere, in a gravitational field, orbiting a star, connected by a network with a long-tailed stochastic latency distribution.

32 Rachel Scherr, Peter Shaffer, & Stamatis Vokos. "Understanding of time in special relativity: simultaneity and reference frames". 2002. arXiv:physics/0207109v1.
33 Strictly speaking, we don't need SR to understand this observer-dependent reordering of events at a destination; we need only a concept of a finite propagation velocity of communication. SR is what makes this problem wicked.
34 Kenji Tokuo. "Logic of Simultaneity". October 2007. arXiv:0710.1398v1.
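The observer-dependent ordering described above can be illustrated without any physics at all. A small sketch (Python) with two hypothetical sources and two destinations, where each destination sees a different, equally legitimate interleaving:

    # Two sources each emit events in a fixed local order; two destinations
    # may legitimately observe different interleavings, as long as each
    # source's own order is preserved (there is no global "now" to appeal to).
    src_a = ["a1", "a2", "a3"]
    src_b = ["b1", "b2"]

    seen_at_x = ["a1", "b1", "a2", "a3", "b2"]   # one valid observation
    seen_at_y = ["b1", "b2", "a1", "a2", "a3"]   # a different, equally valid one

    def preserves_source_order(observed, source):
        filtered = [e for e in observed if e in source]
        return filtered == source

    for obs in (seen_at_x, seen_at_y):
        assert preserves_source_order(obs, src_a)
        assert preserves_source_order(obs, src_b)
    print("both observations are consistent; neither defines a global order")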
If we ascribe to an event a “tag” of an instant in time, then whether that specific instant is in the future or the past depends on the observer. SR alone denies the possibility of universal simultaneity, and hence the possibility of identifying a specific universal instant. GR nails this coffin shut. Fundamentally, this means that an instant of time in one computer can have no relation to an instant of time in another computer, separated by any arbitrary distance, no matter how small. In practice, we make believe that our computers are slow relative to the effects of SR and GR, that they are all in a constant state of unaccelerated motion with respect to each other, and that their communication channels can be approximated by Einstein’s light signals in the concept of an inertial frame. But this is an assumption that merits serious scrutiny. Lamport gets around this by tagging not instants of time, but events in a linearly ordered process. Unfortunately, programmers do not universally understand this. But what is an event? We should get nervous any time we hear a computer scientist discuss notions of synchronous algorithms, simultaneous events, global time or absolute time35.

5.3 Time is not continuous

Let’s now consider the Aristotelian view that time is empirically related to change. Change is a variation or sequence of occurrences36. A sequence of occurrences in relativity is replaced by a sequence of spacetime events. An event is an idealization of a point in space and an instant in time. The concept of an instant (as well as that of duration or interval) is also wicked: Peter Lynds suggests there is no such thing as an indivisible moment in time37. Duration is an ordered set of instants, not the sum of instants, because duration is infinitely divisible into more durations, not into instants. According to Zeno, between any two infinitesimally neighboring instants, an infinity of instants exists. This gives us our first hint that time may not be a linear continuum (unless we consider that the Planck time, ~10⁻⁴³ s, will come to our rescue as a theoretical limit to this infinite divisibility). But then we would need to get into quantum gravity (as if this problem were not already wicked enough).

5.4 Time does not flow

It appears to us that time flows: events change from being indeterminate in the future to being determinate in the past. But the passing of time cannot be perceived directly, in the way matter and space can; one can perceive only irreversible physical, chemical, and biological changes in physical space, the space in which material objects exist. On the basis of human perception we can conclude that physical time exists only as a stream of change that runs through physical space. The important point is: change does not “happen” in physical time; change itself is physical time. This is different from the conventional perspective, in which spacetime is the theater or “stage” on which physical change happens. Nothing in known physics corresponds to the passage of time.

35 The term “absolute time” is used six times in Leslie Lamport. “Using Time Instead of Timeout for Fault-Tolerant Distributed Systems”. ACM Transactions on Programming Languages and Systems, Vol. 6, No. 2, April 1984, pp. 254-280.
36 Francisco S.N. Lobo. “Nature of Time and Causality in Physics”. arXiv:0710.0428v1.
37 Peter Lynds, on Zeno’s paradoxes and instants.
Indeed, many physicists insist that time doesn’t flow at all; it merely is (what Julian Barbour calls Platonia). Philosophers argue that the very notion of the passage of time is nonsensical and that talk of the river or flux of time is founded on a misconception. Change without time38 is the only known alternative without these conflicts. But what is change? An event? So a series of events constitutes a flow which we can call time? Not quite, my dear Josephine: events reflect something that happens in spacetime (and space is just as important as time in this context). Events represent interactions between two things (or is it more? that is an even more wicked problem) in which “something” is exchanged. Otherwise, events could “happen” all the time (forgive the pun) without any apparent consequence. There is no more evidence for the existence of anything real between one event and another than there is for an aether to support the propagation of electromagnetic waves in empty space.

Time, like space, is now believed by the majority of professional physicists to be a derived concept: an emergent property of the universe, something like the temperature or pressure of a gas in equilibrium. It is time for computer science to catch up with this revolution in physics and philosophy.

The time we think we measure, as well as our units of distance, are based on the speed of light as a universal constant. This can be hard to understand when we define “speed” as the derivative of distance with respect to time39. But then, the bureau of standards defines space (the metre) as “the distance traveled by light in vacuum during 1/299,792,458 of a second”.

Aristotle identified time with change. For time to occur, we need a changing configuration of matter. “In an empty Universe, a hypothetical observer cannot measure time or length”40. Intuitively, like Aristotle, we verify that a notion of time emerges through an intimate relationship to change, and subjectively it may be considered as something that flows in our day-to-day interactions with other humans. However, this is ludicrously imprecise when we discuss interactions of our computers with multi-GHz processors and networks. One way to usefully (for our purposes) interpret this is to recognize “events” in spacetime as the causal processes that create our reality. Setting the direction of time aside for the time being, we could call such events “interactions”, because at the fundamental level an event is a process in which two things interact and exchange “something”. This corresponds to the theory of Conserved Quantities (CQ)41 in the philosophical debate on causality, or the theory of Exchanged Quantities (EQ) described below.

5.5 Time has no Direction

Physical time is irreversible only at macroscopic scales. Change A transforms into change B, B transforms into C, and so on. When B is in existence, A does not exist anymore; when C is in existence, B does not exist anymore. Here physical time is understood as a stream of irreversible change that runs through physical space. Physical space itself is atemporal. Irreversible processes that capture “change”, like a ‘probability ratchet’ that prevents a wheel from turning backwards, are the engines of our reality. They make changes persistent, even though at the most basic level time has no intrinsic direction.

38 Johannes Simon. “Change without Time: Relationalism and Field Quantization”. Ph.D. Dissertation, Universität Regensburg, 2004. (Note from Paul: one of the best Ph.D. theses I have ever read!)
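The “probability ratchet” picture can be illustrated with a toy simulation (a sketch only; the two-urn model and the function names here are illustrative additions, not anything from this paper): every micro-step below is exactly as probable in reverse, yet a macroscopic imbalance between the two urns almost always relaxes toward equilibrium and essentially never returns, which is all the “direction of time” the second law supplies.

```python
import random

def urn_step(left, total):
    """One reversible micro-move: pick a ball uniformly at random and move
    it to the other urn. The reverse of any move is exactly as probable."""
    if random.randrange(total) < left:
        return left - 1   # a ball moved left -> right
    return left + 1       # a ball moved right -> left

def run(total=1000, steps=20000, seed=1):
    random.seed(seed)
    left = total          # start far from equilibrium: all balls on the left
    trajectory = [left]
    for _ in range(steps):
        left = urn_step(left, total)
        trajectory.append(left)
    return trajectory

if __name__ == "__main__":
    traj = run()
    # Macroscopically the imbalance decays toward total/2 and then merely
    # fluctuates; a spontaneous return to the initial state is possible in
    # principle but absurdly improbable for a large number of balls.
    print("start:", traj[0], "end:", traj[-1], "min:", min(traj), "max:", max(traj))
```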
Time can be formally defined in quantum mechanics with respect to its conjugate: energy. A photon with energy h (Planck’s constant) is equivalent to a once-per-second oscillation. In quantum mechanics, as well as in classical mechanics, there is no intrinsic direction of time; all our equations are temporally symmetric. The direction of time that we perceive is based on information42 and can be seen only at macroscopic scales, in the form of the second law of thermodynamics (entropy43).

An assumption of monotonically increasing global time instants creates a problem for computer scientists, who must now reconcile transactions, concurrency and interactions at nanosecond scales (about one foot at the speed of light); yet our computers already involve interactions where picoseconds are relevant, and our computers must necessarily be more than a foot apart.

The reduction of the wave function is essentially time-asymmetric. And although it can be considered an irreversible process at the atomic level, it does not necessarily set a “direction of time” (a distinction that, unfortunately, escaped the Nobel laureate Ilya Prigogine44). The idea that process interactions can be symmetric with respect to time is independent of the fact that sometimes an interaction may be irreversible, in the sense that all knowledge of what happened before it has been lost, and therefore the operation cannot be undone in that direction again.

This idea relates closely to the concept of “transactions” on data. Atomic transactions (in the database and file system sense), therefore, are to computer scientists what irreversible processes are to chemists and biologists. They represent ratchets on the change of state. But built into the notion of transactions (the rollback) there exists a concept of time reversal. This is what makes the notions of atomicity, linearizability and serializability a wicked problem.

Even the notion of an “algorithm” (derived from the same conceptual underpinning as the Turing Machine) is incomplete if we don’t at least specify a place where the time “event” occurs. But how do we specify a place, when everything is relative? Lamport did this implicitly by defining events within a “process” (by which, we assume, he means a single spatially constrained entity, even if it is in constant relative motion). But then the whole concept of a “distributed process” is flawed. What this may mean is that we may be better off with a notion of interactive agents, which relate to each other as independent and autonomous entities, responding only to locally received events. The now thoroughly discredited Archimedean view of absolute time is an attempt to preserve a “God’s Eye View” (GEV) of the processes in our world or among our computer systems.

39 A bit of trivia: we use the word “speed” when referring to the speed of light, instead of “velocity”, because velocity is a vector, with some direction, whereas light has the same maximum speed in all directions.
40 Francisco S.N. Lobo. “Nature of Time and Causality in Physics”. arXiv:0710.0428v1.
41 Phil Dowe. “The Conserved Quantity Theory of Causation and Chance Raising”.
42 Information is equal to the number of yes/no answers we can get out of a system.
43 Entropy is a measure of missing information.
44 Ilya Prigogine. “Time, Structure and Fluctuations”. Nobel Lecture, December 1977.
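To ground the picture of a transaction as a ratchet with a built-in reverse gear, here is a minimal write-ahead-log sketch (illustrative Python; the class and method names are invented for this example and are not drawn from any particular storage product): commit makes a change persistent in one direction, while rollback is exactly the small, bounded dose of “time reversal” described above.

```python
class MiniTransaction:
    """Toy atomic update over an in-memory dict, with undo-log rollback."""

    def __init__(self, store):
        self.store = store
        self.undo_log = []      # (key, previous_value_or_None) in write order
        self.active = True

    def write(self, key, value):
        assert self.active, "transaction already finished"
        # Record the prior state before mutating: this is the ratchet's pawl.
        self.undo_log.append((key, self.store.get(key)))
        self.store[key] = value

    def commit(self):
        # Forget how to go back: the change is now irreversible.
        self.undo_log.clear()
        self.active = False

    def rollback(self):
        # Bounded "time reversal": replay the undo log backwards.
        for key, old in reversed(self.undo_log):
            if old is None:
                self.store.pop(key, None)
            else:
                self.store[key] = old
        self.undo_log.clear()
        self.active = False


store = {"balance": 100}
tx = MiniTransaction(store)
tx.write("balance", 42)
tx.rollback()
assert store == {"balance": 100}   # as far as the data is concerned, it never happened
```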
If our concept of the flow of time is fundamentally flawed, and we can no longer create a logically consistent framework from notions of instants and durations, then what can we count on? Events?

5.6 The Theory of Exchanged Quantities (EQ)

Atoms interact with each other directly, through the exchange of photons. In the words of Richard Feynman, “The sun atom shakes; the atom in my eye shakes eight minutes later because of a direct interaction across space”. However, from the perspective of a photon, the connection between the atom in the sun and the atom in my eye is instantaneous: proper time for a photon is always zero, no matter how many billions of light years it travels through our universe to reach our eyes or the detectors in our telescopes. Radiation, it appears, requires both a sender and a receiver. Maybe this is an event? Events? But what are they? Maybe they are related to what John Cramer45 describes in his theory of transactional exchanges in Quantum Mechanics, which is related to our concept of Exchanged Quantities (EQ), described as follows.

Taking the notion of transactions all the way down to the atomic level, and then reflecting that notion back up to the way “interactions” work between computers, begins to shed some light on a possible new way to design reliable distributed systems. Always requiring some “quantity” to be exchanged (even if it is a packet in the buffer of a sending agent exchanged for a “hole” in the buffer of a receiving agent) starts to give us a notion of time-reversible interactions, in the same way that photons tie together two atoms in space. It is not hard to see that an interaction between computers requires the presence of both a sender and a receiver, just as radiation does. Exchanged, or conserved, quantities would appear to be exactly what we need to deal with distributed-system problems such as locks, transactions, and deadlocks. A theory of exchanged quantities represents a concrete way of modeling reality, directly applicable to data storage, by connecting computer science to current knowledge in physics and philosophy.

5.7 Consistency & In-order Delivery

Causal ordering of messages plays an essential role both in ensuring the consistency of data structures and in unlocking the inherent parallelism and availability in a distributed system. Ensuring causal ordering without an acyclic communication substrate has a long and checkered history. Variations on Lamport’s logical timestamps are the most frequently proposed mechanism to address this issue. The fundamental problem is frequently described as “maintaining causal order” of messages. Logical timestamps attempt to do this by tagging each relevant event in a system with a monotonically increasing variable, or a vector of per-process variables, and comparing these variables when messages are received in order, to detect “causal violations” when they have occurred, but before they can affect the consistency of data. Much of the theory and mechanism for logical timestamps is common to the version vectors used when replicated file systems are partitioned. The idea behind logical timestamps initially appears reasonable, but every implementation so far has proven to be intolerably complex when applied to real systems that scale and can fail arbitrarily46,47.

45 John G. Cramer. “The Transactional Interpretation of Quantum Mechanics”. Physical Review 22 (1980).
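As a concrete reference point for the critique that follows, here is a minimal sketch of the vector-timestamp bookkeeping just described (illustrative Python; VectorClock and its methods are invented names): each process keeps one counter per process, merges the vector carried on every received message, and two events are causally ordered only if one vector is componentwise less than or equal to the other; otherwise they are concurrent, which is exactly the case a single scalar timestamp cannot express.

```python
class VectorClock:
    """Per-process vector timestamp (one slot per known process)."""

    def __init__(self, pid, nprocs):
        self.pid = pid
        self.v = [0] * nprocs

    def tick(self):
        # Local event: advance only our own component.
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        # A send is a local event; the message carries a copy of the vector.
        return self.tick()

    def receive(self, msg_vec):
        # Merge: componentwise max of ours and the sender's, then tick.
        self.v = [max(a, b) for a, b in zip(self.v, msg_vec)]
        return self.tick()


def happened_before(a, b):
    """True if stamp a causally precedes stamp b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


p, q = VectorClock(0, 2), VectorClock(1, 2)
e1 = p.send()            # [1, 0]
e2 = q.receive(e1)       # [1, 1] -- causally after e1
e3 = p.tick()            # [2, 0] -- concurrent with e2
assert happened_before(e1, e2)
assert not happened_before(e2, e3) and not happened_before(e3, e2)
```

Note that the vector length is fixed in advance, one slot per process, which is precisely the “preconfigured per-process vector” limitation discussed next.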
Timestamp variables overflow, and every method attempted so far to resolve this issue simply adds yet more wicked complexity. A single logical (scalar) timestamp imposes a total order on the distributed system that severely constrains the available concurrency (and hence scalability) of the system. Vector timestamps provide a preconfigured per-process vector, which constrains the dynamic nature of the system. Transactions cannot be undone (backed out) unless a history of timestamps is maintained by each node and transmitted with each message (matrix timestamps), the overhead of which is so huge that this scheme is rarely used except perhaps in simulations, where the system under test can be severely bounded.

Logical timestamps represent an attempt to resynthesize a God’s Eye View (GEV) of “logical time”. Our experience is that they present intolerable complexities in the deployment, and brittleness, of scalable storage systems. Just as wait-free programming48 represents an alternative to conventional locks (which are prone to loss and deadlock), the ordering of events can be articulated much more simply in the network, rather than by attempting to recreate a sense of logical global time at the message senders.

The figure above shows how logical timestamps work. Three processes (P, Q, R) are distributed in space and communicate by sending messages to each other over their respective communication channels (indicated by the diagonal lines, the angle of which vaguely represents the speed of transmission). Each process marks its own events with a monotonically increasing integer (the logical timestamp), and keeps track of the messages received from the other processes, which conveniently append their own logical timestamps. In this way, each process can ascertain what state the other processes are in, based on the messages it receives. The figure shows clearly that the causal history, and the future effect, of the two events 32 and 24 represent distinctly different perspectives on what is going on.

These problems with time and concurrency go far beyond distributed storage. The processor industry is in a crisis: multicore processors have become necessary because of the clock-frequency wall related to power dissipation. Now we have reached the point where we “have to” go multi-core, and nobody has thought about the software49. Computer scientists have failed to produce the theoretical models for time and concurrency, as well as the software tools, needed to use these multiple cores. There hasn’t been a breakthrough idea in parallel programming for a long time. We are now going to have to invest in the research on new models for time and concurrency that we should have done 10 years ago, in order to be able to utilize those processors. This problem only gets worse as we have more cores per processor.

46 Reinhard Schwarz, Friedemann Mattern. “Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail”. Distributed Computing, Vol. 7, No. 3 (1994), pp. 149-174.
47 David R. Cheriton, Dale Skeen. “Understanding the Limitations of Causally and Totally Ordered Communication”.
48 Maurice Herlihy. “Wait-free synchronization”. ACM Transactions on Programming Languages and Systems, 13(1):124-149, January 1991.
49 John Hennessy. Keynote speech to the CTO Forum, Cisco, November 2007.
5.8 Causality

I wish, first, to maintain that the word “causality” is so inextricably bound up with misleading associations as to make its complete extrusion from the computer science vocabulary desirable; secondly, to inquire what principle, if any, is employed in physics in place of the supposed “law of causality” which computer scientists imagine to be employed; and thirdly, to exhibit certain confusions, especially in regard to teleology and determinism, which appear to me to be connected with erroneous notions as to causality. Computer scientists imagine that causation is one of the fundamental axioms or postulates of physics, yet, oddly enough, in real scientific disciplines such as special and general relativity, and quantum mechanics, the word “cause” never occurs. To me it seems that computer science ought not to assume such legislative functions, and that the reason why physics has ceased to look for causes is that in fact there are no such things. The law of causality, I believe, like much that passes muster among computer scientists, is a relic of a bygone age, surviving, like a belief in God50, only because it is erroneously supposed to do no harm. ~ Paul Borrill (with apologies to Bertrand Russell)

The conventional notion of causality is: given any two events, A and B, there are three possibilities: either A is a cause of B, or B is a cause of A, or neither is a cause of the other. This “traditional” distinction within the notion of causality is built into Lamport’s happened-before relation.

In philosophy, as in the management of data, cause and effect are twin pillars on which much of our thought seems based. But almost a century ago, Bertrand Russell declared that modern physics leaves these pillars without foundations. Russell’s revolutionary conclusion was that “the law of causality is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm”51. Russell’s famous challenge remains unanswered. Despite dramatic advances in physics, this past century has taken us no closer to an explanation of how to find a place for causation in a world of the kind that physics or computer science reveals. In particular, we still have no satisfactory account of the directionality of causation, the difference between cause and effect, and the fact that causes typically precede their effects. This would appear to be an aspect of the reversibility of time discussed in the previous section.

For computer scientists, a sequence of events has a determined temporal order: events are triggered by causes, thus providing us with the notion of causality. The causality principle is often defined as: “every effect must have a proximate, antecedent cause”. While this simple statement seems eminently sensible, and this style of thinking regarding causal processes is widespread in the computer science community, the fields of physics and philosophy take a very different view of the validity of the whole concept of causality52. If conceptual difficulties are known to exist with the notion of causality in other fields, then should we not pay a little more attention to this in the assumptions underlying the design of our data systems and our algorithms?

50 Richard Dawkins. The God Delusion.
51 Bertrand Russell. “On the Notion of Cause”. In Russell on Metaphysics, ed. Stephen Mumford. Routledge, 2003.
52 Huw Price, Richard Corry. Causation, Physics, and the Constitution of Reality. Clarendon Press, Oxford, 2007.
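For reference, the “traditional” distinction mentioned above can be stated precisely. The following is the standard formulation of Lamport’s relation (given here for convenience; it restates the 1978 paper rather than anything new to this one), and it makes clear that happened-before is a statement about message paths, not about physical causation:

```latex
% Lamport's "happened before" relation $\to$ is the smallest strict partial
% order on events such that:
\begin{enumerate}
  \item $a \to b$ if $a$ and $b$ occur in the same process and $a$ comes first;
  \item $a \to b$ if $a$ is the sending of a message and $b$ is its receipt;
  \item $a \to b$ if $a \to c$ and $c \to b$ for some event $c$ (transitivity).
\end{enumerate}
% For any two distinct events exactly one of the following holds:
\[
  a \to b, \qquad b \to a, \qquad a \parallel b ,
\]
% where $a \parallel b$ ("concurrent") means neither $a \to b$ nor $b \to a$.
```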
6. Curse of the God’s Eye View

“All behind one pane of glass” is a frequent mantra of IT people trying to manage the complexity of their data systems: a metaphor for a single workstation view of an entire operation (datacenter, or Network Operations Center), with everything viewable, or accessible, a few clicks away, and the ability to drill down through remote control into some part of the system to yield information from which an expert administrator could immediately pattern-match the problem, diagnose it, and prescribe the solution. “One throat to choke” is another, reflecting the desire to avoid dealing with vendor finger-pointing when systems fail to interoperate, or data becomes lost or corrupted. This perspective is easy to understand for anyone in the industry who has perceived the value of standardizing protocols, interfaces, or even operational procedures.

These very human intuitions are understandable from the perspective of harried CIOs, IT managers, and even ourselves as knowledge warriors “trying to do their job”, but they inevitably lead to failure. Is it possible that the God’s Eye View provides us with too much information, and that the system interactions required to maintain that information actively interfere with the architectural processes which create the subtle but essential behavioral robustness of systems?

Our intuition fails us when we try to solve what appear to be the immediate problems in front of us. Our unconscious beliefs, and our impatience (often driven by the downward causation of quarterly profits, or investor “time to return”), pressure us into sweeping issues under the rug, where we forget about them. After a while, the lumps under the rug start to become minor mounds, then termite hills, and eventually a mountain range. These lumps under the rug are our individual and our collective wicked problems. They are our first dim glimpses of awareness of the next series of problems that we need to address as individuals, organizations, an industry, a society or a civilization, in order to move to the next level of our collective consciousness. Sooner or later, in order to make progress (instead of endlessly re-enacting Albert Camus’s Myth of Sisyphus), we must pull these problems back from under the rug and work on their wickedness. Today, we have discussed a small, and hopefully manageable, set of wicked problems to be solved in order to address our prime directive of relieving human beings from their slavery to their data.

Richard Dawkins, in his 2005 TED talk at Oxford, illustrated the problem of how our beliefs and mental models blind us to the truth: We are now so used to the idea that the Earth spins, rather than the Sun moving across the sky, that it is hard for us to realize what a shattering mental revolution that must have been. After all, it seems obvious that the Earth is large and motionless, the Sun small and mobile. But it is worth recalling Wittgenstein’s remark on the subject. “Tell me,” he asked a friend, “why do people always say it was natural for man to assume that the Sun went around the Earth rather than that the Earth was rotating?” His friend replied, “Well, obviously because it just looks as though the Sun is going around the Earth!” Wittgenstein replied ...
“Well, what would it have looked like if it had looked as though the Earth was rotating?”

So what would it look like if we, as designers, were not able to reach out and directly control things, as the number, connectivity and diversity of things scale? Well, maybe it would look exactly like what we are experiencing now: the indefinable complexity of wicked problems, rising like termite hills under the carpet, and causing us to develop our Yak-shaving skills.

We are evolved denizens of a local environment that we perceive. Our primitive mammalian brains, selected over millennia, are trained to reach out and touch, to chase after prey, to run away from danger, to grasp our tools or our food. This notion of locality and spatial perception is deeply embedded in our perspective of the world. We can even reach out through our newspapers to other events in our world, and our screens, which are but a few feet away, connect us with the lives of others on the other side of the world. But this sense of humans being able to control everything in our environment, just because we can see it, is deeply flawed; it just doesn’t scale. I call it the curse of the God’s Eye View. Just because our God-like design ego would like to see everything doesn’t mean that we should, and even if we can see something, it doesn’t mean we should try to control it from that same GEV perspective. The inductive transition from n to n+1 as systems scale will seduce us into believing that what works at small scale will work at larger scales. It simply isn’t true.

Ultimately, the God’s Eye View (GEV) represents an Archimedean perspective: an attempt to gain visibility of everything, as if time and space were a backdrop canvas for a theater on which we can draw our Newtonian view of how our universe “should” work. Are designers the enemy of design?

6.1 Reduction to Practice

He who loves practice without theory is like the sailor who boards ship without a rudder and compass and never knows where he may cast. ~ Leonardo da Vinci

As we extend to more and more cores in our processors, and more nodes in our distributed systems, the problems of time, causality and concurrent programming models become more acute. There are some models that do show promise of insight into this problem53. When they are combined with a relativistic view of time and space, a breakthrough may be forthcoming. This is an area of challenge for computer scientists. Until they give up their naïve models of time, and their attempts to recreate a GEV, little progress will be made. Designers are the enemy of design, especially when they try to play God.

Turing machines and algorithms must completely specify all inputs before they start computing, while interaction machines54 can add actions occurring during the course of the computation. Is this shift in perspective, from the “God’s Eye View” of algorithms to a “neighbor to neighbor” interactive view of distributed computation, the shift we need to more correctly model our notions of time in a distributed computing environment? Another major concept, which goes along with the notion of interactive computation, is self-stabilization.

53 Maurice Herlihy, Nir Shavit. “The Topological Structure of Asynchronous Computability”. Journal of the ACM, November 1999. Also see the forthcoming textbook: The Art of Multiprocessor Programming, by Maurice Herlihy and Nir Shavit, Morgan Kaufmann/Elsevier, March 2008.
54 Interactive Computation: The New Paradigm. Edited by Dina Goldin, Scott A. Smolka, and Peter Wegner. Springer, 2006.
Self-stabilization is a halfway house: an insight described by Dijkstra in 1974, when he began to recognize the limitations of the algorithmic paradigm based on Turing machines. Not much was done with this at the time, but Lamport’s later praise of the concept breathed life into it, and it has now become an active, if not major, area of research. An extension of the concept of self-stabilization is that of superstabilization [Dolev and Herman 1997]. The intent here is to cope with dynamic distributed systems that undergo topological changes, by adding a passage predicate to the conditions for self-stabilization. In classical self-stabilization theory, environmental events that cause (for example) topology changes are viewed as errors, for which no guarantees are given until the system has stabilized again. With superstabilizing systems, a “passage predicate” is required to be satisfied during the reconfiguration of the underlying topology.

Wegner argues that interactive computing is more powerful than algorithms. Interaction as a distributed-systems concept ties in with the EQ and CQ notions described previously.

7. Conclusions

Throughout computer science, we see a culturally embedded concept of Archimedean time: the idea that there is some kind of master clock in the sky, and that time flows somehow independently of the interactions between our atoms and molecules, or between our computer systems as they communicate. We now know with some certainty that, without the intense mathematical training of a theoretical physicist, our intuition will fail us.

7.1 The Prime Directive

Every knowledge warrior knows that we are, as yet, far from the singularity of our data becoming smarter than us.
But the writing is on the wall: if we are not to progressively become tools of our tools, then we must address the tax necessarily imposed upon our attention as we unconsciously, but not inevitably, become slaves to our data.

When I began to write this paper, I had in mind that the prime directive would be of the form “Smart data looks after itself”, or “Thou shalt not cause a human being to do work that a machine can do”. However, the issue of human beings steadily becoming slaves to machines was felt to be of such paramount importance to our future relationship with our data that we needed something a little stronger. Taking inspiration from Asimov, and our cultural awareness of our relationship to machines, I felt it better to express the proposed prime directive for Smart Data in the form of the three laws which we presented in the introduction56. The intent of these laws is clear: our technology abundances, whether in the form of CPU cycles, network bandwidth, or storage capacity, should be used to preserve the scarce resources, such as human attention, and latency57.

7.2 Six Principles of Smart Data

The following six principles are the result of insights obtained when thinking through and modeling the design of a 100PB Distributed File Repository. We conjecture that the majority of the complexity of storage systems (and much of the difficulty in building scalable, reliable, distributed systems) is due to the failure of designers to resist the idea that they are God when they design their systems. God’s Eye View solutions are a fantasy: the only world that nature builds is built through self-organizing behavior55: simple rules and near-neighbor interactions (whether those interactions are by contact between electromagnetic fields, or by photons across space).

1. The system shall forsake any God (a single coordinator, human or otherwise) that can fail and bring down the whole system, or prevent the system from returning itself to a fully operational state after perturbations due to failures, disasters or attacks.
2. Thou shalt use only a relative time assumption in any aspect of the design of a smart data system or its distributed algorithms58.
3. Each storage agent, in conjunction with its neighbors, will do everything within its power to preserve data. If choices must be made, then data shall be conserved according to the following priority classes: Class 1 – Not Important to Operations; Class 2 – Important for Productivity; Class 3 – Business Important Information; Class 4 – Business Vital Information; Class 5 – Mission Critical Information.
4. For users, all replicas of a file shall remain indiscernible, and all versions shall remain indiscernible, until s/he needs to undelete.
5. For systems, replicas shall be substitutable.
6. Storage agents are individuals; don’t try to put their brains, hearts and intestines in different places.

The designer has no right to be wrong.

The motivation for these laws and principles is not only to make our lives as knowledge warriors more satisfying and productive, but to encourage the elimination of a whole set of administrative chores, so that those wonderful human beings called systems and storage administrators, who so valiantly try to make our lives as knowledge warriors easier, can move on to new roles in life involving variety and initiative. The Archimedean or GEV perspective does not relieve complexity in design; it causes it.

7.3 Call to Action

To computer scientists, architects, programmers and entrepreneurs: lift up your heads from your keyboards, and look more broadly to other disciplines for inspiration and ideas on how to get us out of the current rut you have gotten us into with the intensely-wasteful-of-human-attention ways conventional storage systems are designed. Look more broadly to other disciplines for inspiration and insight into how nature works, and model your systems accordingly. Be acutely aware of assumptions about time, root out hidden assumptions of simultaneity, treat everything as interactions with conserved or exchanged quantities (with nothing but a void in between), and respect that time can (sometimes) go in reverse.

If we truly wish our data to be smart, and to be able to rely on it going forward, then the designers of our systems have no right to be wrong. We have the right to expect our bridges to remain standing during a storm, our airplanes to land safely even though faults may occur in flight, and our smart data to be there when and where we need it, and not to tax our attention when we don’t.

Let the music begin. Vanguard Feb 20-21, 2008: SMART(ER) DATA

55 Paul L. Borrill. “Autopoietic File Systems: From Architectural Theory to Practical Implications”. Vanguard Conference, San Francisco, January 2005.
56 See Roger Clarke’s overview of Asimov’s three laws of robotics, or the description in Wikipedia.
57 While bandwidth continues to improve, latency remains a constant due to the finite speed of light.
58 Absolute (Archimedean) time is deprecated, and should be banished from our designs and our thinking.

8. References

Articles / Papers:
1. Economist Magazine. “Make It Simple”. Information Technology Survey, October 2004.
2. Scott Dodd. “Making Space for Time”. Scientific American, January 2008, pp. 26-29.
3. Peter Denning. “The Choice Uncertainty Principle”. Communications of the ACM, November 2007, pp. 9-14.
4. [Ellis 2006] G.F.R. Ellis. “Physics in the real universe: time and spacetime”. Gen. Rel. Grav. 39, 1797, 2006.
5. Rachel Scherr, Peter Shaffer & Stamatis Vokos. “Student understanding of time in special relativity: simultaneity and reference frames”. arXiv:physics/0207109v1.
6. Kenji Tokuo. “Logic of Simultaneity”. arXiv:0710.1398v1.
7. Francisco S.N. Lobo. “Nature of Time and Causality in Physics”. 2007.
8. Donna J. Peuquet. “Making Space for Time: Issues in Space-Time Data Representation”. GeoInformatica, Volume 5, Number 1, March 2001.
9. David Ruelle. “The Obsessions of Time”. Communications in Mathematical Physics, Volume 85, Number 1 (1982).

Books:

10. Gottfried Leibniz. Discourse on Metaphysics and Other Essays.
11. Steven French and Décio Krause. Identity in Physics: A Historical, Philosophical, and Formal Analysis. Oxford University Press, 2006.
12. Leslie Lamport. “Time, clocks, and the ordering of events in a distributed system”. Communications of the ACM, Vol. 21, No. 7, pp. 558-565, July 1978.
13. Bill Newton-Smith. The Structure of Time. Routledge & Kegan Paul, 1984.
14. Huw Price and Richard Corry. Causation, Physics, and the Constitution of Reality. Clarendon Press, Oxford, 2007.
15. Julian Barbour. The End of Time. Oxford University Press, 1999.
16. Lee Smolin. Three Roads to Quantum Gravity. Basic Books, 2001.
17. Huw Price. Time’s Arrow & Archimedes’ Point: New Directions for the Physics of Time.
18. Dina Goldin, Scott A. Smolka, Peter Wegner (Eds.). Interactive Computation: The New Paradigm. Springer, 2006.

Additional Reading:

19. Victor J. Stenger. Timeless Reality: Symmetry, Simplicity, and Multiple Universes. Prometheus Books, 2000.
20. Murray Gell-Mann. The Quark and the Jaguar. Henry Holt & Company, 1994.
21. Hyman Bass and Alexander Lubotzky. Tree Lattices (Progress in Mathematics). Birkhäuser Boston, 2001.
22. Hans Reichenbach. The Direction of Time. Dover Publications, 1999.
23. Davide Sangiorgi and David Walker. The Pi-Calculus: A Theory of Mobile Processes. Cambridge University Press, 2001.
24. Peter Atkins. Four Laws That Drive the Universe. Oxford University Press, 2007.
25. Robin Milner. Communicating and Mobile Systems: The Pi-Calculus. Cambridge University Press, 1999.
26. George Grätzer. Lattice Theory: First Concepts and Distributive Lattices. W.H. Freeman & Company, 1971.
27. B.A. Davey and H.A. Priestley. Introduction to Lattices and Order, Second Edition. Cambridge University Press, 1990.
28. Neil Gershenfeld. The Physics of Information Technology. Cambridge University Press, 2000.
29. Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.
30. Lisa M. Dolling, Arthur F. Gianelli, Glenn N. Statile (Eds.). The Tests of Time: Readings in the Development of Physical Theory. Princeton University Press, 2003.
31. Stephen Hawking (Ed.). A Stubbornly Persistent Illusion: The Essential Scientific Works of Albert Einstein. Running Press Book Publishers, 2007.
32. Mark Burgess. Principles of Network and System Administration. John Wiley & Sons, 2000.
33. Paul Davies. About Time: Einstein’s Unfinished Revolution. Orion Productions, 1995.
34. Matthew Hennessy. A Distributed Pi-Calculus. Cambridge University Press, 2007.
35. Nancy A. Lynch. Distributed Algorithms (The Morgan Kaufmann Series in Data Management Systems). Morgan Kaufmann Publishers, 1996.
36. Paul L. Borrill. “Autopoietic File Systems: From Architectural Theory to Practical Implications”. Vanguard Conference, San Francisco, January 2005.
37. Carlo Rovelli. Quantum Gravity. Cambridge University Press, 2005.