OXFORD UNIVERSITY COMPUTING LABORATORY

Trusted Computing for Trusted Provenance
Andrew Martin
Provenance and Security Workshop, eSI, May 2009

Outline
– My motivating example
– Trusted Computing
– Impact for Provenance

My motivating example: climateprediction.net
– climate modelling has generally been the preserve of supercomputers
– climate models have many uncertainties
– previously: one PhD ≈ five model runs
– new strategy:
  – recruit home PC users
  – each PC runs a complete climate model in approx. six weeks (elapsed)
  – Monte Carlo simulation (different parameters, start conditions) to find the most likely outcomes
– BOINC-based platform: much in common with SETI@home, distributed.net, etc. – and many differences

climateprediction.net security
The classical dual problem in distributed computing:
– trusted host, untrusted code – the wise user will ask:
  – does this software do what it claims?
  – does it interfere (in installation, in operation) with the rest of my computing experience?
– trusted code, untrusted host:
  – did the remote user really run the intended software, with the intended parameters?
  – are the results returned untainted, accurate, repeatable?

trusted host, untrusted code
– code-signing can help:
  – it doesn't stop bad things from happening – it just gives you a chance to trust the provider
  – I was nervous when my name was on the certificate!
– sandboxing would help, but tends to degrade performance – no one wants to compile Fortran to bytecode!
– the operating system can give some isolation/protection – but we did need a system which ran no matter who was logged on
– the behaviour of the main executable is one thing; what can we say about the behaviour of the installation program?
  – we installed extra trusted root certificates, silently, for example

trusted code, untrusted host
– any integrity protection measures built into the client software can be spoofed by the user
– one response is to duplicate the task; this is what SETI@home does: each work-unit is sent to up to twelve participants, and the results are compared
  – this would substantially reduce the number of model runs we could complete
  – the nature of the task means that we do not expect binary equivalence of results anyway
– in general, this is a hard problem! could trusted computing help?

climateprediction.net security approach
Biggest risks:
– well-meaning attempts to 'improve' the code
– trying to 'game' the leader board
– politically motivated perturbation of results
– loss of confidence by participants, who abandon their model runs
This led us to:
– try to avoid an 'arms race': don't make the security too interesting
– protect integrity with a simple hash
– use statistical measures to estimate the amount of cheating taking place; throw away (or repeat) anomalous results
– use code-signing and our reputation to motivate participants
Outcomes: largely successful; one surprising incident...

Scope for malware
A realistic threat assessment?
– these things certainly exist
– the likelihood of attack is broadly in line with the value of the data
– we need to consider "insider attack" as well as "outsider malice"
– we could debate "reasonable protection" at length
(figure: the platform stack – application software / middleware / OS / firmware / hardware)

Scope for malware
General-purpose viruses are unlikely to be a problem for provenance. But if anyone has an interest in falsifying or modifying provenance data, they have ample opportunity. Self-asserted provenance data is unguaranteed, even if the "right" application software is in use.

How far does this extend?
This kind of analysis seems to extend to many kinds of system:
– particularly bad in "public participation" projects
– but what about Condor pools?
– are clusters trustworthy?
– are instruments tamper-proof?
– etc.
Security concerns are always subject to moving goal-posts:
– if the would-be fraudsters are prevented from using one means of attack, they will likely move on to the next.

Trusted Systems
Trust: an expectation of behaviour.
Trusted systems are those upon whose correct (or predictable) operation we simply rely. If they fail to live up to our expectations, bad consequences will follow.
Careful speakers distinguish:
– trusted systems
– trustworthy systems
We could have either without its being the other.

Trusted Computing
It is safe to trust something when:
1. it can be unambiguously identified, and
2. it operates unhindered, and
3. the user has first-hand experience of consistent, good behaviour, or the user trusts someone who vouches for consistent, good behaviour.
"An entity can be trusted if it always behaves in the expected manner for the intended purpose." (TCG 2004)

Trusted Computing, Trusted Infrastructure
For us, then, Trusted Computing means an approach to building computer systems which:
1. strongly identify themselves
2. strongly identify their current configuration/running software
3. allow us to make informed decisions about the level of trust to invest in them.
– platform identity will be based on public-key cryptography
– software identity will be based on cryptographic hashes of program object code
– we want to gain maximum benefit from the hard-to-alter characteristics of hardware

Trustworthy state for a PC platform
The essence is to take a measurement (a cryptographic hash) of each component which contributes to the platform state:
– firmware, kernel, library, application binary, configuration file, etc.
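The measurement-and-accumulation idea can be sketched in a few lines. This is an illustrative Python sketch, not TPM code: it assumes SHA-1 digests (as in TPM 1.2) and stand-in byte strings for the components; the key point is the "extend" operation, by which a Platform Configuration Register can only ever be updated by hashing a new measurement into its previous value.

```python
import hashlib

PCR_SIZE = 20  # a TPM 1.2 PCR holds a SHA-1 digest (20 bytes)

def measure(component_bytes: bytes) -> bytes:
    """Measure a component: a cryptographic hash of its contents."""
    return hashlib.sha1(component_bytes).digest()

def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM extend: the PCR is never written directly; it is replaced by
    the hash of its old value concatenated with the new measurement, so
    the final value depends on every measurement and on their order."""
    return hashlib.sha1(pcr + measurement).digest()

# Boot chain: each component is measured before control passes to it.
# The byte strings here are placeholders for real component images.
boot_chain = [b"firmware image", b"os kernel",
              b"middleware", b"application binary"]

pcr = bytes(PCR_SIZE)  # PCRs start at zero on platform reset
for component in boot_chain:
    pcr = extend(pcr, measure(component))

# The same components, measured in the same order, always yield the
# same PCR value, which a verifier can compare to a known-good reference.
print(pcr.hex())
```

Because extend is order-sensitive and one-way, a component cannot retroactively erase or forge the measurements recorded before it ran.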
– use those measurements as the basis of deciding whether the platform is in a state I trust:
  – the same state as the last time it booted
  – using components without known vulnerabilities
  – etc.

Trustworthy state for a PC platform
What's wrong with each component recording its own hash?
– is my application untainted?
– is the environment in which it runs untainted?
(figure: the platform stack – application software / middleware / OS / firmware / hardware – with those questions posed at each layer)

Trustworthy state for a PC platform
The concept is to have each component in the chain be measured by the preceding one.
(figure: from the hardware up through firmware, OS, middleware and application software, each component measures the next, stores the measurement in the TPM, and then transfers control)

Attesting the platform state
The TPM holds cryptographic keys, and can:
– sign statements about the measurements it stores – asserting the platform state to a third party
– seal data (an encryption function bound to platform state) so that it is available only when the platform is in a particular state.

Remarks
– this process gives us a measured boot:
  – any component in the chain can gain confidence about the components below it by querying the TPM
  – this implies transitive trust in components
– remote attestation allows the platform to report this process to a third party
– there is considerable complexity: around 200 measurements in a typical boot to a general-purpose operating system
– an alternative architecture uses a late-launched measured environment

Trusted Infrastructure
Trusted:
– virtualization
  o attest the VMM and the VM of interest
– networking
– storage
– mobile devices
– virtual domains

Trusted Infrastructure
State of the art:
– TPMs are widely deployed
– limited OS support
– various kinds of language support – mostly low-level capabilities
– trusted networking and storage devices are being shipped
– prototypes of application-level work

Speculating about Trusted Provenance
My perspective:
– one aspect of provenance entails assuring that an accurate record is made of what software was used to make or process a datum
– doing so soundly requires knowledge of the context of execution, too
– trusted computing is designed to do precisely this
  o in a tamper-proof way

Secure Logging Infrastructure
Jun Ho Huh and Andrew Martin (2008)

Research Questions
– whether attestation data can be made into useful provenance information
– how to incorporate that data – and its signature, etc. – into the right kind of metadata
– how to build software architectures which use trusted infrastructure to add value to provenance data (for real, not illusory, benefit)
  – maybe using tamper-proof VMs
  – maybe using trusted instances of the JavaVM or the CLR ...
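One way the second research question might be approached can be sketched as follows. This is a hypothetical illustration, not a design from the talk: the record layout and every field name are my own, and an HMAC with a shared secret stands in for the TPM's signature (a real quote would be signed with an asymmetric attestation identity key). The idea is simply that a provenance record can bind a datum's hash, the software name, and the attested platform measurements together under one signature.

```python
import hashlib
import hmac
import json

# Stand-in for the TPM's attestation identity key. A real TPM quote is
# signed asymmetrically; an HMAC secret merely illustrates the binding.
AIK_STANDIN = b"attestation-key-material"

def quote(pcr_values: dict, nonce: bytes) -> str:
    """Simulate a TPM quote: sign the PCR values plus a fresh nonce
    (the nonce lets a verifier rule out replay of an old quote)."""
    payload = json.dumps(pcr_values, sort_keys=True).encode() + nonce
    return hmac.new(AIK_STANDIN, payload, hashlib.sha256).hexdigest()

def provenance_record(datum: bytes, software: str,
                      pcr_values: dict, nonce: bytes) -> dict:
    """A hypothetical metadata record: which datum was produced, by
    which software, in which attested platform state."""
    return {
        "datum_sha256": hashlib.sha256(datum).hexdigest(),
        "software": software,
        "pcrs": pcr_values,
        "nonce": nonce.hex(),
        "quote": quote(pcr_values, nonce),
    }

record = provenance_record(
    b"model output bytes",          # placeholder for the real datum
    "climate-model-1.0",            # hypothetical software identifier
    {"pcr0": "ab" * 20, "pcr8": "cd" * 20},  # placeholder PCR values
    nonce=b"fresh-challenge",
)
print(json.dumps(record, indent=2))
```

A verifier who trusts the signing key can recompute the quote over the claimed PCR values and nonce; any tampering with the platform-state claim invalidates it, which is exactly the "real, not illusory" benefit the architecture would need to deliver.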