Washed by the Very Same Rain: System Administration Research
Alva L. Couch
Tufts University
couch@cs.tufts.edu

Part I
What is research?

Who am I?
• Overseer of LISA: chair of steering committee, board liaison (since 2005).
• 14 LISA papers since 1996 (+ 2 students who submitted sole-author papers).
• 2 LISA “best paper” awards and 1 “best student paper” award since 1996.
• 2003 SAGE Professional Service Award (with Mark Burgess and Paul Anderson).

What is research?
• We all think we know, but
• popular accounts of the nature of research are misleading, and
• have remained misleading throughout recorded history!

A popular misconception
• Einstein created the theory of relativity out of thin air.
• No one else could have done it but Einstein.
• Not!
• Einstein’s work began in a context of what was already known, and
• several mathematicians (notably Minkowski) were working in the same context and concurrently trying to come up with their own explanations!

A song and a woven cloth
• This presentation is structured like the folksong “same rain” by American folk singer Pat Humphries, which some consider a paradigm for research and exploration.
• It is a cloth woven from threads inspired by my wife’s mentor Prof. Philip Morrison (of MIT) and his TV series “The Ring of Truth”, which discusses how scientists develop their ideas.

“We’re all living in a great big dipper…”
• All research occurs in a context that includes
  – What work has been done before.
  – What community is interested.
  – What problems remain to be solved.
• Context is a moving target that can change rapidly over time.

“We’re all washed by the very same rain…”
• By definition, research doesn’t occur in a vacuum.
• If you see something important, chances are that a number of other people have seen the same thing.
• The difference is whether you do something to understand what you see!
• Edison: 1% inspiration, 99% perspiration.
“We are swimming in the stream together…”
• Research is not about working alone, but rather about communicating ideas to a community that is exploring similar directions.
• Most important step is to identify your community (or communities).
• “Who are you swimming with?”

“Some in power and some in pain…”
• Failure is a crucial part of research.
• One’s hypothesis can be invalid.
• Even after one has believed it for years.
• Only by failing can one learn.
• Only by being open to failure can one become objective.
• I have more wrong ideas than right ones!

The usual formula for how to do research
• Determine context of the problem.
• Survey proposed solutions.
• Determine new directions to explore.
• Choose one direction to explore.
• Develop a hypothesis about the direction.
• Test that hypothesis.
• Evaluate the results of the test.
• Refine the hypothesis, and repeat.

Key elements of the formula
• Context: maintaining an idea of what you know and don’t know about a problem.
• History: keeping track of what you learn over time.
• Evidence: how what you see supports or refutes what you might think.
• Conversation: the ability to explain what you see to others.

An alternative formula
• Get excited about something.
• Commit to learning all that can be understood about it.
• Choose some small part of it to understand better.
• Write down your specific ideas about the nature of this part. This is your “hypothesis”.
• Test your understanding with observation. This is your “experiment”.
• Remain doubtful of unconvincing evidence, and curious about contradictory evidence.
• Refine yourself and then repeat!

Research versus learning
• Too often, research is mischaracterized as a discovery product, like finding a piece of gold in a gold mine.
• Most research is instead a learning process, where you learn something new about something you already see.
• The gold is not what you see, but what you learn.
Research redefined
• An active learning process…
• In which you explore what happens, and learn from the world…
• In a continuing conversation with a community of learning…
• In a changing and evolving context of observed phenomena and human needs…
• In which one risks being wrong, but learns and evolves from one’s mistakes.

The Ring of Truth
• My wife was the researcher for the TV series “The Ring of Truth”, which discusses the nature of science.
• Each show concentrates on some aspect of the scientific method: Looking, Change, Mapping, Clues, Atoms, and Doubt.
• Let’s map these ideas into system administration terms!

Looking
• The ability to look at something familiar and see something new.
• Burch and Cheswick, Tracing Anonymous Packets to Their Approximate Source, Proc. LISA 2000.
• A denial of service (DoS) attack is not always a bad thing, and one can use a structured DoS to identify perpetrators of other DoS’s!

Change
• The ability to embrace the idea that one’s understanding of the world – and the world – changes and improves over time.
• Finke, Manage People, Not Userids, Proc. LISA 2005.
• A revisitation of the same author’s previous paper on the subject, in which he explains how his understanding and practice improved over time and reversed some prior decisions.

Mapping
• The ability to use models and abstraction to understand the world.
• Couch, Wu, and Susanto, Toward a cost model for system administration, Proc. LISA 2005.
• A model of cost for helpdesks shows through simulation that helpdesks running near the limit of staff capacity experience chaotic changes in total value.

Clues
• The ability to look for and see clues toward new and different explanations of phenomena.
• Gross and Rosson, Looking for Trouble: Understanding End-User Security Management, Proc. CHIMIT 2007.
• The Windows Firewall message “do you want to allow this connection” is semantically equivalent, in the minds of most users, to “do you want to get your work done or not?”

Atoms
• The ability to come to grips with what is knowable and what is unknowable.
• Burgess, Computer Immunology, Proc. LISA 1998.
• Centralized control systems depend upon “knowing the unknowable,” whereas physical systems such as the human body depend upon distributed and “more knowable” notions.

Doubt
• The ability to face and embrace one’s lack of understanding of complex phenomena.
• Evard, An Analysis of UNIX System Configuration, Proc. LISA 1997.
• Configuration management is often conceptualized as a simple choice between tools, but involves a more complex conflict between technical methods and human needs.

Part II
Steps toward engaging in research

Parts of becoming a researcher
• Engaging in active learning.
• Being open to doubt.
• Finding and maintaining context.

Aids to effective learning
• Keeping a personal journal of ideas, directions, hypotheses, experiments, conclusions, references.
• Breadth: documenting every idea you get.
• Depth: exploring one new direction at a time.
• Documenting each hypothesis and the evidence for and against it as soon as possible.

Persistence of memory?
• Don’t rely on your memory, no matter how good it is.
• Your understanding of the problem is a moving target.
• To teach other people what you learned, you need to recall what you didn’t know before!

Example: my journal
• Dated entries describe hypotheses, tests, results, ideas.
• In electronic form (plaintext).
• Ideas often turn out to be wrong.
• I never delete or edit an entry!
• This is not a publication; it is a starting point for one.
• It is more important to have a record than to be correct.

Being open to doubt
• Doing research is about accepting that absolutely any idea you write down
  – is subject to continual validation, and
  – can turn out to be invalid at any time in the future.
• Each entry in the journal is a starting point for discussion, and not a fact.
• In mine, the “invalidated” entries outnumber the “validated” ones.

Finding context and community
• Several resources can aid you in beginning:
  – The Anderson taxonomy of system administration topics. Anderson and Patterson, “A Retrospective on Twelve Years of LISA Proceedings”, Proc. LISA 1999.
  – Book: Selected Papers in Network and System Administration (based upon the Anderson Taxonomy).
  – Book: Handbook of Network and System Administration (beyond the Anderson taxonomy).
  – USENIX compendium of best papers (a testament to the “most interesting” topics and approaches).
• Google can help, but only if you already know the proper keywords!

Just as important: find community
• Your community: the people in this room.
• One often chooses a problem “for a community” rather than the other way around.

Essential skills of the researcher
• Focused reading.
• Documenting biases.
• Collecting evidence.
• Being open to surprises.

Focused reading
• A researcher doesn’t read a paper like a regular person.
• Reading occurs in a context.
• To answer specific questions.

The typical questions
• Relevance: is this work relevant to what I want to understand?
• Context: where did their understanding start (when their work began)?
• Results: where did their understanding end (when they finished this paper)?
• Doubt: what unknowns did they find?

Questions evolve!
• These are just a starting point.
• As you focus upon a topic, reading becomes more focused as well.
• E.g., “Is this relevant” becomes a question about a specific kind of relevance.

Part III
Examples

…(Ahem)…
• The original idea for this talk was to describe the whole “landscape” of system administration research and where things are today.
• I thought about this a bit and decided that it was too broad an objective.
• And it sounded a bit boring.
• So instead, I am going to show you several examples of how to build your own landscape of what’s important to you.
• And then, I’ll take requests!

How to build your own landscape
• Express your preconceptions honestly.
• Use focused reading to find evidence for or against your preconceptions.
• Weigh the evidence, reevaluate your preconceptions.
• When the literature fails to support or refute, it’s time to do your own experiment.

Some parts of the current landscape (some of what’s hot)
• Power-aware systems.
• Adoption of automation tools versus writing your own tools.
• Balancing security and business objectives.
• Integrated management of systems, knowledge, security, audit data.
• Dealing with various (existing and new) forms of spam.
• (and many others).

Power-aware systems
• No paper at LISA as yet.
• Two important posters at HotPower 2008:
• Srikantaiah, Kansal, and Zhao, Energy Aware Consolidation for Cloud Computing.
• Lu and Varman, Workload Decomposition for Power Efficient Storage Systems.
• Focused reading:
  – What is the problem?
  – What are the challenges?
  – How could this apply to system administration?

Adoption of automation tools
• This is a hard one.
• Let’s go digging:
  – Mentioned in my LISA 2005 talk “What is this thing called configuration management?”.
  – Lots of hallway conversations.
  – Lots of very indirect evidence.
  – Evidence scattered all over the universe, one sentence at a time.
• I didn’t say this was always easy.

Balancing security and business objectives
• Very few writings, but very controversial. One example:
• Beattie, Arnold, Cowan, Wagle, Wright, and Shostack, Timing the application of security patches for optimal uptime, Proc. LISA 2002.
• Focused reading:
  – What questions remain?
  – Are there analogies with other “best practices”?

Integrated management
• Lots of references with scattered ideas.
One example:
• Wang, Verbowski, Dunagan, Chen, Wang, Yuan, and Zhang, STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support, Proc. LISA 2003.
• Focused reading:
  – What is the problem?
  – How does their approach work?
  – Can it be applied to Linux?

Spam
• A huge number of references with different strategies. One example:
• Singaraju and Kang, RepuScore: Collaborative Reputation Management Framework for Email Infrastructure, Proc. LISA 2007.
• Focused reading:
  – What kind of spam does this prevent?
  – What requirements are there?
  – What limitations are there?

And the votes are in!
• Anomaly detection and correction
• Networking and IT Infrastructure
• Configuration management (3)
• Databases and Information Storage (3)
• Heterogeneity
• IP telephony
• Managing mobile and wireless computing (3)
• Network and Information Security (3)
• Remote administration
• Scaling problems: large or high-volume (2)
• User management
• Virtualization (5)

So, the next topic is rather obvious:
• I happen to know “a bit” about virtualization:
• Alva Couch, System administration thermodynamics, ;login: magazine, Oct 2008.

Kinds of virtualization
• Whole operating system (Xen, VMware, etc.).
• I/O virtualization: virtualize access to files, devices, etc., but not the operating system.
  – Monica Lam
• Virtualization of configuration management
  – (NSDI: “Shards” system)

Requests?
(Feel free to put me on the spot)

Part IV
Epilogue

The Pat Humphries song upon which I patterned this presentation:
“We're all living in a great big dipper.
We're all washed by the very same rain.
We are swimming in the stream together,
Some in power and some in pain.
We can worship this ground we walk on,
Cherishing the dreams that lie deep inside.
Loving spirits will live forever.
We're all swimming to the other side.”

But the last verse is most relevant
“When we get there we'll discover
All the gifts we've been given to share
Have been with us since life's beginning
And we never noticed they were there.
We can balance at the brink of wisdom
Never recognizing that we've arrived.
Loving spirits will live together.
We're all swimming to the other side.”

Pat Humphries said, about “same rain”:
“This did not just come out of me. This came from a lot of different people and different places, and I just happened to be here at the right time for it to flow through my pen, my tape recorder.”
I would say the same thing about my own research.

Washed by the Very Same Rain: System Administration Research
The End
Alva L. Couch
Tufts University
couch@cs.tufts.edu