Research in System Administration

Washed by the Very Same Rain:
System Administration Research
Alva L. Couch
Tufts University
couch@cs.tufts.edu
Part I
What is research?
Who am I?
• Overseer of LISA: chair of steering
committee, board liaison (since 2005)
• 14 LISA papers since 1996 (+ 2 students
who submitted sole-author papers)
• 2 LISA “best paper” awards and 1 “best
student paper” award since 1996.
• 2003 SAGE Professional Service Award
(with Mark Burgess and Paul Anderson).
What is research?
• We all think we know, but
• popular accounts of the nature of research
are misleading, and
• remain misleading throughout recorded
history!
A popular misconception
• Einstein created the theory of relativity out of
thin air.
• No one else could have done it but Einstein.
• Not!
• Einstein’s work began in a context of what was
already known, and
• Several mathematicians (notably Minkowski)
were working in the same context and
concurrently trying to come up with their own
explanations!
A song and a woven cloth
• This presentation is structured like the folksong
“same rain” by American folk singer Pat
Humphries, which has been considered by
some as a paradigm for research and
exploration.
• It is a cloth woven from threads inspired by my
wife’s mentor Prof. Philip Morrison (of MIT)
and his TV series “The Ring of Truth”, which
discusses how scientists develop their ideas.
“We're all living in a
great big dipper…”
• All research occurs in a context, that
includes
– What work has been done before.
– What community is interested.
– What problems remain to be solved.
• Context is a moving target that can
change rapidly over time.
“We’re all washed by the
very same rain…”
• By definition, research doesn’t occur in a
vacuum.
• If you see something important, chances
are that a number of other people have
seen the same thing.
• The difference is whether you do something
to understand what you see!
• Edison: 1% inspiration, 99% perspiration.
“We are swimming
in the stream together…”
• Research is not about working alone, but
rather about communicating ideas to a
community that is exploring similar
directions.
• Most important step is to identify your
community (or communities).
• “Who are you swimming with?”
“Some in power and
some in pain…”
• Failure is a crucial part of research.
• One’s hypothesis can be invalid.
• Even after one has believed it for years.
• Only by failing can one learn.
• Only by being open to failure can one
become objective.
• I have more wrong ideas than right
ones!
The usual formula for
how to do research
• Determine context of the problem.
• Survey proposed solutions.
• Determine new directions to explore.
• Choose one direction to explore.
• Develop a hypothesis about the direction.
• Test that hypothesis.
• Evaluate the results of the test.
• Refine the hypothesis, and repeat.
Key elements of the formula
• Context: maintaining an idea of what you
know and don’t know about a problem.
• History: keeping track of what you learn
over time.
• Evidence: how what you see supports or
refutes what you might think.
• Conversation: the ability to explain what
you see to others.
An alternative formula
• Get excited about something.
• Commit to learning all that can be understood about it.
• Choose some small part of it to understand better.
• Write down your specific ideas about the nature of this
part. This is your “hypothesis”.
• Test your understanding with observation. This is your
“experiment”.
• Remain doubtful of unconvincing evidence, and
curious about contradictory evidence.
• Refine yourself and then repeat!
Research versus learning
• Too often, research is mischaracterized as
a discovery product, like finding a piece
of gold in a gold mine.
• Most research is instead a learning
process, where you learn something
new about something you already see.
• The gold is not what you see, but what
you learn.
Research redefined
• An active learning process…
• In which you explore what happens, and
learn from the world…
• In a continuing conversation with a
community of learning…
• In a changing and evolving context of
observed phenomena and human needs…
• In which one risks being wrong, but
learns and evolves from one’s mistakes.
The Ring of Truth
• My wife was the researcher for the TV
series “The Ring of Truth”, which
discusses the nature of science.
• Each show concentrates on some aspect
of the scientific method: Looking, Change,
Mapping, Clues, Atoms, and Doubt.
• Let’s map these ideas into system
administration terms!
Looking
• The ability to look at something familiar
and see something new.
• Burch and Cheswick, Tracing Anonymous
Packets to Their Approximate Source,
Proc. LISA 2000.
• A denial of service (DoS) attack is not
always a bad thing, and one can use a
structured DoS to identify perpetrators of
other DoS’s!
Change
• The ability to embrace the idea that one’s
understanding of the world – and the world –
changes and improves over time.
• Finke, Manage People, Not Userids, Proc. LISA
2005.
• A revisitation of the same author’s previous
paper on the subject, in which he explains how
his understanding and practice improved over
time and reversed some prior decisions.
Mapping
• The ability to use models and
abstraction to understand the world.
• Couch, Wu, and Susanto, Toward a cost
model for system administration, Proc.
LISA 2005.
• A model of cost for helpdesks shows
through simulation that helpdesks running
near the limit of staff capacity experience
chaotic changes in total value.
Clues
• The ability to look for and see clues toward new
and different explanations of phenomena.
• Gross and Rosson, Looking for Trouble:
Understanding End-User Security Management,
Proc. CHIMIT 2007.
• The Windows firewall message “do you want to
allow this connection” is semantically equivalent
– in the minds of most users – to “do you want
to get your work done or not?”
Atoms
• The ability to come to grips with what is
knowable and what is unknowable.
• Burgess, Computer Immunology, Proc.
LISA 1998.
• Centralized control systems depend upon
“knowing the unknowable,” whereas
physical systems such as the human body
depend upon distributed and “more
knowable” notions.
Doubt
• The ability to face and embrace one’s lack of
understanding of complex phenomena.
• Evard, An Analysis of UNIX System
Configuration, Proc. LISA 1997.
• Configuration management is often
conceptualized as a simple choice between
tools, but involves a more complex conflict
between technical methods and human needs.
Part II
Steps toward engaging in
research
Parts of becoming a researcher
• Engaging in active learning.
• Being open to doubt.
• Finding and maintaining context.
Aids to effective learning
• Keeping a personal journal of ideas,
directions, hypotheses, experiments,
conclusions, references.
• Breadth: documenting every idea you get.
• Depth: exploring one new direction at a time.
• Documenting each hypothesis and the
evidence for and against it as soon as
possible.
Persistence of memory?
• Don’t rely on your memory, no matter how
good it is.
• Your understanding of the problem is a
moving target.
• To teach other people what you learned,
you need to recall what you didn’t know
before!
Example: my journal
• Dated entries describe hypotheses, tests,
results, ideas.
• In electronic form (plaintext).
• Ideas often turn out to be wrong.
• I never delete or edit an entry!
• This is not a publication; it is a starting
point for one.
• It is more important to have a record
than to be correct.
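A minimal sketch of such a journal, assuming one entry per line in a plaintext file. The entry layout, the kind labels, and the function names are hypothetical choices for illustration, not a description of my actual journal. The point is only that entries are appended, never rewritten:

```python
from __future__ import annotations

from datetime import date
from pathlib import Path

# Hypothetical entry kinds, taken from the list of what dated entries describe.
KINDS = {"hypothesis", "test", "result", "idea"}


def append_entry(journal: Path, kind: str, text: str, when: date | None = None) -> str:
    """Append one dated entry; existing entries are never rewritten."""
    if kind not in KINDS:
        raise ValueError(f"unknown entry kind: {kind}")
    stamp = (when or date.today()).isoformat()
    # Collapse whitespace so each entry stays on a single line.
    entry = f"{stamp} [{kind}] {' '.join(text.split())}\n"
    # Mode "a" only ever adds to the end of the file.
    with journal.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return entry


def read_entries(journal: Path) -> list[tuple[str, str, str]]:
    """Return (date, kind, text) tuples in the order they were written."""
    entries = []
    for line in journal.read_text(encoding="utf-8").splitlines():
        stamp, rest = line.split(" ", 1)
        kind, text = rest.split("] ", 1)
        entries.append((stamp, kind.lstrip("["), text))
    return entries
```

When an idea turns out to be wrong, the record of that is a new, later entry that says so; the earlier entry stays exactly as written.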
Being open to doubt
• Doing research is about accepting that
absolutely any idea you write down is
– subject to continual validation and
– can turn out to be invalid at any time in the
future.
• Each entry in the journal is a starting
point for discussion, and not a fact.
• In mine, the “invalidated” entries
outnumber the “validated” ones.
Finding context and community
• Several resources can aid you in beginning:
– The Anderson taxonomy of system administration
topics. Anderson and Patterson, “A Retrospective on
Twelve Years of LISA Proceedings”, Proc. LISA 1999.
– Book: Selected Papers in Network and System
Administration (based upon the Anderson Taxonomy).
– Book: Handbook of Network and System
Administration (beyond the Anderson taxonomy).
– USENIX compendium of best papers (a testament to
the “most interesting” topics and approaches).
• Google can help, but only if you already know
the proper keywords!
Just as important: find community
• Your community: the people in this room.
• One often chooses a problem “for a
community” rather than the other way
around.
Essential skills of the researcher
• Focused reading.
• Documenting biases.
• Collecting evidence.
• Being open to surprises.
Focused reading
• A researcher doesn’t read a paper like a
regular person.
• Reading occurs in a context.
• To answer specific questions.
The typical questions
• Relevance: is this work relevant to what I
want to understand?
• Context: where did their understanding
start (when their work began)?
• Results: where did their understanding
end (when they finished this paper)?
• Doubt: what unknowns did they find?
Questions evolve!
• These are just a starting point.
• As you focus upon a topic, reading
becomes more focused as well.
• E.g., “Is this relevant” becomes a question
about a specific kind of relevance.
Part III
Examples
…(Ahem)…
• The original idea for this talk was to describe the
whole “landscape” of system administration
research and where things are today.
• I thought about this a bit and decided that it was
too broad an objective.
• And it sounded a bit boring.
• So instead, I am going to show you several
examples of how to build your own landscape of
what’s important to you.
• And then, I’ll take requests!
How to build your own landscape
• Express your preconceptions honestly.
• Use focused reading to find evidence for
or against your preconceptions.
• Weigh the evidence, reevaluate your
preconceptions.
• When the literature fails to support or
refute, it’s time to do your own experiment.
Some parts of the current
landscape (some of what’s hot)
• Power-aware systems
• Adoption of automation tools versus writing your
own tools.
• Balancing security and business objectives.
• Integrated management of systems, knowledge,
security, audit data.
• Dealing with various (existing and new) forms of
spam.
• (and many others).
Power-aware systems
• No paper at LISA as yet.
• Two important posters at HotPower 2008:
• Srikantaiah, Kansal, and Zhao, Energy Aware
Consolidation for Cloud Computing.
• Lu and Varman, Workload Decomposition for
Power Efficient Storage Systems.
• Focused reading:
– What is the problem?
– What are the challenges?
– How could this apply to system administration?
Adoption of automation tools
• This is a hard one.
• Let’s go digging:
– Mentioned in my LISA 2005 talk “What is this
thing called configuration management?”.
– Lots of hallway conversations.
– Lots of very indirect evidence.
– Evidence scattered all over the universe, one
sentence at a time.
• I didn’t say this was always easy.
Balancing security
and business objectives
• Very few writings, but very controversial. One
example:
• Beattie, Arnold, Cowan, Wagle, Wright, and
Shostack, Timing the application of security
patches for optimal uptime, Proc. LISA 2002.
• Focused reading:
– What questions remain?
– Are there analogies with other “best practices”?
Integrated management
• Lots of references with scattered ideas. One
example:
• Wang, Verbowski, Dunagan, Chen, Wang, Yuan,
and Zhang, STRIDER: A Black-box, State-based
Approach to Change and Configuration
Management and Support, Proc. LISA 2003.
• Focused reading:
– What is the problem?
– How does their approach work?
– Can it be applied to Linux?
Spam
• A huge number of references with different
strategies. One example:
• Singaraju and Kang, RepuScore: Collaborative
Reputation Management Framework for Email
Infrastructure, Proc. LISA 2007.
• Focused reading:
– What kind of spam does this prevent?
– What requirements are there?
– What limitations are there?
And the votes are in!
• Anomaly detection and
correction
• Networking and IT
Infrastructure
• Configuration
management (3)
• Databases and
Information Storage (3)
• Heterogeneity
• IP telephony
• Managing mobile and
wireless computing (3)
• Network and Information
Security (3)
• Remote administration
• Scaling problems: large
or high-volume (2)
• User management
• Virtualization (5)
So, the next topic is rather obvious:
• I happen to know “a bit” about
virtualization:
• Alva Couch, System administration
thermodynamics, ;login: magazine, Oct
2008.
Kinds of virtualization
• Whole operating system (Xen, VMware,
etc.).
• I/O virtualization: virtualize access to files,
devices, etc., but not the operating system.
– Monica Lam
• Virtualization of configuration management
– (NSDI: “Shards” system)
Requests?
(Feel free to put me on the spot)
Part IV
Epilogue
The Pat Humphries song upon
which I patterned this presentation:
“We're all living in a great big dipper.
We're all washed by the very same rain.
We are swimming in the stream together,
Some in power and some in pain.
We can worship this ground we walk on,
Cherishing the dreams that lie deep inside.
Loving spirits will live forever.
We're all swimming to the other side.”
But the last verse is most relevant
“When we get there we'll discover
All the gifts we've been given to share
Have been with us since life's beginning
And we never noticed they were there.
We can balance at the brink of wisdom
Never recognizing that we've arrived.
Loving spirits will live together.
We're all swimming to the other side.”
Pat Humphries said,
about “same rain”:
“This did not just come out of me. This
came from a lot of different people and
different places, and I just happened to be
here at the right time for it to flow through
my pen, my tape recorder.”
I would say the same thing about my own
research.
Washed by the Very Same Rain:
System Administration Research
The End
Alva L. Couch
Tufts University
couch@cs.tufts.edu