FOCUS: ALGORITHMS
Excellence in Search: An Interview with David Chaiken
John Favaro, Intecs SpA, Italy
IN JUNE 2011, IEEE Software associate editor John Favaro interviewed
search engine giant Yahoo’s chief architect David Chaiken about algorithms
and today’s practitioner. Chaiken gave
a keynote speech at SATURN 2011 on
“Architecture at Internet Scale” that
stressed a set of timeless principles that
software engineers seemingly have to
relearn continuously.
John Favaro: Tell us a bit about your
own background. How and when did
you get into computer science?
David Chaiken: I’ve been hacking
for as long as I can remember. When
I was four years old, my parents sat
me down in front of a card punch
at RAND. My first program was a
greatest-common-denominator solver,
which saved me calculation time on my
elementary school math homework. I
didn’t really get a formal introduction
to computer science until I studied at
MIT many years later.
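The interview doesn't say how that first program worked. As a hypothetical illustration, the standard way to compute a greatest common divisor (the "greatest common denominator" of the anecdote) is Euclid's algorithm, which is also what you need to reduce a fraction to lowest terms. The function names are illustrative only.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor via Euclid's algorithm: gcd(a, b) = gcd(b, a mod b)."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def reduce_fraction(num: int, den: int) -> tuple[int, int]:
    """Divide numerator and denominator by their GCD (hypothetical homework helper)."""
    g = gcd(num, den)
    return num // g, den // g
```

For example, `reduce_fraction(18, 24)` yields `(3, 4)` because the GCD of 18 and 24 is 6.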
You’re an architect now. Does that
mean you’re not a programmer
anymore?
I hope that’s not the case! All of
our architect job descriptions at Yahoo state, “We value architects who do
enough hands-on implementation work
to keep current with technology trends
inside and outside the company.” To be
realistic, I don’t have time to do significant programming as the chief architect of Yahoo, but I try to make time to
write some code every year.
IEEE Software | Published by the IEEE Computer Society
In your list of timeless principles that
software engineers learn and relearn,
the fundamental, overriding principle,
“zero,” is mastering complexity. But
right after that, you list a principle you
call “algorithms first.” Could you explain a bit what you mean by that? You
seem to be giving algorithms a disproportionately high priority
in your list of principles.
Formally speaking, mastering complexity requires a proof of the asymptotic computation, storage, and communication needs of a system. While
we don’t always do formal specifications and proofs of the properties of
our algorithms, the underlying behavior of the algorithms factors into our
capacity modeling—and therefore our
capital and operational expense planning—in a fundamental way.
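The link between asymptotic behavior and capacity planning can be made concrete with a back-of-the-envelope model. This is a minimal sketch under stated assumptions: the cost models, input sizes, and fleet sizes are hypothetical, constant factors are ignored, and the function names are illustrative rather than anything Yahoo uses.

```python
import math

# Hypothetical cost models: work as a function of input size n.
# Constant factors are deliberately ignored; only growth rates are compared.
COST_MODELS = {
    "O(n)": lambda n: n,
    "O(n log n)": lambda n: n * math.log2(n),
    "O(n^2)": lambda n: n * n,
}


def scale_factor(model: str, n_now: int, n_future: int) -> float:
    """Ratio of future work to current work when input grows from n_now to n_future."""
    cost = COST_MODELS[model]
    return cost(n_future) / cost(n_now)


def servers_needed(current_servers: int, model: str, n_now: int, n_future: int) -> int:
    """Scale a hypothetical current fleet by the asymptotic growth in work.

    A tiny epsilon is subtracted before rounding up so that floating-point noise
    (e.g. 220.00000000000003) does not add a whole extra server.
    """
    return math.ceil(current_servers * scale_factor(model, n_now, n_future) - 1e-9)
```

Doubling n from 1,024 to 2,048 doubles the work for O(n), multiplies it by 2.2 for O(n log n), and quadruples it for O(n^2); the same traffic growth can therefore mean 200, 220, or 400 servers for a fleet that has 100 today.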
For our larger systems, understanding asymptotic behavior is essential to
keeping our business running. In the
example that I presented at SATURN
2011, it was absolutely critical to know
the specific factors that caused exponential behavior in the common case
for one of our advertising systems.
Kevin Lang of Yahoo Labs wrote a
beautiful analysis called “Notes on
Tractability of NGD Ad Serving” that
had a profound effect on the way that
we rebuilt this system.
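The interview doesn't say what made that advertising system exponential, and Lang's analysis isn't reproduced here. As a generic, hypothetical illustration of the kind of blow-up asymptotic analysis catches, consider matching a user's k targeting attributes against campaigns whose targeting is a conjunction of attributes. Enumerating every attribute subset as a lookup key costs 2^k - 1 lookups, while a counting-based inverted index touches each relevant posting once. The data shapes and function names are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def naive_subset_keys(user_attrs):
    """Exponential approach: every non-empty subset of attributes becomes a lookup key."""
    attrs = sorted(user_attrs)
    keys = []
    for r in range(1, len(attrs) + 1):
        keys.extend(frozenset(c) for c in combinations(attrs, r))
    return keys  # 2**k - 1 keys for k attributes


def build_index(campaigns):
    """Map each attribute to the campaigns that require it (built once, offline)."""
    index = defaultdict(list)
    for name, required in campaigns.items():
        for attr in required:
            index[attr].append(name)
    return index


def match_by_counting(user_attrs, campaigns, index):
    """Work proportional to the postings touched: a campaign matches when the number
    of its required attributes the user has equals the number it requires.
    Campaigns with no required attributes are never matched by this sketch."""
    hits = Counter()
    for attr in set(user_attrs):
        for name in index.get(attr, ()):
            hits[name] += 1
    return sorted(name for name, count in hits.items()
                  if count == len(campaigns[name]))
```

With ten attributes the naive approach already generates 1,023 keys, and each additional attribute doubles that; the counting approach grows only with the number of postings.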
What is the role of algorithms in the
search industry? What are some of
the subareas in search where algorithms have been essential in their
development?
At Yahoo, we view search systems
first and foremost as vehicles for delivering the results of algorithms: discovery, content analysis, machine learning,
indexing, query analysis, and ranking.
Yahoo Labs delivers many of these algorithms, which are instantiated as the inner loops of tasks running on processing
pipelines, Hadoop grids, online serving
systems, and analytics warehouses.
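As a minimal sketch of two of those stages, indexing and ranking, here is a toy inverted index with TF-IDF scoring. TF-IDF is a textbook stand-in chosen for illustration; the interview doesn't say which ranking functions Yahoo uses, and production rankers are machine-learned models over many more signals. The documents and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Deliberately naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, n_docs):
    """Score each document by the sum of tf * idf over query terms; best first."""
    scores = Counter()
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return [doc_id for doc_id, _ in scores.most_common()]


docs = {"d1": "new york pizza", "d2": "chicago pizza", "d3": "new york bagels"}
index = build_index(docs)
```

Here `rank("New York pizza", index, len(docs))` puts `d1` first because it matches all three query terms.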
0740-7459/12/$31.00 © 2012 IEEE
We believe that the greatest potential for innovation in search is in task
completion, which requires a new set
of algorithms. By task completion, we
mean delivering answers, not links, and
satisfying the underlying needs that
drive people to search. For example,
we need to understand that the query
“New York pizza” might mean pizza
restaurants in New York but—especially at meal time for people outside
the tri-state area searching on a mobile phone—it often means, “Where
is a nearby restaurant that serves New
York-style pizza?” Delivering on task
completion requires better query analysis, new content sources, and understanding the underlying semantics of
the Web of Objects. It also requires
shifting our optimization algorithms
from one-dimensional objective functions that determine link order to two-dimensional objective functions that
drive page layout.
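One possible reading of that shift: ranking links orders items in a single column, whereas page layout assigns modules to positions on a grid whose prominence varies in two dimensions. The sketch below assumes a hypothetical, separable objective (module relevance times slot prominence) with made-up weights; real layout objectives also model interactions between modules. Under that assumption, sorting modules and slots and pairing them in order is optimal by the rearrangement inequality, and the brute-force function exists only to check that on tiny pages.

```python
from itertools import permutations


def objective(assignment, scores, slot_weights):
    """Sum over slots of (module relevance x slot prominence)."""
    return sum(scores[m] * slot_weights[s] for s, m in assignment.items())


def greedy_layout(scores, slot_weights):
    """Pair the most relevant module with the most prominent slot, and so on."""
    modules = sorted(scores, key=scores.get, reverse=True)
    slots = sorted(slot_weights, key=slot_weights.get, reverse=True)
    return dict(zip(slots, modules))  # modules beyond the slot count are dropped


def brute_force_best(scores, slot_weights):
    """Exhaustive search over assignments; only feasible for tiny pages."""
    slots = list(slot_weights)
    best = max(permutations(scores, len(slots)),
               key=lambda mods: objective(dict(zip(slots, mods)), scores, slot_weights))
    return objective(dict(zip(slots, best)), scores, slot_weights)


# Hypothetical relevance scores for "New York pizza" and (row, column) slot weights.
scores = {"answer_card": 0.9, "map": 0.7, "links": 0.4}
slot_weights = {(0, 0): 1.0, (0, 1): 0.6, (1, 0): 0.5}
layout = greedy_layout(scores, slot_weights)
```

A one-column list of links is the special case where every slot sits in column 0 and prominence falls off with row.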
Tell us a bit about the place of algorithms at Yahoo. Is it true that there’s
an entire group of theoretical computer
scientists dedicated only to the elaboration and analysis of algorithms? How
do they interact with the “normal”
programmers—do the “normal” programmers encounter problems and then
throw them over the fence to the theoreticians? Or do they have a more proactive role?
Yahoo has the best track record of
moving algorithms from research to
products of any company that I’ve
seen in my career. We have such a good
track record precisely because we intentionally break down the barriers between researchers and their colleagues
in product and operations groups. In
some cases, researchers write the code
that’s deployed to production. In many
other cases, researchers work directly
with applied scientists who have the
unique blend of talent and patience needed to tune all of the dials required
to run an algorithm at scale, sometimes delivering results to hundreds of
millions of consumers.
DAVID CHAIKEN
David Chaiken is chief architect at Yahoo,
where he oversees the technical architecture of all the company’s consumer, advertising, and platform products. Chaiken
has been hacking since his parents sat
him down in front of an IBM card punch
more than 40 years ago. Over his career,
he has built voice search products for
consumers, mobile enterprise applications, network management systems,
project management software, a large-scale multiprocessing system, and five
or so information appliances. His favorite
technologies include the RSA encryption
algorithm, the C programming language,
the ARM instruction set architecture, the
Fedora distribution of Linux, and the build-on-grid-push-to-serving design pattern.
Chaiken earned a PhD in electrical engineering and computer science from MIT.
Getting the research to the product
pipeline right is a team sport. I’ve been
in code reviews that included the researchers who wrote the original paper
on an algorithm, architects who wrote
the class structure and templates for a
product, applied scientists who tuned
and extended the algorithm, programmers who understood how to get the
best performance out of the runtime
environment, and operations engineers
who deployed the system at scale.
The awesome track record of Yahoo
Research may lead to the assumption
that our theoretical computer scientists
aren’t engaged in products. If they’re
publishing such high-quality work,
then they can’t possibly be engaged in
making money, right? It turns out that
this assumption isn’t true. To the contrary, Yahoo researchers publish great
papers because they’re firmly grounded
in the challenges of developing and deploying premier online media products.
The search industry seems to be a perfect example of an industry where algorithms are absolutely critical. What are
some other examples of industries that
come to your mind?
Two other facets of the digital media industry come instantly to mind.
Computational advertising, a discipline
led by Andrei Broder, uses algorithms
to attack the problem of delivering the
right ad to the right consumer at the
right time. That might be an unfair
answer, because Andrei and his team
are great at recasting online advertising problems as search problems. Content optimization, an area led by Raghu Ramakrishnan, applies machine
learning and optimization algorithms
to personalizing webpages and other
digital media products for individual
consumers. Other industries include
database technology, graphics (from consumer cameras to blockbuster movies),
robotics, security, biotechnology, high-energy physics. My goodness, it’s hard
to stop. How long a list do you want?
IEEE Software | January/February 2012
ABOUT THE AUTHOR
JOHN FAVARO is a senior consultant at Intecs SpA in Pisa, Italy. His research
interests include efficient safety analysis of critical systems, real-time architectural patterns, and requirements engineering. Favaro has an MS in electrical
engineering and computer science from the University of California, Berkeley.
Contact him at john@favaro.net.
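Andrei Broder's framing of ad delivery as a search problem can be sketched in the same style as web retrieval: retrieve candidate ads whose keywords match the query, then rank them. Ordering candidates by bid times predicted click-through rate (expected revenue per impression) is a common simplified textbook approach, used here as an assumption; the ad fields and values are hypothetical, and real systems also weigh relevance, quality, and auction rules.

```python
def select_ads(query, ads, k=2):
    """Retrieve ads sharing a keyword with the query, then rank by bid * predicted CTR."""
    terms = set(query.lower().split())
    candidates = [ad for ad in ads if terms & ad["keywords"]]
    candidates.sort(key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)
    return [ad["name"] for ad in candidates[:k]]


# Hypothetical inventory: keywords, bid per click, and predicted click-through rate.
ads = [
    {"name": "a", "keywords": {"pizza"}, "bid": 1.00, "ctr": 0.05},
    {"name": "b", "keywords": {"pizza", "delivery"}, "bid": 0.50, "ctr": 0.20},
    {"name": "c", "keywords": {"shoes"}, "bid": 3.00, "ctr": 0.10},
]
```

For the query "New York pizza", ad `b` outranks ad `a` despite its lower bid, because its expected value per impression (0.10) is twice that of `a` (0.05); ad `c` is never retrieved.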
I can see how a software engineer in a
company like Yahoo would need to be
up to speed in algorithms, but honestly,
why should the rest of us software engineers care about algorithms today? Isn’t
that all “under the hood” now, and we
don’t have to worry about it anymore
if we’re just programming up mundane
IT support tasks?
I definitely understand this view, because I was oblivious to algorithms for
my first 18 years of interaction with
computers. At the ripe old age of 22, I
was working in data communications
at Motorola and realized that I was
missing some foundational knowledge
that limited my ability to progress from
what you might call mundane IT support tasks. I found myself (re)inventing
graph algorithms like depth-first search
and breadth-first search, and wondering which approach to use.
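For readers meeting these traversals for the first time, they differ mainly in the frontier data structure: breadth-first search uses a FIFO queue and visits nodes in order of hop count (so it finds fewest-hop routes in an unweighted network), while depth-first search uses a LIFO stack and follows one path as deep as it can before backtracking. The network below is a hypothetical example, represented as an adjacency list.

```python
from collections import deque


def bfs(graph, start):
    """Visit nodes level by level (FIFO queue)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs(graph, start):
    """Follow each branch as deep as possible before backtracking (LIFO stack)."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbors in reverse so they are explored in listed order.
        for nbr in reversed(graph.get(node, ())):
            if nbr not in seen:
                stack.append(nbr)
    return order


# Hypothetical network: each node lists the nodes it links to.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"], "E": [], "F": []}
```

On this network, BFS from `A` visits A, B, C, D, E, F, while DFS visits A, B, D, F, C, E.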
Put it this way: if you’re satisfied
with a career of programming mundane
IT support tasks, feel free to ignore algorithms. When you’re ready to take a
step up, let your curiosity lead you to
study the domains that you need to succeed. It’s almost guaranteed that you’ll
find some domain of algorithms that
are relevant to what you want to do.
How do you see the state of teaching in
algorithms in universities now? Are students still getting an adequate grounding like they did 30 to 40 years ago? Or
are universities slipping? What do you
think an ideal curriculum in algorithms
would look like?
The population of programmers and
other information technology professionals has changed radically in size,
demographics, interests, and specializations over the last 30 to 40 years. Students have also changed: they demand
to be entertained in addition to being
taught. However, the fundamentals
still seem the same: start with a semester course that surveys different classes
of algorithms and trains students in
complexity and performance analysis.
Then, teach the relevant algorithms in
each of the domain-specific topics that
students choose to take.
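As a small, hypothetical exercise of the kind such a survey course might use, counting probes makes the difference between O(n) and O(log n) search tangible: on 1,024 sorted items, a linear scan for the last element makes 1,024 probes, while binary search never needs more than floor(log2 n) + 1 = 11. The function names are illustrative.

```python
def linear_search_probes(sorted_items, target):
    """Number of elements examined by a left-to-right scan."""
    probes = 0
    for item in sorted_items:
        probes += 1
        if item == target:
            break
    return probes


def binary_search_probes(sorted_items, target):
    """Number of midpoints examined by binary search on a sorted list."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


items = list(range(1024))
```

Counting operations rather than timing them keeps the analysis independent of hardware, which is the point of asymptotic reasoning.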
Of course, having Ron Rivest as a
professor, with one or two guest lectures from Tom Cormen [Eds.: two of the
coauthors of Introduction to Algorithms, the classic big book of
algorithms], probably biases me
toward placing a high value on a graduate-level course in
algorithms.
What sources do you consider to be
the best today for software engineers
to learn about algorithms and to stay
abreast of things?
IEEE Software | www.computer.org/software
Isn’t that why we invented the Internet and the Web in the first place?
My understanding is that Bob Taylor wanted his ARPA-funded research
groups to work with each other; Tim
Berners-Lee wrote HTTP to allow
physicists to stay current with each
other’s work; and Marc Andreessen
created a better way to browse the research content. Jerry Yang, David Filo,
and the rest of the gang figured out
how to make the technology useful for
the rest of the world, but don’t forget
that the online world was originally by
the nerds, for the nerds!
In addition to using the Web to find
what I need, what I still read cover to
cover is Communications of the ACM,
which has a mix of academic- and industry-focused content that satisfies
some of the need to stay current. Every Yahoo employee has access to the
ACM and IEEE digital libraries plus Safari Books Online, which provide great
technical references.
Sometimes, it’s important to go back
to our foundational texts. I just pulled
out the CLR book yesterday to do some
analysis of IPv6 to IPv4 address hashing, which came up when Yahoo was
planning for World IPv6 Day.
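The interview doesn't describe that analysis. As a hypothetical sketch of the textbook technique involved, the universal hashing family h(k) = ((a·k + b) mod p) mod m can map 128-bit IPv6 addresses into a 32-bit (IPv4-sized) range, or into a small table whose bucket loads can then be inspected. The prime p = 2^130 - 5 is chosen only because it exceeds the 2^128 key universe, as the family requires; the function names and sample addresses are assumptions.

```python
import ipaddress
import random
from collections import Counter

# A prime larger than every 128-bit IPv6 address, as the universal family requires.
P = 2**130 - 5


def make_universal_hash(m, rng):
    """Draw h(k) = ((a*k + b) mod P) mod m at random from the universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)

    def h(address):
        k = int(ipaddress.IPv6Address(address))  # 128-bit integer key
        return ((a * k + b) % P) % m

    return h


def bucket_loads(addresses, m, seed=0):
    """Hash every address into m buckets and count how many land in each."""
    h = make_universal_hash(m, random.Random(seed))
    return Counter(h(addr) for addr in addresses)


# Hypothetical sample: 1,000 addresses from the 2001:db8::/32 documentation prefix.
sample = [f"2001:db8::{i:x}" for i in range(1000)]
loads = bucket_loads(sample, m=64)
print("max bucket load:", max(loads.values()))
```

Because a and b are drawn at random, no fixed set of addresses (such as a dense block from one prefix) can reliably force collisions, which is the property the textbook analysis bounds.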