Database of Intentions
January 15, 2008
Discussion Leader: Professor Shivnath Babu
Scribe: Minerva Thai
Database of Intentions – introductory chapter of a book by John Battelle
author lived in Silicon Valley and headed an internet-based company
at the time, commercial interest in the internet (advertising, e-commerce) was widespread
the stock market crashed in 2001, which was also the year of the 9/11 attacks; Battelle’s company failed as well
discovered Google’s Zeitgeist and grew interested in its functions
*Discussion leader showed the website and asked for the definition of “zeitgeist”
definition – the general intellectual, moral and cultural climate of an era (on site)
leader clicked link to see 2007 Year-End graphs (matched social trends)
graphs showed months of searches and frequency of searches
ex. “Virginia Tech” term spiked in April due to the shooting
leader asked “What is required to generate the graphs?”
each search is logged with its terms, the time of the search, and the location of the search
location determined by IP address
“rising” terms are those searched more frequently than in the previous year
“falling” terms are those searched less frequently than in the previous year
Zeitgeist ex. “most searched-for candidates”: how is the graph made?
someone must first tell Google which candidates are running (manual input)
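A minimal sketch of how "rising" and "falling" terms could be derived from a query log, assuming each logged search is reduced to a (term, year) pair. The log entries, the function name classify_terms, and the simple year-over-year count comparison are illustrative assumptions, not Google's actual method.

```python
from collections import Counter

def classify_terms(log, year):
    """Label each term 'rising' or 'falling' by comparing its
    search count in `year` with its count in the previous year."""
    current = Counter(term for term, y in log if y == year)
    previous = Counter(term for term, y in log if y == year - 1)
    labels = {}
    for term in set(current) | set(previous):
        if current[term] > previous[term]:
            labels[term] = "rising"
        elif current[term] < previous[term]:
            labels[term] = "falling"
        # terms with equal counts are left unlabeled
    return labels

# Hypothetical log entries: (search term, year of search)
log = [
    ("iphone", 2006), ("iphone", 2007), ("iphone", 2007),
    ("myspace", 2006), ("myspace", 2006), ("myspace", 2007),
]
labels = classify_terms(log, 2007)
```

A real log would also carry the time of day and the IP-derived location noted above; they are left out here to keep the comparison readable.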
leader asked “What are the uses of Zeitgeist?”
marketing – companies can determine what’s popular to base ads on
ex. Britney Spears became more popular towards 2007’s end
keep in mind that the data only describes the past
patterns – predictions may be made based on popularity patterns
ex. similar to noting earthquakes and their patterns of occurrence
the annotations on the Zeitgeist graphs offer possible explanations for the spikes
Battelle realized that Google/Zeitgeist was more technologically advanced than the Mac
decided to write a book about search
*Discussion leader asked “What’s the difference between the terms Internet and Web?”
Internet – the global network of interconnected networks and machines
Web – the data (pages) available on those machines, named for the web of links connecting the pages
*Student asked “Is there unlimited space on the internet?”
space is limited by the servers hosting the data and the storage available on them
space is measured in kilobytes, megabytes, gigabytes, terabytes, and petabytes
2¹⁰ bytes (1024), 2²⁰ bytes, 2³⁰ bytes, etc., respectively
space can run out, but the cost of storage is falling, so it is easier to expand
compression tools such as WinZip also help more data fit in the same space
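The unit sizes above can be checked with a short sketch. It assumes the binary convention used in these notes (each unit is 1024 times the previous one); the names UNITS, sizes, and human_readable are made up for illustration.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]

# Size of one of each unit in bytes: 2^0, 2^10, 2^20, ..., 2^50
sizes = {unit: 2 ** (10 * i) for i, unit in enumerate(UNITS)}

def human_readable(num_bytes):
    """Express a byte count in binary units, stopping at petabytes."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop once the value fits below 1024 or we run out of units
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024

one_gigabyte = human_readable(2 ** 30)
```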
Battelle shows the sides of Google that most people do not think about
Database of Intentions (DBI)
“What is it? How does Google get it?”
database – an organized store of information; is the DBI the same as the Zeitgeist?
refer to Ch. 1, page 6, paragraph 3, sentence 2 for definition
leader asked “what does ‘every path taken as a result’ mean?”
indicates the link that is clicked
“Is it specific to Google?”
No, any search site has a DBI
ex. Amazon (saves your purchases, browses, etc)
“How can Google use it?”
refer to points used in Ch. 1, page 2, paragraph 3, last few questions
leader directed class to questions in Ch. 1, page 12, opening paragraph
ex. Japanese teenagers – unanswerable; lack of age group discernment
guesses can be made based on histories of searches of teenagers
ex. Pop star’s sales – determinable through e-commerce sites
Amazon may track purchases and frequency
ex. Suburban moms – unanswerable; search does not mean answer
guesses can be made based on # of clicks and paths taken
signing into an account may add to Google’s information on a user
searches of other media, such as videos and images, contribute data as well
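A minimal sketch of the kind of record a "database of intentions" might hold, and of the "# of clicks and paths taken" guesses mentioned above. The record layout (user id, query, clicked link), the example.com-style links, and the function clicks_per_query are all hypothetical; the notes do not describe how any search site actually stores this data.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (user id, query, clicked link)
clicks = [
    ("u1", "digital camera", "shop.example/cameras"),
    ("u2", "digital camera", "reviews.example/cameras"),
    ("u3", "digital camera", "shop.example/cameras"),
    ("u1", "camera bag", "shop.example/bags"),
]

def clicks_per_query(records):
    """Count, for each query, how often each link was clicked."""
    counts = defaultdict(Counter)
    for _user, query, link in records:
        counts[query][link] += 1
    return counts

counts = clicks_per_query(clicks)
# The most-clicked link for a query is one simple "guess" at intent
top_link, top_count = counts["digital camera"].most_common(1)[0]
```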
“How can Google ‘abuse’ it?”
ex. tracking terrorism by finding out which searches were done where
ex. a user searching for “AIDS” may see an increase in their insurance premium
Google might pass the search query on to the user’s health insurer
“What prevents Google from ‘abusing’ it?”
Electronic Communications Privacy Act protects email from other users’ eyes
Google “machine” can still read your emails (most apparent through ads)
ex. Leader’s friend’s wife was leaving for India and sent an email
an ad about filing for divorce appeared on that same page
“What technological/social changes/advances make it possible to store/query DBI?”
cost of storage has decreased over the past 10 years
there are more users on the internet, with higher bandwidth and more homes with internet access
services such as YouTube and Facebook have increased internet popularity
more users have trust in online companies and the internet
“Search as a problem is about 5% solved” – Udi Manber, CEO of
leader asked “Are you happy with how Google is now? What’s wrong with it?”
Relevance – harder due to spam sites & malicious internet servers
Skill – ideas must be broken up into specific keywords to generate results
ex. perfect article for a paper may not contain a searched term
Not human – unable to process certain concepts/ideas which a person can
ex. searching for “jaguar” the animal returns results for Jaguar the car
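The "Skill" and "Not human" points above can be illustrated with a naive keyword matcher. This is a sketch that assumes search is nothing more than exact word matching; the three documents and the function keyword_search are made up for illustration, and real search engines are far more sophisticated.

```python
def keyword_search(query, documents):
    """Return titles of documents containing every query word
    (case-insensitive, whole-word match)."""
    words = query.lower().split()
    return [
        title
        for title, text in documents.items()
        if all(word in text.lower().split() for word in words)
    ]

# Hypothetical documents: title -> text
documents = {
    "Car review": "the new jaguar sedan has a quiet engine",
    "Wildlife guide": "the jaguar is the largest cat in the americas",
    "Big cats essay": "panthera onca hunts along rivers in the rainforest",
}

# "jaguar" alone cannot tell the animal from the car
both = keyword_search("jaguar", documents)
# Adding a second keyword narrows the results to the animal
animal_only = keyword_search("jaguar cat", documents)
```

The essay about the animal is never returned because it does not contain the word "jaguar", which is the "perfect article" problem noted above.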
Interesting terms
Alexandria, Boswell (constant companion and observer) – page 1
technology vs. media business – pages 3 & 4
Clickstream (which user clicked on which particular link)
Patriot Act – page 14
Turing test (AI test; a passing machine would seem human) – page 16
directory-based search – page 17
second generation web applications – pages 7, 11, 14