Spring 09

Ferret: 360° Search

Prafulla Mahindrakar, Aniket Patil, Ketan Umare
Advisor: Dr. Ling Liu
CS8803: Advanced Internet Application Development, Group Project
FERRET: 360° SEARCH

Table of Contents

1. MOTIVATION AND OBJECTIVES
2. RELATED WORK
   2.1 GOOGLING
   2.2 SOCIALLY RELEVANT SEARCH
   2.2 CATEGORIZATION OF SEARCH RESULTS
3. ARCHITECTURE
   3.1 SYSTEM ARCHITECTURE DIAGRAM
   3.2 PATTERN ORIENTED ARCHITECTURE
      3.2.1 DESIGN PATTERNS
      3.2.2 DESIGN PATTERNS USED IN FERRET
   3.3 HIGH PERFORMANCE
      3.3.1 THREADING
      3.3.2 CACHING
   3.4 DATABASE SCHEMA
      3.4.1 ER DIAGRAM
      3.4.2 USER TABLE
      3.4.3 PAGE KEYWORD TABLE
      3.4.4 USER SESSION TABLE
4. COMPONENTS
   4.1 AUTHENTICATION
   4.2 STANFORD TAGGER
   4.2 WEB SEARCH
   4.3 MEDIA SEARCH
   4.4 PRODUCT SEARCH
      4.4.1 CLUSTERING OF RESULTS
   4.5 SOCIAL SEARCH
      4.5.1 SESSIONS
      4.5.2 LISTEN TO USER CLICKS
      4.5.3 HEARTBEAT MESSAGES
   4.6 CATEGORIZATION ENGINE
      4.6.1 WHY DOCUMENT CLUSTERING
      4.6.2 APPROACHES
      4.6.3 BUILDING BLOCKS
      4.6.4 LINGO ALGORITHM
   4.7 PRESENTATION ENGINE
      4.7.1 SEARCH RESULTS TAB CREATOR
   4.8 VIEW
5. EVALUATION FRAMEWORK
   5.1 SOCIAL NETWORK SIMULATION
      5.1.1 ER DIAGRAM
   5.2 JMETER
      5.2.1 TEST CASES
   5.3 COMPARISON TO OTHER SEARCH ENGINES
6. TESTING AND RESULTS
   6.1 PROTOTYPE SYSTEM
      6.1.1 SOFTWARE
      6.1.2 HARDWARE
      6.1.3 OPERATING SYSTEM
   6.2 RESULTS
      6.2.1 WITHOUT MEMCACHED
      6.2.2 WITH MEMCACHED
      6.2.3 LOAD RESPONSE WITH MEMCACHED
7. FUTURE WORK
8. CONCLUSIONS
9. BIBLIOGRAPHY
9. APPENDIX
1. Motivation and Objectives

fer·ret (v) (\ˈfer-ət\) to find and bring to light by searching

Imagine trying to find a pair of the latest Ray-Ban glasses in the Lenox Square Mall. It is not an easy task! Now think about doing the same across the World Wide Web. Feeling dizzy?

The World Wide Web, with its astronomical amount of information, presents an enormous challenge for resource discovery. Precise navigation is impossible with the increasingly large collection of hyperlinks that users must traverse. Commercial search engines like Google and Yahoo have solved the problem at a fundamental level by making available a hypertext-based index for pages across the web. Web users can query the index for documents about a specific topic to find the desired document. While search engines have become quite popular and are helping to redefine how people access information scattered across the wide-area network, they are not well suited to the case when users do not know exactly what they are looking for. In such a situation, using one of the popular search engines can be a messy, frustrating experience.

What do you do when you don't know where to start? Give Ferret a try!

For any topic in the universe, Ferret provides a neatly organized view of the web. Our category guides bring meaningful and relevant information that makes browsing for a topic fast. Rather than the messy back-and-forth clicking of search results, we do the processing so that you can learn, explore and discover the things that matter to you.

Ferret offers you a new way to discover the Web: it's the place you should be when you want to browse and discover everything the Web has to offer. Come to Ferret when you want to learn about a topic or explore what's happening now on the Web. We'll show you content that you may never have discovered otherwise, and we'll give you an at-a-glance look at everything related to the query. Think of Ferret as your guide for exploring the Web.

For instance, consider the search term 'Transformers'. A Google search returns a serially arranged list that covers the movie 'Transformers' and electrical transformers on the first page. However, a user who is interested in the class 'Transformer' in Java or in the Transformers comics needs to browse several pages before such results are discovered. Our system graphically arranges and classifies results into categories such as text, multimedia, entertainment, discussions, blogs and more. A user needs only a single click to get a 360-degree view of the content associated with the query term.

Ferret. Your guide to the world!
2. Related Work

Search has been a constantly evolving and continuously researched topic. There have been great success stories and even greater debacles in this industry. Web search has become such an important part of our lives that it has even contributed to our vocabulary. The following are some of the most distinctive systems currently available online, from which we draw our inspiration.

Figure 1: Taxonomy of Existing Search Technologies

2.1 Googling

In their seminal work [1], the authors described a new way of ranking web documents based on the idea of citation. The search engine instantly became a hit and overtook all of its competitors. The webpage [2] is the most highly visited page online and everyone knows "The Google Story". Google uses a simple keyword-based search, but its most important aspect is the ranking of content. Google thus successfully demonstrates that it is not just the content that matters, but also the way it is presented. Google has continued to innovate and introduce new features, but it still has a long way to go.

2.2 Socially relevant search

Social search, or a social search engine, is a type of web search method that determines the relevance of search results by considering the interactions or contributions of users [3]. Based on this simple idea is Delver [4], which uses the social network of a user to come up with better recommendations. It enables you to find, experience and benefit from the wealth of information created and referenced by your social world. Socially relevant search can really benefit a user, as what matters to a user is usually what matters to the user's peers. Paper [5] discusses the benefits of integrating web search and social search, quantifies them with strong results, and delineates the challenges in doing so.

2.2 Categorization of search results

Categorization is another important way to present search results. Take the word 'Transformers' as an example. The same word could mean different things: an electrical device, a movie, the cartoon series, a toy, a review of the movie, or news about the invention of a new, efficient transformer. So how do you show these results? Which is more important? These questions are almost impossible to answer. Papers [6-9] show a variety of ways in which web search results can be classified and quantify them with interesting results. Kosmix [10] is one of the most promising sites to have leveraged this idea. It uses the search provided by Google and creates a wrapper for its own classification system. It has been voted one of the best new startups [11], which underlines the importance of classifying results.
3. Architecture

The following sections give an outline of the system architecture and a short description of the important components.

3.1 System Architecture Diagram

Figure 2: System Architecture

Ferret can operate in two modes: logged-in or private. Each of these modes is described in detail in later sections. In logged-in mode, along with the typical web results, Ferret also provides socially relevant search results, using the user's profile from one of the major social network databases, for example Facebook or Twitter.

The typical web results are categorized into three broad categories, namely Web Search, Media Search and Product Search. Each category is further categorized using our clustering algorithm.
3.2 Pattern Oriented Architecture

The aim while developing Ferret was to keep it flexible enough that new features can be added with relative ease. Performance was also a major concern, so each component was built with a large-scale system in mind. Both goals could be achieved readily using a pattern-oriented architecture. The following sections describe the various patterns used in Ferret.

3.2.1 Design Patterns

A design pattern describes a particular recurring design problem that arises in specific design contexts and presents a well-proven generic scheme for its solution. The solution scheme is specified by describing its constituent components, their responsibilities and relationships, and the ways in which they collaborate [12, 13].
3.2.2 Design Patterns used in Ferret

Ferret uses the following design patterns.

3.2.2.1 Front Controller

The Front Controller pattern enables centralized request processing, which makes changes to the layers below transparent. Even communication and threading can be easily abstracted away from the presentation layer.

3.2.2.2 Abstract Factories

Abstract Factory is a creational pattern that separates the creation of objects from the place where they are used. This makes it easy to add modules.

3.2.2.3 Strategy

The Strategy pattern allows Ferret to change clustering algorithms easily, so new algorithms can be plugged in with relative ease. This was especially vital while testing out various algorithms.
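
A minimal sketch of how the Strategy pattern might make clustering algorithms interchangeable. It is written in Python for brevity, whereas the prototype itself is Java. The names ClusteringStrategy, FirstWordClustering, SingleClusterClustering and CategorizationEngine are hypothetical, and the two toy algorithms only stand in for real clustering code.

```python
from abc import ABC, abstractmethod


class ClusteringStrategy(ABC):
    """Common interface that every pluggable clustering algorithm implements."""

    @abstractmethod
    def cluster(self, titles):
        """Return a dict mapping a cluster label to a list of titles."""


class FirstWordClustering(ClusteringStrategy):
    """Toy algorithm (illustrative only): group titles by their first word."""

    def cluster(self, titles):
        groups = {}
        for title in titles:
            words = title.split()
            label = words[0].lower() if words else "others"
            groups.setdefault(label, []).append(title)
        return groups


class SingleClusterClustering(ClusteringStrategy):
    """Toy algorithm (illustrative only): put every title into one cluster."""

    def cluster(self, titles):
        return {"all": list(titles)}


class CategorizationEngine:
    """Context object: delegates to whichever strategy it currently holds."""

    def __init__(self, strategy):
        self.strategy = strategy

    def categorize(self, titles):
        return self.strategy.cluster(titles)
```

Swapping algorithms then amounts to assigning a different strategy object to the engine; the calling code does not change.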
3.2.2.4 Adapter

The Adapter pattern is used to abstract the search/fetch/cluster logic from the presentation generator. The generator can therefore be modified easily, irrespective of changes to the underlying system.

3.2.2.5 Singleton

Many components need only a single connection. To avoid overhead, we keep the thread controllers in singletons, which reduces the thread creation overhead. The tagger library is also loaded just once, which avoids the cost of re-reading it.
3.2.2.6 Spring (OS) Doors

As in the Spring system developed at Sun Labs, we have a Controller that abstracts data access from the presentation layer. This allows us to deploy individual systems remotely, which could be used in the future for large-scale distributed computing.
3.3 High Performance

For a search engine, performance is critical. Ferret achieves performance through large-scale threading, distributed caching and the ability to easily separate modules onto separate physical hosts.

3.3.1 Threading

Ferret uses pre-spawned thread pools to offset the overhead of thread spawning. It also uses threads to perform searches across various domains in parallel.
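
The idea of a pre-spawned pool that fans one query out to several search domains can be sketched as follows. This is an illustration in Python rather than the Java used by the prototype. The three search functions are hypothetical stand-ins for real back ends, and the pool size of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# One pool is created up front and reused for every query, so no new
# executor is set up per request. The size is an arbitrary assumption.
POOL = ThreadPoolExecutor(max_workers=4)


def search_web(query):
    # Hypothetical stand-in for a real web search back end.
    return ["web result for " + query]


def search_media(query):
    # Hypothetical stand-in for a real media search back end.
    return ["media result for " + query]


def search_products(query):
    # Hypothetical stand-in for a real product search back end.
    return ["product result for " + query]


SEARCHERS = {"web": search_web, "media": search_media, "product": search_products}


def search_all(query):
    """Submit every searcher to the pool, then collect results per domain."""
    futures = {name: POOL.submit(fn, query) for name, fn in SEARCHERS.items()}
    return {name: future.result() for name, future in futures.items()}
```

Because all searchers are submitted before any result is awaited, the slowest domain bounds the total latency rather than the sum of all domains.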
3.3.2 Caching

Ferret uses memcached [14, 15] to cache recent results. To maintain the freshness of the results, each cached entry is associated with an expiry value. Currently the expiry time is fixed arbitrarily, but future efforts would aim at arriving at this number using a learning algorithm. For example, if it is known that Google does not refresh its index for at least n hours, we could cache the results until they are updated.
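
The expiry behaviour can be illustrated with a small in-process cache. This is only a sketch in Python of the idea of per-entry expiry; it does not use memcached or its client API, and ExpiringCache and cached_search are hypothetical names.

```python
import time


class ExpiringCache:
    """Tiny in-process cache in which every entry carries an expiry time."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock      # injectable so that expiry can be tested
        self.entries = {}       # key -> (expires_at, value)

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self.entries[key]   # stale: drop it and report a miss
            return None
        return value


def cached_search(cache, query, search_fn):
    """Return cached results for query, calling search_fn only on a miss."""
    results = cache.get(query)
    if results is None:
        results = search_fn(query)
        cache.set(query, results)
    return results
```

A learned expiry time, as proposed above, would replace the fixed ttl_seconds with a value chosen per query or per source.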
3.4 Database Schema

The database used by Ferret is minimal, which is essential for performance. The following sections describe the schema in detail.

3.4.1 ER Diagram

Figure 3: Database Model for Social Search

Table usr_user:

Column Name     Description
Uid             Auto-generated primary key for usr_user table
Username        Login name for the user
Password        User's password
Name            User's name
Profession      User's profession
ImageUrl        Pathname for the user image

Table puk_pagekeyword:

Column Name     Description
pageid          Auto-generated primary key for puk_pagekeyword table
page            URL of the page
keyword         Processed query term for which page was retrieved
title           Title for page

Table uss_usersession:

Column Name     Description
uid             Auto-generated primary key for uss_usersession table
pageid          Refers to puk_pagekeyword.pageid
historycount    Frequency of usage of search results
timestamp       Time at which user selected a page for reading
sessionid       Server-generated session id for user
3.4.2 User Table

The user table is needed to maintain the user's login information in case the Google authentication system isn't used. It also stores the uids, which could be taken directly from Facebook but would be needed in case of multiple networks.

3.4.3 Page Keyword Table

This table is used to maintain a list of popular keyword and page combinations accessed by the users. Based on freshness criteria, this table should be cleaned every x days.

3.4.4 User Session table

This table is used to track the user and the user's favorite links. It is essential for implementing the Good page/Bad page algorithm.
4. Components

This section explains the various modules that constitute the Ferret search engine.

4.1 Authentication

Ferret uses its own database to authenticate the user. It would be easy to use the Google OpenID system for authentication instead. When the user is not logged in, the system treats the user as a guest and does not track the user's activities. This enables private browsing.
4.2 Stanford Tagger

The Stanford Tagger used by Ferret is a part-of-speech tagger: a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other tokens). The tagger is used to identify relevant keywords in a query and store them in the database. The tagger is used in the following components:

• Dictionary Search: The tagger identifies nouns (proper and common, both singular and plural) as keywords to be sent to WordNet for query.
• Social Search: The tagger identifies nouns (proper and common, both singular and plural), verbs and adverbs from the user's query.
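
The keyword selection described above can be sketched as a filter over already-tagged tokens. The sketch is in Python and does not call the Stanford Tagger itself. It assumes the tagger's output is available as (word, tag) pairs using Penn Treebank tags, and select_keywords is a hypothetical helper name.

```python
# Penn Treebank tag prefixes (assumed tag set for English):
# NN, NNS, NNP, NNPS are nouns; VB* are verbs; RB* are adverbs.
NOUN_PREFIXES = ("NN",)
SOCIAL_PREFIXES = ("NN", "VB", "RB")


def select_keywords(tagged_tokens, prefixes):
    """Keep the words whose part-of-speech tag starts with a wanted prefix."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefixes)]
```

Dictionary Search would pass NOUN_PREFIXES and Social Search would pass SOCIAL_PREFIXES, so both components can share one filter.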
4.2 Web Search

This engine is multithreaded: it accepts the raw query and dispatches it to worker threads, which collect search results from a variety of search engines such as Google [2], A9 [16] and IMDB [17]. The worker threads use WSDL to communicate with the various search engines. The external interface is extensible, since collecting results from a new search engine simply requires the implementation of a WSDL interface. This allows our system to be augmented with additional search results from Yahoo, Windows Live or any other search engine.

4.3 Media Search

This engine is multithreaded: it accepts the raw query and dispatches it to worker threads, which collect search results from a variety of search engines such as Google [2], A9 [16] and IMDB [17]. The worker threads use WSDL to communicate with the various search engines. The external interface is extensible, since collecting results from a new search engine simply requires the implementation of a WSDL interface. This allows our system to be augmented with additional search results from Yahoo, Windows Live or any other search engine.
4.4 Product Search

Ferret product search uses the Amazon E-Commerce API to retrieve product information. The API exposes Amazon's product data and e-commerce functionality. This allows Ferret to leverage the data that Amazon uses to power its own business.

Ferret is able to retrieve product results over a huge range of categories. For every product, Ferret retrieves the product name, the product cost as listed on Amazon and a product image. All searches are performed for the US locale. In the future, it may be possible to detect the geographical region from which the query originates and adjust the locale accordingly.
4.4.1 Clustering of Results

The search results are clustered dynamically on the basis of the categories retrieved for the query term. All products belonging to a single category are arranged together using seed-list-based clustering. Any Amazon product can be classified into one of the following categories:

Apparel, Automotive, Baby, Beauty, Blended, Books, Classical, Digital Music, DVD, Electronics, Foreign Books, Gourmet Food, Health Personal Care, Hobbies, Home Garden, Jewelry, Kitchen, Magazines, Merchants, Miscellaneous, Music, Musical Instruments, Music Tracks, Office Products, Outdoor Living, PC Hardware, Pet Supplies, Photo, Restaurants, Software, Software Video Games, Sporting Goods, Tools, Toys, VHS, Video, Video Games, Wireless, Wireless Accessories

Figure 4: List of product categories in Amazon

We use the above categories as a seed list and use the retrieved product information to detect the category and cluster appropriately. Due to the extensible nature of the product search component, we can easily obtain results from other e-commerce providers such as Buy.com and eBay. We also plan to integrate functionality to sort results by cost and social relevance.
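
Seed-list-based clustering of products can be sketched as a simple grouping step. The sketch is in Python and assumes each retrieved product is a dict with "name" and "category" keys, which is a hypothetical shape and not the Amazon API response format. Sending unknown categories to Miscellaneous is likewise an assumption.

```python
# A subset of the categories in Figure 4, used here as the seed list.
SEED_CATEGORIES = {"Books", "DVD", "Electronics", "Toys", "Video Games"}


def cluster_products(products, seeds=SEED_CATEGORIES):
    """Group product names by category; unknown categories go to Miscellaneous."""
    clusters = {}
    for product in products:
        category = product.get("category")
        label = category if category in seeds else "Miscellaneous"
        clusters.setdefault(label, []).append(product["name"])
    return clusters
```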
4.5 Social Search

Ferret adds a new spin to search: social networking. One of the most innovative features of Ferret is the ability to retrieve search results that are relevant to the user's social network. The feature allows the user to leverage searches performed by the user's friends. Social search recommends the best pages found by people in the user's network that are relevant to the user's query.

Ferret's social search tries to match the user's query term against a larger set of searchers in the user's social network who are looking for the same things. The results are clustered by friend's name and listed serially. Each result contains the page name and the page URL, which is clickable for viewing.

The feature is opt-in: no one can see what the user is searching for unless the user logs in. This ensures user privacy.

Social Search is implemented using the following primitives:

• Sessions
• Listen to user clicks
• Heartbeat Messages
4.5.1 Sessions

In Ferret, a session stores the state of communication between the server and the user, enabling the server to identify that user across multiple page requests or visits to the site. A session is created when a user logs in with a username and password. The session for a user stores the following attributes:

• User ID: The primary key generated by the database for the user
• Query: The query term currently being searched by the user
• Page URL: The page currently being viewed by the user
• Timestamp: The time at which the user clicked on the page URL

4.5.2 Listen to user clicks

A request is sent to the server each time the user clicks on a page for browsing. This is used to associate the query term processed by the Stanford Tagger (the keywords) with the user ID previously stored in the session, for future processing.

The algorithm for user clicks is as follows:

If session is invalid
    Return
Else if no timestamp exists in session
    Insert URL into session
    Insert Keyword into session
    Insert Timestamp into session
    Return

Figure 5: User-Click Algorithm
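
The user-click steps in Figure 5 could be realised along the following lines. This is a sketch in Python, not the prototype's Java code. It assumes the session is a plain dict (or None when invalid), and the key names and the boolean return value are hypothetical.

```python
def handle_click(session, url, keywords, now):
    """Record the clicked page in the session, following Figure 5.

    session is a plain dict, or None when the session is invalid.
    Returns True when the click was recorded.
    """
    if session is None or "user_id" not in session:
        return False                  # invalid session: nothing to do
    if "timestamp" not in session:
        session["url"] = url
        session["keywords"] = keywords
        session["timestamp"] = now
        return True
    return False                      # an earlier click is still pending
```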
4.5.3 Heartbeat Messages

A heartbeat message is an event-driven message that is sent to the server when a search results page is reloaded. This message is used to detect whether the user likes the page just viewed. We use a heuristic to differentiate such a page from one the user does not like: if the user spends more time on a certain page, we assume the user likes the page; if the user returns from a page "quickly", we assume the user does not like it.

Currently, we have set a threshold of 30 seconds to differentiate a good page from a bad page. If the user spends 30 seconds or more on a specific page, the system records the page as a good page and stores it in the database. If the user returns from the page in less than 30 seconds, the page is not associated with the user.

The algorithm for the heartbeat process can be summarized as follows:

If session is invalid
    Return
Else if no timestamp exists in session
    Return
Else if page is liked by user
    Associate page-keyword combination with userid
    Return
Else if page is disliked by user
    Remove page-keyword association with user
    Return

Figure 6: Heartbeat Algorithm
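
The heartbeat steps in Figure 6, combined with the 30-second rule, might be sketched as below. The sketch is in Python, uses a set of tuples as a stand-in for the database, and uses hypothetical key names. Clearing the pending click once a heartbeat has been processed is an assumption that the text does not state.

```python
GOOD_PAGE_SECONDS = 30  # threshold stated in the text


def handle_heartbeat(session, liked_pages, now):
    """Apply the Figure 6 steps when the results page is reloaded.

    liked_pages is a set of (user_id, url, keywords) tuples standing in
    for the database. Returns "ignored", "liked" or "disliked".
    """
    if session is None or "user_id" not in session:
        return "ignored"                        # invalid session
    if "timestamp" not in session:
        return "ignored"                        # no click is pending
    # Assumption: the pending click is cleared once it has been judged.
    elapsed = now - session.pop("timestamp")
    record = (session["user_id"], session.pop("url"), tuple(session.pop("keywords")))
    if elapsed >= GOOD_PAGE_SECONDS:
        liked_pages.add(record)                 # associate page-keyword with user
        return "liked"
    liked_pages.discard(record)                 # remove any earlier association
    return "disliked"
```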
In the future, social search can be improved by deducing the "meaning" of the query using natural language processing techniques and using that meaning to retrieve search results. For instance, if the user is searching for "what drug treats a headache", Ferret could process the semantic relationships between words and deduce that someone searching for "what medicine relieves migraines" is a match. In addition, it may be possible to rank the set of results retrieved for a specific friend of the user by freshness or relevance to the query.
4.6 Categorization Engine

The results collected from the various websites are clustered using the Lingo algorithm and then grouped into different categories.

4.6.1 Why Document Clustering

With the enormous growth of the Internet, it has become very difficult for users to find relevant documents. In response to the user's query, currently available search engines return a ranked list of documents along with their partial content (snippets). If the query is general, it is extremely difficult to identify the specific document the user is interested in, and users are forced to sift through a long list of off-topic documents. Moreover, internal relationships among the documents in the search result are rarely presented and are left for the user to work out. One approach is to automatically group search results into thematic groups (clusters), which helps the user see various perspectives of the same query grouped into categories.
4.6.2 Approaches

Clustering of web search results was first introduced in the Scatter-Gather system. Several algorithms followed. Suffix Tree Clustering (STC), implemented in the Grouper system, pioneered the use of recurring phrases as the basis for deriving conclusions about the similarity of documents. MSEEC and SHOC also made explicit use of word proximity in the input documents. Apart from phrases, graph-partitioning methods have been used in clustering search results.

All the above approaches follow a scheme where cluster content discovery is performed first and then, based on the content, the labels are determined. But very often intricate measures of similarity among documents do not correspond well with a plain human understanding of what a cluster's "glue" element has been. To avoid such problems, the Lingo algorithm reverses this process: it first attempts to ensure that it can create a human-perceivable cluster label and only then assigns documents to it. This is the approach we have followed in our implementation of clustering web results.
4.6.3 Building Blocks

The following sections describe the building blocks for the implementation of the clustering algorithm used in Ferret.

4.6.3.1 Vector Space model

The Vector Space Model (VSM) [18] is an information retrieval technique that transforms the problem of comparing textual data into a problem of comparing algebraic vectors in a multidimensional space. Once the transformation is done, linear algebra operations are used to calculate similarities among the original documents. Every unique term (word) from the collection of analyzed documents forms a separate dimension in the VSM, and each document is represented by a vector spanning all these dimensions. For example, if vector $v$ represents document $j$ in a $k$-dimensional space, then component $t$ of vector $v$, where $t \in 1 \ldots k$, represents the degree of the relationship between document $j$ and the term corresponding to dimension $t$. This relationship is best expressed as a $t \times d$ matrix $A$, usually named a term-document matrix, where $t$ is the number of unique terms and $d$ is the number of documents. Element $a_{ij}$ of matrix $A$ is therefore a numerical representation of the relationship between term $i$ and document $j$. There are many methods for calculating $a_{ij}$, commonly referred to as term weighting methods.
4.6.3.2 Calculating Relevance

We use the tf-idf method for calculating the term weights. The tf-idf weight (term frequency-inverse document frequency) is a weight often used in information retrieval and text mining. It is a statistical measure of how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus.
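
A term-document matrix weighted with tf-idf can be sketched in a few lines of Python. The sketch uses one common variant, raw term count multiplied by ln(N / df), which is an assumption; the text does not say which variant Ferret uses. Whitespace tokenisation and the name tfidf_matrix are also illustrative.

```python
import math


def tfidf_matrix(documents):
    """Return (terms, A), where A[i][j] is the tf-idf weight of term i in document j."""
    tokenized = [doc.lower().split() for doc in documents]
    terms = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    matrix = []
    for term in terms:
        # Document frequency: number of documents containing the term.
        doc_freq = sum(1 for tokens in tokenized if term in tokens)
        idf = math.log(n_docs / doc_freq)
        matrix.append([tokens.count(term) * idf for tokens in tokenized])
    return terms, matrix
```

With this variant, a term that occurs in every document receives weight zero everywhere, which matches the idea that corpus-wide frequency offsets importance.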
4.6.3.3 Suffix Arrays

Let $A = a_1 a_2 a_3 \ldots a_n$ be a sequence of objects. Let us denote by $A_i$ a suffix of $A$ starting at position $i \in 1 \ldots n$, such that $A_i = a_i a_{i+1} a_{i+2} \ldots a_n$. An empty suffix is also defined for every $A$ as $A_{n+1} = \emptyset$. A suffix array [19] is an ordered array of all suffixes of $A$. Suffix arrays are used as an efficient data structure for verifying whether a sequence of objects $B$ is a substring of $A$. The complexity of this operation is $O(P + \log N)$, where $P$ is the length of $B$ and $N$ is the length of $A$; a suffix array can be built in $O(N \log N)$.
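
A small Python sketch of the suffix array idea follows. For clarity it builds the array by sorting suffix slices directly, which is simpler than the $O(N \log N)$ construction and does not reach that bound, and its plain binary search costs $O(P \log N)$ rather than $O(P + \log N)$. The function names are hypothetical.

```python
def build_suffix_array(seq):
    """Return the start positions of all non-empty suffixes of seq in sorted order.

    Sorting the slices directly is simple but slower than O(N log N);
    it is meant for short inputs only.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def contains(seq, suffix_array, pattern):
    """Binary-search the suffix array to test whether pattern occurs in seq."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        # Compare only the first len(pattern) items of this suffix.
        if seq[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffix_array):
        return False
    start = suffix_array[lo]
    return seq[start:start + len(pattern)] == pattern
```

The same functions work on lists of words, which is the form needed when looking for recurring phrases rather than character substrings.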
4.6.3.5 Singular value Decomposition

An algebraic method of matrix decomposition called Singular Value Decomposition (SVD) [20] is used for discovering the orthogonal basis of the original term-document matrix. This basis consists of orthogonal vectors that, at least hypothetically, correspond to topics present in the original term-document matrix. SVD breaks a $t \times d$ matrix $A$ into three matrices $U$, $\Sigma$ and $V$, such that $A = U \Sigma V^T$. $U$ is a $t \times t$ orthogonal matrix whose column vectors are called the left singular vectors of $A$, $V$ is a $d \times d$ orthogonal matrix whose column vectors are called the right singular vectors of $A$, and $\Sigma$ is a $t \times d$ diagonal matrix having the singular values of $A$ ordered decreasingly along its diagonal. The rank $r_A$ of matrix $A$ is equal to the number of its non-zero singular values. The first $r_A$ columns of $U$ form an orthogonal basis for the column space of $A$, an essential fact used by Lingo.
4.6.4 Lingo Algorithm

At a very high level, Lingo [21] first finds frequent phrases in the input documents, hoping they are the most informative source of human-readable topic descriptions. Next, by reducing the original term-document matrix using SVD, it tries to discover any existing latent structure of diverse topics in the search result. Finally, it matches group descriptions with the extracted topics and assigns relevant documents to them.

4.6.4.1 Preprocessing

The aim of the preprocessing phase is to prune from the input all characters and terms that can possibly affect the quality of group descriptions. Two steps are performed. First, text filtering removes HTML tags, entities and non-letter characters except for sentence boundaries. Next, appropriate stemming and stop-word removal end the preprocessing phase.
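
The text-filtering and stop-word steps can be sketched in Python with regular expressions. Stemming is left out of the sketch. The stop-word list is a tiny illustrative sample, the choice of ".", "!" and "?" as sentence boundaries is an assumption, and both function names are hypothetical.

```python
import re

# Illustrative sample only; a real stop-word list is much longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "is"}


def filter_text(snippet):
    """Strip HTML tags, entities and non-letter characters, keeping sentence ends."""
    text = re.sub(r"<[^>]+>", " ", snippet)        # HTML tags
    text = re.sub(r"&#?\w+;", " ", text)           # entities such as &amp;
    text = re.sub(r"[^A-Za-z.!?\s]", " ", text)    # non-letters, except . ! ?
    return re.sub(r"\s+", " ", text).strip().lower()


def remove_stop_words(text):
    """Split the text into sentences, then drop stop words from each sentence."""
    sentences = [s.split() for s in re.split(r"[.!?]", text)]
    return [[w for w in words if w not in STOP_WORDS] for words in sentences if words]
```

Keeping sentences as separate word lists preserves the boundaries that the later phrase-extraction step must not cross.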
4.6.4. Phrase Extraction

We define frequent phrases as recurring ordered sequences of terms appearing in the input documents. Intuitively, when writing about something, we usually repeat the subject-related keywords to keep a reader's attention. Of course, in good writing style it is common to use synonyms and pronouns and thus avoid annoying repetition. To be a candidate for a cluster label, a frequent phrase or a single term must:

• appear in the input documents at least a certain number of times (the term frequency threshold),
• not cross sentence boundaries,
• be a complete phrase,
• not begin or end with a stop word.

We use suffix arrays to find such complete phrases.
4.6.4.2 Cluster Label Induction

Once frequent phrases (and single frequent terms) that exceed the term frequency thresholds are known, they are used for cluster label induction. There are three steps to this: term-document matrix building, abstract concept discovery, and phrase matching and label pruning.

The term-document matrix is constructed out of single terms that exceed a predefined term frequency threshold. The weight of each term is calculated using the standard term frequency-inverse document frequency (tf-idf) formula; terms appearing in document titles are additionally scaled by a constant factor.

In abstract concept discovery, the Singular Value Decomposition method is applied to the term-document matrix to find its orthogonal basis. Vectors of this basis (SVD's $U$ matrix) represent the abstract concepts appearing in the input documents.

The phrase matching and label pruning step, where group descriptions are discovered, relies on an important observation: both abstract concepts and frequent phrases are expressed in the same vector space, the column space of the original term-document matrix $A$. The classic cosine distance is used to calculate how "close" a phrase or a single term is to an abstract concept. Let us denote by $P$ a matrix of size $t \times (p + t)$, where $t$ is the number of frequent terms and $p$ is the number of frequent phrases. Having the $P$ matrix and the $i$-th column vector of the SVD's $U$ matrix, a vector $m_i$ of cosines of the angles between the $i$-th abstract concept vector and the phrase vectors can be calculated: $m_i = U_i^T P$. The phrase that corresponds to the maximum component of the $m_i$ vector should be selected as the human-readable description of the $i$-th abstract concept.
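
The phrase-matching step can be illustrated with a small Python sketch that picks, for one abstract concept vector, the phrase with the largest cosine. It computes each cosine explicitly instead of forming $U_i^T P$ as a matrix product, and it takes the concept vector as given rather than computing an SVD. The names cosine and best_label are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_label(concept, phrase_vectors):
    """Return the phrase whose vector is closest to the abstract concept vector.

    phrase_vectors maps each candidate phrase to its vector in term space.
    """
    return max(phrase_vectors, key=lambda phrase: cosine(concept, phrase_vectors[phrase]))
```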
4.6.4.2 Cluster Content Discovery

In the cluster content discovery phase, the classic Vector Space Model is used to assign the input documents to the cluster labels induced in the previous phase. In a way, we re-query the input document set with all induced cluster labels. The assignment process resembles document retrieval based on the VSM model. Let us define matrix $Q$, in which each cluster label is represented as a column vector. Let $C = Q^T A$, where $A$ is the original term-document matrix for the input documents. This way, element $c_{ij}$ of the $C$ matrix indicates the strength of membership of the $j$-th document in the $i$-th cluster. A document is added to a cluster if $c_{ij}$ exceeds a threshold, which is yet another control parameter of the algorithm. Documents not assigned to any cluster end up in an artificial cluster called Others.
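
The assignment rule can be sketched in Python with plain lists. The sketch assumes $Q$ is given as a list of label vectors over terms and $A$ as a list of rows (terms by documents). The function name and the returned dictionary shape are hypothetical.

```python
def assign_documents(Q, A, threshold):
    """Compute C = Q^T A and put document j in cluster i when C[i][j] > threshold.

    Q is a list of label vectors (one per cluster, indexed by term).
    A is a list of rows, one per term, each holding one weight per document.
    Returns a dict: cluster index (or "Others") -> list of document indices.
    """
    n_terms = len(A)
    n_docs = len(A[0])
    clusters = {}
    assigned = set()
    for i, label in enumerate(Q):
        for j in range(n_docs):
            c_ij = sum(label[t] * A[t][j] for t in range(n_terms))
            if c_ij > threshold:
                clusters.setdefault(i, []).append(j)
                assigned.add(j)
    # Documents that matched no label fall into the artificial cluster.
    others = [j for j in range(n_docs) if j not in assigned]
    if others:
        clusters["Others"] = others
    return clusters
```

Because each label is tested independently, a document can belong to more than one cluster when several of its $c_{ij}$ values exceed the threshold.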
4.7 Presentation Engine

This module is responsible for rendering the results in the user's browser. It uses the Adapter pattern to separate the search part from the display part.

4.7.1 Search Results Tab creator

This interface creates a tab, and each type of tab can be separated into a different class. The most important functions are written in the base class, so whenever a tab needs to behave differently, a simple subclass can be written.
4.8 View

The clustered results and the socially relevant search results are then shown to the end user in a tabbed format, which allows the user to easily find the appropriate content. The view uses MooTools [22], an open-source JavaScript framework, which enables it to be browser agnostic. The following chart shows the performance comparison of MooTools with other JavaScript frameworks. Its performance, along with its ease of use, makes it one of the preferred choices.

Figure 7: Performance comparison of various JavaScript frameworks (source: Blog)
5.
Evaluation
Framework
Ferret
tries
to
use
many
old
and
some
new
ideas
to
combine
them
into
a
new
exciting
product.
Hence
evaluation
of
such
a
system
is
critical.
The
Evaluation
falls
under
Three
broad
categories
Social
Network
based
relevance
Performance
Comparison
to
other
contemporary
search
engines
5.1
Social
Network
Simulation
Ferret
needs
the
social
network
to
provide
information
about
a
user
and
his
friends
so
that
it
can
perform
and
maintain
social
relevance
search
results.
Though
it
has
a
facebook
engine
ready,
Facebook
authentication
system
requires
a
static
IP
or
a
URL
to
work
with.
Due
to
this
limitation
it
became
essential
to
simulate
the
social
network.
The
following
section
describes
a
simple
social
network
simulation
5.1.1 ER Diagram
Ferret presently simulates a social network to implement Social Search. The database model used is as follows:
Figure 8: Database Model for Social Network

Table ufl_userfriends:
Column Name    Description
Uid            Refers to usr_user.uid
fid            Refers to usr_user.uid

Table uhb_userhobbies:
Column Name    Description
Uid            Refers to usr_user.uid
hobbies        User hobby name

Table Usk_usersearchpage:
Column Name    Description
Uid            Refers to usr_user.uid
pageid         Refers to puk_pagekeyword.pageid
Ferret does not currently use the uhb_userhobbies table in the simulation. The table makes it possible to consider the hobbies of the user's friends when recommending social search results to the user.
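To make the role of the simulated tables concrete, here is a minimal in-memory sketch of how a social lookup could work. It is an illustration under assumptions, not Ferret's implementation: the class and method names are hypothetical, the maps merely stand in for ufl_userfriends, Usk_usersearchpage and puk_pagekeyword, the keyword column on puk_pagekeyword is assumed, and friendship rows are treated as directed (uid lists fid as a friend).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the simulated social network tables.
class SocialNetworkSim {
    private final Map<Integer, Set<Integer>> friends = new HashMap<>();   // ufl_userfriends: uid -> fid
    private final Map<Integer, Set<Integer>> userPages = new HashMap<>(); // Usk_usersearchpage: uid -> pageid
    private final Map<Integer, String> pageKeyword = new HashMap<>();     // puk_pagekeyword: pageid -> keyword (assumed column)

    void addFriend(int uid, int fid) {
        friends.computeIfAbsent(uid, k -> new LinkedHashSet<>()).add(fid);
    }

    void addPage(int pageId, String keyword) {
        pageKeyword.put(pageId, keyword);
    }

    void recordClick(int uid, int pageId) {
        userPages.computeIfAbsent(uid, k -> new LinkedHashSet<>()).add(pageId);
    }

    // Returns the pages that friends of uid have stored for the given keyword.
    List<Integer> socialResults(int uid, String keyword) {
        Set<Integer> matches = new LinkedHashSet<>();
        for (int fid : friends.getOrDefault(uid, Collections.<Integer>emptySet())) {
            for (int pageId : userPages.getOrDefault(fid, Collections.<Integer>emptySet())) {
                if (keyword.equals(pageKeyword.get(pageId))) {
                    matches.add(pageId);
                }
            }
        }
        return new ArrayList<>(matches);
    }
}
```

In the real system these lookups would be queries against the MySQL tables; the maps only show which table feeds which step.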
5.2 JMeter
The performance of a search engine is critical. JMeter is an open-source tool that can simulate multiple clients sending POST requests [23, 24], and it can also load test the application. Ferret was tested using JMeter and various performance statistics were collected. This section provides details on the test cases.
5.2.1 Test Cases
The first screenshot shows the JMeter test plan setup screen. The test plan is called Ferret Testplan.
Screenshot 2 shows the parameter to be passed, namely the search query, and the type of HTTP request to be sent, for example POST or GET.
Screenshot 3 shows the expected amount of load (number of users), the number of times each query is executed, and the gap between consecutive queries.
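The three load settings described above (number of users, repetitions per user, and gap between queries) can be illustrated with a small plain-Java sketch. This is not how JMeter is implemented and not the test plan used for Ferret; the class LoadSketch and its parameters are hypothetical, and the request is an arbitrary Runnable standing in for the HTTP call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the three load knobs; not JMeter's implementation.
class LoadSketch {
    // Runs `users` simulated clients in parallel. Each client fires `loops`
    // requests, pausing `gapMillis` between consecutive ones, and the method
    // returns one elapsed time in milliseconds per request.
    static List<Long> run(int users, int loops, long gapMillis, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                Callable<List<Long>> client = () -> {
                    List<Long> times = new ArrayList<>();
                    for (int i = 0; i < loops; i++) {
                        long start = System.nanoTime();
                        request.run(); // stand-in for sending the search query
                        times.add((System.nanoTime() - start) / 1_000_000);
                        if (i < loops - 1) {
                            Thread.sleep(gapMillis);
                        }
                    }
                    return times;
                };
                futures.add(pool.submit(client));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

For example, `LoadSketch.run(4, 5, 100, someRequest)` would mimic four users each firing the same query five times with a 100 ms gap.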
5.3 Comparison to Other Search Engines
An important component of a search engine's performance is the quality of the results it returns for a particular query. Such an evaluation is very subjective. To compare Ferret's results with those of contemporary search engines, a survey method was used.
6. Testing and Results
6.1 Prototype System
We have built a prototype system for the demo using the hardware and software listed in the following sections.
6.1.1 Software
• Java 1.6
• Eclipse IDE
• J2EE 1.4
• Apache Tomcat 5.5
• MySQL 5.0
• Clustering algorithms (developed by us)
• MooTools
• Multibox
• MySQL JDBC Connector
• JUnit 4.4
• Open Source Web / REST APIs for Google, IMDB, Facebook, etc.
6.1.2 Hardware
Simple commodity hardware is enough, since this is a proof of concept rather than a live system. Currently a desktop PC with a browser and internet connectivity suffices. Development is done primarily on our laptops.
6.1.3 Operating System
The primary development and test platforms are:
• Windows 98/XP/Vista
• Mac OS X 10.5.5 (Leopard)
Most of the technologies we are using are completely portable, so we should be able to run on most systems that support Java.
6.2 Results
We ran our tests using memcached and Tomcat. For every search engine, response times are very important. Since we use Google as our search provider, our response times can never be better than Google's. Each tab is built on a separate thread, so the page is constructed in parallel.
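Building each tab on its own thread can be sketched with a standard thread pool as follows. This is an assumed arrangement for illustration only: the class ParallelPage, the bracketed page format and the tab names are hypothetical, and each tab builder is a Callable standing in for the real search and rendering work.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one pool thread per tab, fragments joined in tab order.
class ParallelPage {
    static String build(Map<String, Callable<String>> tabBuilders) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tabBuilders.size()));
        try {
            // Submit every tab first so that all of them run concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Callable<String>> e : tabBuilders.entrySet()) {
                pending.put(e.getKey(), pool.submit(e.getValue()));
            }
            // Then wait for each fragment and assemble the page in the original order.
            StringBuilder page = new StringBuilder();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                page.append('[').append(e.getKey()).append(": ")
                    .append(e.getValue().get()).append(']');
            }
            return page.toString();
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("tab construction failed", ex);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this layout the total page time is close to that of the slowest tab rather than the sum of all tabs, which is why the remote search provider bounds the response time.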
6.2.1 Without Memcached
[Chart: Response Times without Memcached. Y-axis: time in seconds (0 to 35). X-axis: number of times the same query is fired, 1 to 5 (representative).]
Without memcached, the same query takes approximately the same response time on every run. This is because the entire result set is constructed all over again for every request.
6.2.2 With Memcached
[Chart: Response Times with Memcached and logged on mode. Y-axis: time in seconds (0 to 35). X-axis: number of times the same query is fired, 1 to 5 (representative).]
Memcached improves the performance, but a small amount of time is still spent because the social results are never cached. Since the social results are stored locally in Ferret's own database, they are not the bottleneck; the bottleneck is the remote servers and the clustering system.
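The behavior described above, where remote and clustered results are cached per query while social results are always fetched fresh, can be sketched as below. A ConcurrentHashMap stands in for memcached, and the class CachedSearch with its two function parameters is hypothetical; the report does not show Ferret's actual caching code or the memcached client it uses.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a ConcurrentHashMap stands in for memcached.
class CachedSearch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> remoteSearch; // slow: remote providers plus clustering
    private final Function<String, String> socialSearch; // local database lookup, never cached

    CachedSearch(Function<String, String> remoteSearch, Function<String, String> socialSearch) {
        this.remoteSearch = remoteSearch;
        this.socialSearch = socialSearch;
    }

    String page(String query, boolean loggedIn) {
        // Look in the cache first and fall back to the slow path only on a miss.
        String cached = cache.get(query);
        if (cached == null) {
            cached = remoteSearch.apply(query);
            cache.put(query, cached);
        }
        // Social results depend on the user, so they are computed on every request.
        return loggedIn ? cached + socialSearch.apply(query) : cached;
    }
}
```

After the first request for a query, a user who is not logged in is served entirely from the cache, while a logged-in user still pays for the local social lookup.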
[Chart: Response Times with Memcached and not logged on mode. Y-axis: time in seconds (0 to 35). X-axis: number of times the same query is fired, 1 to 5 (representative).]
When the user is not logged in, the complete page is constructed entirely from the cached results. The thread pools are not interrupted, and thus the performance is very high.
6.2.3 Load Response with Memcached
[Chart: Response Times with Memcached and concurrent users. Y-axis: time in seconds (0 to 20). X-axis: concurrent users * 4, ticks 1 to 5.]
The above graph shows the response time of the Ferret system when multiple concurrent users search for the same query. It is evident that we need a dedicated server, or a host of servers, to handle multiple concurrent users.
7. Future Work
There are a lot of changes that we dream of, and we have a long way to go. This serves as a good demo tool, but not a final product. Following are some things we have planned for Ferret.
o Using up our summer vacation to build on it
o Notion of Social Rank
o Adding blogs, forums, reservations, email search to search results
o Using Digg interface to re-rank sites
o Learning better categories
o And the list goes on...
8. Conclusions
This was a very good learning experience. One of the most important things we learnt was how to develop an idea and get a working prototype.
From our perspective, there are two navigation paradigms on the Web – Search and Browse. Search lets you find specific bits of information quickly or navigate to sites you already know. Browse gives you a more immersive way to explore a topic so that you can learn more about something or discover something new. Ferret is about reinventing Browse just as Google reinvented Search.
9. Bibliography
1. Brin, S. and L. Page, The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 1998. 30(1-7): p. 107-117.
2. Larry Page, S.B. Google. Available from: http://www.google.com.
3. Wikipedia - The free Encyclopedia. Available from: http://www.wikipedia.com.
4. Liad Agmon, A.y., Sagie Davidovitch (co-founders), Delver.
5. Mislove, A., K. Gummadi, and P. Druschel. Exploiting Social Networks for Internet Search. 2006.
6. Chen, H. and S. Dumais. Bringing order to the Web: automatically categorizing search results. 2000: ACM Press New York, NY, USA.
7. Thet, T., J. Na, and C. Khoo, Automatic Classification of Web Search Results: Product Review vs. Non-review Documents. Lecture Notes in Computer Science, 2007. 4822: p. 65.
8. Vogel, D., et al., Classifying search engine queries using the web as background knowledge. SIGKDD Explor. Newsl., 2005. 7(2): p. 117-122.
9. Yeung, A., N. Gibbins, and N. Shadbolt, A k-Nearest-Neighbour Method for Classifying Web Search Results with Data in Folksonomies. 2008.
10. Venky Harinarayan, A.R.C.-f. Kosmix. Available from: http://www.kosmix.com.
11. Read Write Web - Top 10 Alternative Search Engines of 2008. Available from: http://www.readwriteweb.com/archives/top_10_alternative_search_engi.phps.
12. Buschmann, F., Pattern-oriented software architecture: a system of patterns. 2002: Wiley.
13. Buschmann, F., K. Henney, and D. Schmidt, Pattern-oriented software architecture. 1996: Wiley New York.
14. Fitzpatrick, B., Distributed caching with memcached. Linux Journal, 2004. 2004(124).
15. Interactive, D., Memcached. 2006.
16. Bezos, J. A9 - Amazon's Search Engine. Available from: http://www.a9.com.
17. Needham, C. Internet Movie Database. Available from: http://www.imdb.com.
18. Wong, S., W. Ziarko, and P. Wong. Generalized vector spaces model in information retrieval. 1985: ACM New York, NY, USA.
19. Manber, U. and G. Myers. Suffix arrays: A new method for on-line string searches. 1990: Society for Industrial and Applied Mathematics Philadelphia, PA, USA.
20. Golub, G. and C. Reinsch, Singular value decomposition and least squares solutions. Numerische Mathematik, 1970. 14(5): p. 403-420.
21. Osinski, S., J. Stefanowski, and D. Weiss. Lingo: Search results clustering algorithm based on singular value decomposition. 2004: Springer.
22. Proietti, V., MooTools - the compact javascript framework.
23. Foundation, A., Apache JMeter.
24. Hansen, K., Load Testing your Applications with Apache JMeter. Java Boutique Internet, http://javaboutique.internet.com/tutorials/JMeter/, as viewed November, 2004.
9. Appendix
Figure 9: Home Page
Searching for ‘metallica’ in Ferret…
Figure 10: Search Query
Search (Web) results for ‘metallica’ in Ferret.
Figure 11: Search Results Page > Web Search Tab
Media results for ‘metallica’ in Ferret. Results are clustered by the STC and Lingo algorithms.
Figure 12: Search Results Page > Media Results Tab
Playing ‘Nothing Else Matters’ video.
Figure 13: Search Results Page > Media Results Tab > Media Player
Product results for ‘metallica’ in Ferret. Results are clustered by seed-list based clustering.
Figure 14: Search Results Page > Product Search
Social results for ‘metallica’ in Ferret. No results are shown since the user is not logged in.
Figure 15: Search Results Page > Social Search (Not Logged in)
User logs into Ferret to see social search results.
Figure 16: Ajax Login Option
User’s friend has already searched for ‘metallica’ and his favorite ‘metallica’ pages are displayed.
Figure 17: Search Results Page > Social Results Tab (Logged in)
Search (Web) results for ‘ipl’ in Ferret. User ‘ketan’ is logged in and clicks on a URL.
Figure 18: Search Results Page > Web Search > On Clicking a Query
User ‘ketan’ logs out and ‘praful’ logs in and searches for ‘ipl’ again.
Figure 19: When a Friend Logs in!
Social results for ‘ipl’ display the URL ketan had liked when he searched for ‘ipl’.
Figure 20: The Socially Relevant Query turns up on Friends Page
Search (Web) results for ‘yoyo’ in Ferret. Dictionary Search is able to get a definition for ‘yoyo’.
Figure 21: Wikipedia, WordNet Dictionary search, Images from Yahoo and Google