Yahoo OHDL BOSS

Hack the BOSS
Ted DRAKE
Yahoo! France
BOSS Basics
“BOSS is a data API. It’s not a search API”
-Vik Singh, BOSS Architect
www2009 Conference, Madrid
2
BOSS = Freedom
•Change ranking
•Create your own look and feel
•Use your favorite ads
•Mash with external APIs
3
Coming Soon…
•SLA
•Customer Support
•Fees:
-Free for most uses
-Costs based on usage
4
BOSS Details
•REST-based API
•XML or JSON output
•Web, News, Image, SiteSearch, and Spelling Suggestion services
•Time span filtering for News Search
•Delicious Tags and Popularity
•Keyterm extraction
•Microformat and RDF data
•Extended abstracts
•Recognizes most search filters from Yahoo! and Google (backdoor hacks)
5
What is the most important part of your application?
•The results display?
•The text ads?
•The rounded borders?
•The smooth animations?
•The perfect URL?
THE QUERY STRING!!!
6
The Query
•Tells you what the user is looking for
•Generates related topics
•Powers secondary APIs
•Can be generated by a search box, URL, tags, or keyword extraction from the page
•The Query is your BFF!
7
Let’s Start Hacking!
•Get an API key
•http://developer.yahoo.com
•You don’t need a URL for now.
•Update it later for better tracking and
promotion.
8
Site Specific Results
Search only one site:
/ysearch/web/v1/golf+site:vw.com?
Search from a select group of sites:
/ysearch/web/v1/golf?sites=vw.com,vwtrendsweb.com,performancevwmag.com,caranddriver.com
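The two request shapes above can also be assembled in code. The following is a minimal Python sketch that only builds the URL and sends nothing. The host and the appid and format parameters are taken from the sample request later in these slides; the helper name boss_web_url is made up for illustration and is not part of any Yahoo! library.

```python
from urllib.parse import quote_plus, urlencode

# Host as it appears in the sample request in these slides.
BOSS_BASE = "http://boss.yahooapis.com"


def boss_web_url(query, appid, **params):
    """Build a BOSS web-search URL (hypothetical helper; no request is sent)."""
    # Spaces become "+", and ":" stays literal so filters like site: survive.
    path = "/ysearch/web/v1/" + quote_plus(query, safe=":")
    args = {"appid": appid, "format": "json"}
    args.update(params)
    # Keep commas literal so the sites list reads as it does on the slide.
    return BOSS_BASE + path + "?" + urlencode(args, safe=",")


one_site = boss_web_url("golf site:vw.com", "YourAppId")
site_group = boss_web_url("golf", "YourAppId", sites="vw.com,caranddriver.com")
print(one_site)
print(site_group)
```

The same helper covers the inurl: and intitle: filters, since they are just part of the query text.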
9
Tag or Title Filters
Use the inurl: filter to simulate tag search:
/ysearch/web/v1/inurl:golf?
Use the intitle: filter to keep only results with the query in the title:
/ysearch/web/v1/intitle:golf?
10
Get Related Sites
Use related:foo.html to find related sites:
/ysearch/web/v1/related:http://www.caranddriver.com/car/2006-models/2006golf.html?
11
BOSS Keyterms
• Keyterms are words used to find a site while
searching on Yahoo!
• Listed in order of relevance.
• /web/v1/{query}?view=keyterms
12
Delicious Tags and Popularity
• How many times has a page been
saved in Delicious?
• What tags have been associated with
the page? How many times?
•view=delicious_saves,delicious_toptags
13
KeyTerms + Delicious Tags: What are they good for?
• Relevancy
• Related Searches
• Search Suggest
• Tag Clouds
• Trigger secondary APIs
• Highlight Popular Results
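As one possible illustration of the "Related Searches" idea, the sketch below ranks keyterms by how many results mention them. The input shape (one list of keyterms per result) and the function name suggest_related are assumptions made for this example; the counting heuristic is illustrative and is not something BOSS itself provides.

```python
from collections import Counter


def suggest_related(query, keyterms_per_result, limit=5):
    """Rank keyterms by how many results mention them (illustrative heuristic)."""
    query_words = set(query.lower().split())
    counts = Counter()
    for terms in keyterms_per_result:
        # Count each term once per result, ignoring case.
        for term in {t.lower() for t in terms}:
            if term not in query_words:
                counts[term] += 1
    # Most frequent first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:limit]]


results = [
    ["Bucharest", "city", "Romania"],
    ["Romania", "architecture", "city"],
    ["Romania", "clubs"],
]
print(suggest_related("bucharest", results, limit=3))
```

The same counts could drive a tag cloud by mapping each count to a font size.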
14
What it looks like
<keyterms>
<terms>
<term>Bucharest</term>
<term>city</term>
<term>Romanian</term>
<term>population</term>
<term>Romania</term>
<term>architecture</term>
<term>city centre</term>
<term>clubs</term>
</terms>
</keyterms>
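A fragment like the one above can be read with Python's standard library. This is a small sketch that assumes the fragment arrives exactly as shown, without XML namespaces; in a full BOSS response the keyterms element is nested inside other elements, and the tag lookup may need adjusting if namespaces are present.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the fragment shown on the slide.
SAMPLE = """<keyterms>
  <terms>
    <term>Bucharest</term>
    <term>city</term>
    <term>Romanian</term>
  </terms>
</keyterms>"""


def extract_keyterms(xml_fragment):
    """Return the text of every <term> element, in document order."""
    root = ET.fromstring(xml_fragment)
    return [el.text.strip() for el in root.iter("term") if el.text]


print(extract_keyterms(SAMPLE))
```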
15
BOSS Mashup Framework
•Python-based framework to mash the BOSS API with secondary web services and proprietary data
•Easy integration with Google App Engine
•Powers the infamous YUIL (4 hour search) project
•Fast prototyping with minimal code
16
BOSSY Code on BOSS Mashup Platform
__author__ = "Vik Singh (viksi@yahoo-inc.com)"

from yos.util import text, typechecks
from yos.yql import db
from yos.boss import ysearch

def month_lookup(s):
    # Return the first month prefix that s starts with (None if no match).
    for m in ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sept", "oct", "nov", "dec"]:
        if s.startswith(m): return m

def parse_month(s):
    # Look up each item of text.uniques(s) and return the first match, capitalized.
    months = [m for m in map(month_lookup, text.uniques(s)) if m is not None]
    if len(months) > 0:
        return text.norm(months[0]).capitalize()

def parse_year(s):
    # Return the first four-character integer item of text.uniques(s).
    years = [t for t in text.uniques(s) if len(t) == 4 and typechecks.is_int(t)]
    if len(years) > 0: return text.norm(years[0])
17
Relevancy Hacking
Location Based Relevancy
•Where am I?
•Where am I going?
•What can I find?
Map generated by FirePin application on iPhone
19
Location Based Relevancy
• Fire Eagle: Standardized location and sharing platform
• Live location tracking
• Find upcoming traffic cameras, landmarks, restaurants, headlines, photos, Twitter buzz, etc.
• Shared locations with friends
• Mining Interesting Locations and Travel Sequences from GPS
Trajectories for Mobile Users by Yu Zheng, Lizhu Zhang, Xing Xie and Wei-Ying Ma
20
Secondary Sources
Wikipedia, Craigslist, Government Data…
• Multiple sources to increase
relevance
• DuckDuckGo.com = BOSS +
Wikipedia (and other services)
• Understanding User's Query Intent with Wikipedia by Jian Hu, Gang Wang, Fred Lochovsky and Zheng Chen (WWW 2009 conference)
•OpenData: DataMob.org,
TheInfo.org, InfoChimps.org
21
Real Time Events
• Tweet News: Twitter + News
Search
• Twitter users share most timely
articles
• Relevancy highlights tweeted
stories
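One way to let tweets influence ranking is sketched below. How Tweet News actually combines the two sources is not described in these slides, so this is only an assumed approach: each result is a dict with a "url" key (an assumption about the result shape), and tweet_counts maps URLs to the number of tweets that mention them.

```python
def boost_tweeted(results, tweet_counts):
    """Move stories with more tweets up; ties keep their original order."""
    # sorted() is stable, so untweeted stories stay in their incoming order.
    return sorted(results, key=lambda r: -tweet_counts.get(r["url"], 0))


news = [
    {"url": "http://example.com/a", "title": "Story A"},
    {"url": "http://example.com/b", "title": "Story B"},
    {"url": "http://example.com/c", "title": "Story C"},
]
tweets = {"http://example.com/c": 5, "http://example.com/b": 1}
reranked = boost_tweeted(news, tweets)
print([story["title"] for story in reranked])
```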
22
Internal + External Data Sources
• TechCrunch Search: BOSS + access to proprietary data
• Create custom tables in YQL
•BOSS “Vertical Lens” defines
what internal data BOSS should
index as well as your preferred
external sources.
23
Offline Analysis
Coloralo
• requests extra images
• caches them
• analyzes them for relevancy
Coloralo finds coloring book images.
24
Quick and Easy Semantic Search
• Limit your results to sites with microformats or RDF data:
searchmonkeyid:com.yahoo.page.uf.hreview
• Request structured data, keyterms, and Delicious data from BOSS:
view=keyterms,searchmonkey_feed,searchmonkey_rdf,delicious_toptags,delicious_saves
• Sample request:
http://boss.yahooapis.com/ysearch/web/v1/cocorosie+searchmonkeyid:com.yahoo.page.uf.hreview?appid=YourAppId&format=xml&start=0&count=15&view=keyterms%2Csearchmonkey_feed%2Csearchmonkey_rdf%2Cdelicious_toptags
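The sample request can be rebuilt from its parts, which makes it easier to swap the query, the microformat, or the views. This is a sketch only: semantic_search_url is a hypothetical helper name, every parameter name is copied from the sample request, and nothing is sent over the network.

```python
from urllib.parse import quote_plus, urlencode


def semantic_search_url(query, appid, microformat_id, views, start=0, count=15):
    """Rebuild the sample request from its parts (hypothetical helper)."""
    # The searchmonkeyid: filter is appended to the query text itself.
    q = quote_plus(query + " searchmonkeyid:" + microformat_id, safe=":")
    params = [
        ("appid", appid),
        ("format", "xml"),
        ("start", start),
        ("count", count),
        ("view", ",".join(views)),  # commas are percent-encoded as %2C
    ]
    return "http://boss.yahooapis.com/ysearch/web/v1/" + q + "?" + urlencode(params)


url = semantic_search_url(
    "cocorosie",
    "YourAppId",
    "com.yahoo.page.uf.hreview",
    ["keyterms", "searchmonkey_feed", "searchmonkey_rdf", "delicious_toptags"],
)
print(url)
```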
25
Inurl and Intitle Hacks
• Use your favorite search engine hacks with BOSS.
• Most of the SERP advanced search tricks will work with your BOSS
requests.
• This does not include patterns specific to Google, Yahoo!, or another single engine, such as !sports
26
Website Description
• Get a more complete picture of a target web site by combining
multiple requests
• Find the number of external sites linking to the site:
/ysearch/se_inlink/v1/{site}?omit_inlinks=domain
• Find the pages within the site:
/ysearch/se_pagedata/v1/{site}?
• Find related web pages:
/ysearch/web/v1/related:{site}?view=delicious_saves,delicious_toptags
27
Filter News by Time
• Older, less timely articles may have more natural relevancy. Control
this by selecting the age range for news articles.
• Use orderby=date to show latest instead of most relevant.
• What happened while you were asleep:
/ysearch/news/v1/{query}?age=9h&orderby=date
• Limit news articles to 1-7 days old:
/ysearch/news/v1/{query}?age=1d-7d
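The two news requests above differ only in their age and orderby values, so a small helper can produce both. This is a sketch with made-up helper names (news_path, age_days); it returns the path and query string as shown on the slide and leaves out the host and appid.

```python
from urllib.parse import quote_plus, urlencode


def age_days(min_days, max_days):
    """Format an age range in days, e.g. 1 and 7 give '1d-7d'."""
    return "{}d-{}d".format(min_days, max_days)


def news_path(query, age=None, orderby=None):
    """Build the path and query string for a BOSS news search (hypothetical helper)."""
    params = {}
    if age:
        params["age"] = age
    if orderby:
        params["orderby"] = orderby
    return "/ysearch/news/v1/" + quote_plus(query) + "?" + urlencode(params)


overnight = news_path("open hack day", age="9h", orderby="date")
last_week = news_path("golf", age=age_days(1, 7))
print(overnight)
print(last_week)
```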
28
Vertical Focus
•Vertical Search Engines already have a niche
audience.
•Limit searches to appropriate sites:
InsiderFood
•Truevert creates a model of word relations in the context of its niche: the environment.
29
Go Beyond the Web Site
•Desktop: Xobni for Outlook
•Tools: Zemanta finds related information for
blogs and emails
•Modular: Create an application for Facebook,
Yahoo, MySpace and more with the Open
Social standard.
30
Go from Search to Action
•Keyword Finder uses BOSS keyterms to return
the top 10 keywords used by successful sites
for a query
•Bossy returns a single answer to questions.
Where is Big Ben? London.
31
Resources
• Yahoo! BOSS: http://developer.yahoo.com/boss
• BOSS Mashup Framework: http://developer.yahoo.com/search/boss/mashup.html
• YQL: http://developer.yahoo.com/yql
• Fire Eagle: http://developer.yahoo.com/fireeagle/
• Google App Engine: http://appengine.google.com
• Amazon Web Services: http://aws.amazon.com
• OAuth: http://oauth.net/
• Open Social: http://www.opensocial.org/
• Open Data: http://theinfo.org
• Alt Search Engines: http://www.altsearchengines.com/
• BOSS Hacks: http://bosshacks.com
- Add your hack to http://www.bosshacks.com/hacks/open-hack-day-london-2009
32