Search Engine Optimization (SEO)

Agenda
• What is a Search Engine?
• Examples of popular Search Engines
• Search Engines statistics
• Why is Search Engine marketing important?
• What is a SEO Algorithm?
• Steps to developing a good SEO strategy
• Ranking factors
• Basic tips for optimization
Examples of popular Search Engines
How Do Search Engines Work?
[Figure: mechanics of a typical search, showing results and ads returned ranked, the category of the first result, and the result for a phrase query]
How Do Search Engines Work?
• A spider “crawls” the web to find new documents (web pages, other documents), typically by following hyperlinks from websites already in its database
• Search engines index the content (text, code) of these documents by adding it to their databases, and then periodically update this content
• When a user enters a search, search engines search their own databases to find related documents (they do not search web pages in real time)
• Search engines rank the resulting documents using an algorithm (mathematical formula) that assigns various weights to ranking factors
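The index, search and rank steps can be sketched on a toy scale. This is a minimal illustration, not how any real engine works: the page names and texts are made up, `build_index` and `search` are hypothetical helpers, and the "ranking formula" is simply the summed term count.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on whitespace; real engines do far more."""
    return text.lower().split()

def build_index(documents):
    """Map each term to {doc_id: term count} (a tiny inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def search(index, query):
    """Return ids of documents containing every query term,
    ranked by summed term counts (highest first)."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matches = set(postings[0])
    for posting in postings[1:]:
        matches &= set(posting)
    scored = [(sum(p[doc] for p in postings), doc) for doc in matches]
    return [doc for score, doc in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical "crawled" pages standing in for the engine's database.
pages = {
    "a.html": "chess openings and chess endgames",
    "b.html": "computer chess engines",
    "c.html": "computer hardware reviews",
}
index = build_index(pages)
print(search(index, "chess"))
print(search(index, "computer chess"))
```

Note that the query is answered from `index` alone; the pages themselves are never touched at search time, which is the point of the third bullet above.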
Search on the Web
• Corpus: The publicly accessible Web: static + dynamic
• Goal: Retrieve high quality results relevant to the user’s need (not docs!)
• Need
  - Informational – want to learn about something (e.g. Low hemoglobin)
  - Navigational – want to go to that page (e.g. United Airlines)
  - Transactional – want to do something (web-mediated)
    - Access a service (e.g. Tampere weather)
    - Downloads (e.g. Mars surface images)
    - Shop (e.g. Nikon CoolPix)
  - Gray areas
    - Find a good hub (e.g. Car rental Finland)
    - Exploratory search “see what’s there” (e.g. Abortion morality)
Search Engines as Info Gatekeepers
• Search engines are becoming the primary entry point for discovering web pages.
• Ranking of web pages influences which pages users will view.
• Exclusion of a site from search engines will cut off the site from its intended audience.
• The privacy policy of a search engine is important.
100+ Billion Searches / Month
Search Engine Wars
• The battle for domination of the web search space is heating up!
• The competition is good news for users!
• Crucial: advertising is combined with search results!
• What if one of the search engines manages to dominate the space?
Yahoo!
• Synonymous with the dot-com boom, probably the best known brand on the web.
• Started off as a web directory service in 1994, acquired leading search engine technology in 2003.
• Has very strong advertising and e-commerce partners.
Lycos!
• One of the pioneers of the field
• Introduced innovations that inspired the creation of Google
Google
• Verb “google” has become synonymous with searching for information on the web.
• Has raised the bar on search quality
• Has been the most popular search engine in the last few years.
• Had a very successful IPO in August 2004.
• Is innovative and dynamic.
Live Search (was: MSN Search)
• Synonymous with PC software.
• Remember its victory in the browser wars with Netscape.
• Developed its own search engine technology only recently, officially launched in Feb. 2005.
• May link web search into its next version of Windows.
Why is Search Engine marketing important?
• 80% of consumers find your website by first typing a query into the box on a search engine (Google, Yahoo, Bing)
• 90% choose a site listed on the first page
• 85% of all traffic on the internet is referred by search engines
• The top three organic positions receive 59% of user clicks.
• Cost-effective advertising
• Clear and measurable ROI
• Operates under this assumption: More (relevant) traffic + Good Conversion Rate = More Sales/Leads
Experiment with query syntax
• Default is AND, e.g. “computer chess” is normally interpreted as “computer AND chess”, i.e. both keywords must be present in all hits.
• “+chess” in a query means the user insists that “chess” be present in all hits.
• “computer OR chess” means at least one of the keywords must be present in each hit.
• Putting the query in double quotes, "computer chess", means that the phrase “computer chess” must be present in all hits.
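The four query forms above can be tried out on a few documents. The sketch below is a simplification under stated assumptions: `matches` and `hits` are hypothetical helpers, the documents are invented, and real engines parse queries far more carefully (mixed operators, stemming, stop words).

```python
import re

def tokens(text):
    """Lowercase word tokens in document order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches(query, text):
    """Evaluate one query against one document.

    Supported forms (a simplification of real engine syntax):
      "a b"    -> phrase: the words must appear consecutively
      a OR b   -> at least one keyword present
      a b, +a  -> default AND: every keyword present
    """
    words = tokens(text)
    query = query.strip()
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        phrase = tokens(query)
        n = len(phrase)
        return n > 0 and any(words[i:i + n] == phrase
                             for i in range(len(words) - n + 1))
    if " OR " in query:
        return any(term in words
                   for part in query.split(" OR ")
                   for term in tokens(part))
    return all(term in words for term in tokens(query))

# Hypothetical documents, keyed by id.
docs = {
    1: "Computer chess engines beat grandmasters",
    2: "Buy a new computer today",
    3: "Chess on a computer is fun",
}

def hits(query):
    """Ids of all documents that satisfy the query."""
    return [doc_id for doc_id, text in docs.items() if matches(query, text)]

print(hits("computer chess"))
print(hits("computer OR chess"))
print(hits('"computer chess"'))
```

Document 3 contains both words but not the phrase, so it satisfies the AND query and fails the quoted one.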
The most popular search keywords
AltaVista (1998)   AlltheWeb (2002)   Excite (2001)
sex                free               free
applet             sex                sex
porno              download           pictures
mp3                software           new
chat               uk                 nude
Free Keyword Research Tools
– https://adwords.google.com/o/Targeting/Explorer?__c=1000000000&__u=1000000000&__o=te&ideaRequestType=KEYWORD_IDEAS#search.none
– Keyword Tool and Traffic Estimator to identify competitive phrases and search frequencies
– http://www.google.com/insights/search
– Compare search patterns across specific regions, categories, time frames and properties
Web search Users
• Ill-defined queries
  - Short length
  - Imprecise terms
  - Sub-optimal syntax (80% queries without operator)
  - Low effort in defining queries
• Wide variance in
  - Needs
  - Expectations
  - Knowledge
  - Bandwidth
• Specific behavior
  - 85% look over one result screen only (mostly above the fold)
  - 78% of queries are not modified (1 query/session)
  - Follow links – “the scent of information” ...
How far do people look for results?
Architecture of a Search Engine
[Figure: architecture diagram with the components User, Search, Web spider, Indexer, The Web, Indexes and Ad indexes, drawn over an example results page for the query “miele” (web results 1 - 10 of about 7,310,000, plus sponsored links)]
Q: How does a search engine know that all these pages contain the query terms?
A: Because all of those pages have been crawled
Sec. 20.2
Crawling picture
[Figure: the Web divided into seed pages, URLs crawled and parsed, the URLs frontier, and the unseen Web]
Motivation for crawlers
• Support universal search engines (Google, Yahoo, MSN/Windows Live, Ask, etc.)
• Vertical (specialized) search engines, e.g. news, shopping, papers, recipes, reviews, etc.
• Business intelligence: keep track of potential competitors, partners
• Monitor Web sites of interest
• Evil: harvest emails for spamming, phishing…
• … Can you think of some others?…
A crawler within a search engine
[Figure: components shown are the Web, googlebot, page repository, text & link analysis, text index, PageRank, ranker, query, and hits]
One taxonomy of crawlers
Crawlers
  - Universal crawlers
  - Preferential crawlers
    - Focused crawlers
    - Topical crawlers
      - Adaptive topical crawlers: Evolutionary crawlers, Reinforcement learning crawlers, etc...
      - Static crawlers: Best-first, PageRank, etc...
• Many other criteria could be used: Incremental, Interactive, Concurrent, etc.
Basic crawlers
• This is a sequential crawler
• Seeds can be any list of starting URLs
• Order of page visits is determined by frontier data structure
• Stop criterion can be anything
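A sequential crawler along these lines might look as follows. This is a sketch under stated assumptions: `crawl`, `fetch_links` and `toy_web` are hypothetical names, the "web" is an in-memory dictionary so that nothing touches the network, and the stop criterion is simply a page budget.

```python
from collections import deque

def crawl(seeds, fetch_links, max_pages=100):
    """Sequential crawler: visit one page at a time until the frontier
    is empty or the stop criterion (here, a page budget) is met."""
    frontier = deque(seeds)      # the frontier decides the visit order
    seen = set(seeds)            # URLs already queued, to avoid revisits
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):   # fetch + parse, supplied by the caller
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

# Hypothetical in-memory "web": URL -> outgoing links.
toy_web = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": [],
}
order = crawl(["a"], lambda url: toy_web.get(url, []), max_pages=3)
print(order)
```

Passing the fetch step in as a function keeps the loop itself independent of how pages are downloaded and parsed.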
Graph traversal (BFS or DFS?)
• Breadth First Search
  - Implemented with QUEUE (FIFO)
  - Finds pages along shortest paths
  - If we start with “good” pages, this keeps us close; maybe other good stuff…
• Depth First Search
  - Implemented with STACK (LIFO)
  - Wander away (“lost in cyberspace”)
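The queue-versus-stack difference can be made concrete by recording the link depth at which each page is first reached. The graph below is invented for illustration and `traverse` is a hypothetical helper; the same frontier is read from the front for BFS and from the back for DFS.

```python
from collections import deque

def traverse(graph, seed, breadth_first=True):
    """Return {page: link depth at which it was first reached}.

    The only difference between the two strategies is which end of the
    frontier the next page is taken from.
    """
    frontier = deque([(seed, 0)])
    depth = {}
    while frontier:
        page, d = frontier.popleft() if breadth_first else frontier.pop()
        if page in depth:
            continue                     # already visited via another path
        depth[page] = d
        for link in graph.get(page, []):
            if link not in depth:
                frontier.append((link, d + 1))
    return depth

# Hypothetical link graph: "target" is one click from the seed,
# but also reachable through a long chain of pages.
graph = {
    "seed": ["target", "p1"],
    "p1": ["p2"],
    "p2": ["p3"],
    "p3": ["target"],
}
bfs = traverse(graph, "seed", breadth_first=True)
dfs = traverse(graph, "seed", breadth_first=False)
print(bfs["target"], dfs["target"])
```

With the queue, "target" is reached along its shortest path; with the stack, the crawler follows the chain first and arrives at it the long way round.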
Universal crawlers
• Support universal search engines
• Large-scale
• Huge cost (network bandwidth) of crawl is amortized over many queries from users
• Incremental updates to existing index and other data repositories
Large-scale universal crawlers
Two major issues:
1. Performance
  • Need to scale up to billions of pages
2. Policy
  • Need to trade-off coverage, freshness, and bias (e.g. toward “important” pages)
Large-scale crawlers: scalability
• Need to minimize overhead of DNS lookups
• Need to optimize utilization of network bandwidth and disk throughput (I/O is bottleneck)
• Use asynchronous sockets
  - Multi-processing or multi-threading do not scale up to billions of pages
  - Non-blocking: hundreds of network connections open simultaneously
  - Polling socket to monitor completion of network transfers
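The non-blocking idea can be sketched with Python's `asyncio`, which multiplexes many pending transfers in a single thread. This is only an illustration: `fetch` is a stand-in that sleeps instead of opening a socket, and the URLs and the connection cap of 100 are assumptions.

```python
import asyncio

async def fetch(url, delay=0.01):
    """Stand-in for a non-blocking download; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return url, 200

async def crawl_batch(urls, max_connections=100):
    """Fetch many URLs concurrently in one thread, capping open connections."""
    limit = asyncio.Semaphore(max_connections)

    async def bounded(url):
        async with limit:               # wait here if too many are in flight
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(u) for u in urls))

urls = [f"http://example.com/page{i}" for i in range(500)]
results = asyncio.run(crawl_batch(urls))
print(len(results))
```

While one transfer is waiting, the event loop services the others, so hundreds of connections can be open without one thread or process per connection.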
Universal crawlers: Policy
• Coverage
  - New pages get added all the time
  - Can the crawler find every page?
• Freshness
  - Pages change over time, get removed, etc.
  - How frequently can a crawler revisit?
• Trade-off!
  - Focus on most “important” pages (crawler bias)?
  - “Importance” is subjective
Web coverage by search engine crawlers
[Figure: bar chart of estimated Web coverage by search engine crawlers for 1997 to 2000; vertical axis from 0% to 100%; bar labels 50%, 35%, 34%, 16%]
This assumes we know the size of the entire Web. Do we? Can you define “the size of the Web”?
Maintaining a “fresh” collection
• Universal crawlers are never “done”
• High variance in rate and amount of page changes
• HTTP headers are notoriously unreliable
  - Last-modified
  - Expires
• Solution
  - Estimate the probability that a previously visited page has changed in the meanwhile
  - Prioritize by this probability estimate
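One way to turn this into numbers, offered as an assumption rather than as the method the slides have in mind, is to model page changes as a Poisson process: with an estimated rate of λ changes per day, the probability of at least one change after t days is 1 − e^(−λt). The function name, URLs and crawl history below are hypothetical.

```python
import math

def change_probability(changes_seen, days_observed, days_since_visit):
    """P(page changed since last visit) under an assumed Poisson change model.

    The change rate is estimated as observed changes per day; the
    probability of at least one change in t days is 1 - exp(-rate * t).
    """
    rate = changes_seen / days_observed
    return 1.0 - math.exp(-rate * days_since_visit)

# Hypothetical crawl history: (changes seen, days observed, days since last visit).
history = {
    "news.example/front": (30, 30, 1),
    "blog.example/post": (3, 30, 7),
    "docs.example/spec": (1, 365, 30),
}

# Revisit the pages most likely to have changed first.
revisit_order = sorted(
    history,
    key=lambda url: change_probability(*history[url]),
    reverse=True,
)
print(revisit_order)
```

The estimate depends only on the crawler's own observations, which sidesteps the unreliable Last-modified and Expires headers.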
Do we need to crawl the entire Web?
• If we cover too much, it will get stale
• There is an abundance of pages in the Web
• For PageRank, pages with very low prestige are largely useless
• What is the goal?
  - General search engines: pages with high prestige
  - News portals: pages that change often
  - Vertical portals: pages on some topic
• What are appropriate priority measures in these cases? Approximations?
Sec. 20.1.1
Complications
• Web crawling isn’t feasible with one machine
  - All of the above steps distributed
• Malicious pages
  - Spam pages
  - Spider traps – incl dynamically generated
• Even non-malicious pages pose challenges
  - Latency/bandwidth to remote servers vary
  - Webmasters’ stipulations
    - How “deep” should you crawl a site’s URL hierarchy?
  - Site mirrors and duplicate pages
• Politeness – don’t hit a server too often
your guide for the search engines
What is robots.txt?
It’s a file in the root of your website that can either allow
or restrict search engine robots from crawling pages
on your website.
How does it work?
Before a search engine robot crawls your website, it will first look
for your robots.txt file to find out where you want them to go.
There are 3 things you should keep in mind:
• Robots can ignore your robots.txt. Malware robots scanning the web for security vulnerabilities, or email address harvesters used by spammers, will not care about your instructions.
• The robots.txt file is public. Anyone can see what areas of your website you don’t want robots to see.
• Search engines can still index (but not crawl) a page you’ve disallowed, if it’s linked to from another website. In the search results it’ll then show only the URL, but usually no title or information snippet. To keep such a page out of the index, use the robots meta tag on that page instead.
What to put in your robots.txt file
• User-agent:
This is the line where you define which robot you’re talking to. It’s like saying hello to the robot:
User-agent: * (all robots; or name one, e.g. Googlebot for Google, Slurp for Yahoo)
• Disallow:
This tells the robots what you don’t want them to crawl on your site:
Disallow: / (do not crawl anything on my site)
Disallow: /images/ (do not crawl anything under /images/)
• Allow:
This tells the robots what you want them to crawl on your site:
Allow: /
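To see how a well-behaved robot reads these directives, the rules can be fed to Python's standard `urllib.robotparser`. The robots.txt content, bot names and URLs below are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and bot names are illustrative.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /images/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group: everything except /images/ is allowed.
print(parser.can_fetch("Googlebot", "http://example.com/index.html"))
print(parser.can_fetch("Googlebot", "http://example.com/images/logo.png"))
# Any other robot falls back to the "*" group, which disallows everything.
print(parser.can_fetch("SomeOtherBot", "http://example.com/index.html"))
```

A crawler that honours robots.txt would make a check like `can_fetch` before every request; nothing forces a robot to do so.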
What to put in your robots.txt file
• * (Asterisk / wildcard)
With the * symbol, you tell the robots to match any number of any characters. Very useful, for example, when you don’t want your internal search result pages to be indexed.
Disallow: *contact* (do not crawl any URLs containing the word contact)
• $ (Dollar sign / ends with)
The dollar sign tells the robots that it is the end of the URL.
Disallow: *.pdf$ (do not crawl any URLs ending in .pdf)
• # (Hash / comment)
You can add comments after the “#” symbol, either at the start of a line or after a directive.
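The wildcard and end-of-URL markers can be understood as a small pattern language. The sketch below translates a rule into a regular expression; `rule_to_regex` and `is_disallowed` are hypothetical helpers written for this illustration, not part of any library, and they ignore details such as Allow rules and percent-encoding.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule with * and $ into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of
    the URL; otherwise the rule is a prefix match.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))

def is_disallowed(path, rules):
    """True if any Disallow rule matches the URL path from its start."""
    return any(rule_to_regex(rule).match(path) for rule in rules)

rules = ["*contact*", "*.pdf$"]
print(is_disallowed("/about/contact-us", rules))
print(is_disallowed("/files/report.pdf", rules))
print(is_disallowed("/files/report.pdf?page=2", rules))
```

The third path contains ".pdf" but does not end with it, so the `$` rule leaves it crawlable.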
What to put in your robots.txt file
• Crawl-delay:
This directive asks the robot to wait a certain number of seconds after each page it crawls on your website.
Crawl-delay: 5
• Request-rate:
Here you tell the robot how many pages you want it to crawl within a certain number of seconds. The first number is pages, and the second number is seconds.
Request-rate: 1/5
# load 1 page per 5 seconds
• Visit-time:
It’s like opening hours, i.e. when you want the robots to visit your website. This can be useful if you don’t want the robots to visit your website during busy hours (when you have lots of human visitors).
Visit-time: 2100-0500
# only visit between 21:00 (9PM) and 05:00 (5AM) UTC (GMT)
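A polite crawler could read the first two of these directives with Python's standard `urllib.robotparser` (the `crawl_delay` and `request_rate` methods exist from Python 3.6). The robots.txt text and the bot name are invented for this sketch; Visit-time is left out because that module offers no accessor for it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Request-rate: 1/5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("MyBot")    # seconds to wait between requests
rate = parser.request_rate("MyBot")    # named tuple: (requests, seconds)

# Take the stricter of the two limits as the pause between page loads.
wait_seconds = max(delay, rate.seconds / rate.requests)
print(delay, rate.requests, rate.seconds, wait_seconds)
```

Both methods return `None` when the directive is absent, so production code would need to handle that case before doing arithmetic.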
Test your page
https://www.google.com/webmasters/
Search engine optimization
What is SEO?
• SEO = Search Engine Optimization
• Refers to the process of “optimizing” both the on-page and off-page ranking factors in order to achieve high search engine rankings for targeted search terms.
• Refers to the “industry” that has grown up around using keyword searching as a means of increasing relevant traffic to a website
What is a SEO Algorithm?
• Top Secret! Only select employees of a search engine company know for certain
• Reverse engineering, research and experiments give SEOs (search engine optimization professionals) a “pretty good” idea of the major factors and approximate weight assignments
• The SEO algorithm is constantly changed, tweaked & updated
• Websites and documents being searched are also constantly changing
• Varies by search engine – some give more weight to on-page factors, some to link popularity
http://seositecheckup.com/
A good SEO strategy:
• Research desirable keywords and search phrases (WordTracker, Overture, Google AdWords)
• Identify search phrases to target (should be relevant to business/market, obtainable and profitable)
• “Clean” and optimize a website’s HTML code for appropriate keyword density, title tag optimization, internal linking structure, headings and subheadings, etc.
• Help in writing copy to appeal to both search engines and actual website visitors
• Study competitors (competing websites) and search engines
• Implement a quality link building campaign
• Add quality content
• Constant monitoring of rankings for targeted search terms
Ranking factors
• On-Page Factors (Code & Content)
#1 - Content, Content, Content (Body text) <body>
#2 - Keyword frequency & density
#3 - Title tags <title>
#4 - ALT image tags
#5 - Header tags <h1>
#6 - Hyperlink text
• Off-Page Factors
#1 - Anchor text
#2 - Link Popularity (“votes” for your site) – adds credibility
What a Search Engine Sees
 View > Source (HTML code)
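What a crawler can read from the page source can be approximated with Python's built-in `html.parser`. The sketch below pulls out the title, `<h1>` headings, image ALT text and hyperlink text; the class name `OnPageFactors` and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser

class OnPageFactors(HTMLParser):
    """Collect the on-page elements a crawler can read from HTML source."""

    def __init__(self):
        super().__init__()
        self.found = {"title": [], "h1": [], "alt": [], "anchor": []}
        self._open = []  # stack of currently open tags we care about

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                self.found["alt"].append(alt)
        elif tag in ("title", "h1", "a"):
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            key = "anchor" if self._open[-1] == "a" else self._open[-1]
            self.found[key].append(text)

# Hypothetical page source, as seen via View > Source.
page_source = """<html><head><title>Discount Vacuum Cleaners</title></head>
<body><h1>Vacuum Cleaners</h1>
<img src="v.jpg" alt="upright vacuum cleaner">
<p>Read our <a href="/guide">vacuum buying guide</a>.</p></body></html>"""

extractor = OnPageFactors()
extractor.feed(page_source)
print(extractor.found)
```

Text that only exists inside an image never shows up here, which is why ALT text matters to a search engine.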
Pay Per Click
• PPC ads appear as “sponsored listings”
• Companies bid on price they are willing to pay “per click”
• Typically have very good tracking tools and statistics
• Ability to control ad text
• Can set budgets and spending limits
• Google AdWords and Overture are the two leaders
PPC vs. “Organic” SEO
Pay-Per-Click
• results in 1-2 days
• easier for a novice or someone with little knowledge of SEO
• ability to turn on and off at any moment
• generally more costly per visitor and per conversion
• fewer impressions and exposure
• easier to compete in highly competitive market space (but it will cost you)
• ability to generate exposure on related sites (AdSense)
• ability to target “local” markets
• better for short-term and high-margin campaigns
“Organic” SEO
• results take 2 weeks to 4 months
• requires ongoing learning and experience to achieve results
• very difficult to control flow of traffic
• generally more cost-effective, does not penalize for more traffic
• SERPs are more popular than sponsored ads
• very difficult to compete in highly competitive market space
• ability to generate exposure on related websites and directories
• more difficult to target local markets
• better for long-term and lower margin campaigns
Keys to Successful SEO Strategy
1. Do not underestimate the importance of keyword research
2. Be sure to include the proper tags in your page coding
3. You must have optimized content! (3-5 uses of keyword per 250 words)
4. Use content marketing
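The “3-5 uses of keyword per 250 words” guideline in point 3 is easy to check mechanically. The helper below is a hypothetical sketch that handles single-word keywords only and uses a made-up sample text; it scales the raw count to a 250-word window.

```python
import re

def keyword_uses_per_250_words(text, keyword):
    """Count keyword occurrences and scale the count to a 250-word window."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    uses = words.count(keyword.lower())
    return uses * 250 / len(words)

# Hypothetical copy: 4 uses of "vacuum" in exactly 250 words.
sample = "vacuum " * 4 + "filler " * 246
density = keyword_uses_per_250_words(sample, "vacuum")
within_guideline = 3 <= density <= 5
print(density, within_guideline)
```

For multi-word phrases the counting step would need to slide over word sequences instead of single tokens.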
Keyword Selection
• Competition: How much competition (large, authority sites) is there for the particular keyword?
• Search Frequency: How many people are searching on the particular keyword?
• Marketing/Brand Relevance: How closely does the keyword match your product/service offering, messaging, goals and objectives?
• Optimization Opportunity: Is there already a logical place on the site to optimize for the particular keyword?
Together these four factors determine the RECOMMENDED KEYWORDS