Yahoo! Search Engine

Search Engine


 A Web search engine is a search engine designed to search for information on the World Wide Web.


Information may consist of web pages, images and other types of files.

Examples: Google.com, Yahoo!, AltaVista.com, Excite.com


 Basically, a search engine is a software program that searches for sites based on the words that you designate as search terms.

 Search engines look through their own databases of information in order to find what it is that you are looking for.

Web Search Engine

 A Web search engine is a tool designed to search for information on the World Wide Web.

 The search results are usually presented in a list and are commonly called hits.

 The information may consist of web pages, images, and other types of files.


 Some search engines also mine data available in databases or open directories.

 Unlike Web directories, which are maintained by human editors, search engines operate algorithmically or use a mixture of algorithmic and human input.


 It is a program that searches documents for specified keywords and returns a list of the documents where the keywords were found.

 Although search engine is really a general class of programs, the term is often used to specifically describe systems like Google, AltaVista and Excite that enable users to search for documents on the World Wide Web and USENET newsgroups.


 Search engines and directories are not the same thing, although the term "search engine" is often used for both.

Search engines automatically create web site listings by using spiders that "crawl" web pages, index their information, and optionally follow that site's links to other pages.


 Spiders return to already-crawled sites on a pretty regular basis in order to check for updates or changes, and everything that these spiders find goes into the search engine database.

How it works?

 Typically, a search engine works by sending out a spider to fetch as many documents as possible.

 Another program, called an indexer, then reads these documents and creates an index based on the words contained in each document.

 Each search engine uses a proprietary algorithm to create its indices such that, ideally, only meaningful results are returned for each query.

 These algorithms involve highly detailed processes and methodologies, and are updated all the time.

This is a bare-bones look at how search engines work to retrieve your search results. All search engines follow this basic process, but because the engines differ, the results are bound to differ depending on which engine you use.


 The searcher types a query into a search engine.

 Search engine software quickly sorts through literally millions of pages in its database to find matches to this query.

 The search engine's results are ranked in order of relevancy.

Examples of Search Engines

 Google is always a safe bet for most search queries, and most of the time your search will be successful on the very first page of search results.

 Yahoo is also a great choice, and finds a lot of stuff that Google does not necessarily pick up.


 Some search engines are able to answer factual questions; among these are Answers.com, BrainBoost, Factbites, and Ask Jeeves.

 There are quite a few search engines that will help you do this with clustered results or search suggestions. Some of these include Clusty, WiseNut, AOL Search, and Teoma, in addition to Gigablast, AllTheWeb, and SurfWax.


 There are lots of great search engines that deal primarily in academic and research-oriented results. Included among these are Scirus, Yahoo Reference, National Geographic Map Machine, MagPortal, CompletePlanet, FirstGov, and EducationWorld.


 Images on the Web are easy to find, especially with targeted image search engines such as Picsearch and Ditto, and of course Google has some fantastic image search capabilities. You can also check out my list of Image Search Engines-Directories-Collections, or Clip Art-Buttons-Graphics-Icons-Images on the Web.


 There's so much multimedia on the Web that your main problem will be finding enough time to look at it all. Here are a few places you can use to search for sounds, movies, and music on the Web: Loomia, Torrent Typhoon, The Internet Movie Database, SingingFish, and Podscope.



 Finding someone with similar interests on the Web via a blog or online community is simple. Use LjSeek, Technorati, and Daypop to search for blogs; find people with ZoomInfo, Pretrieve, or Zabasearch; and search for discussion groups and message boards with BoardTracker.

Operation of Search Engine

 A search engine operates in the following order:

 Web crawling

 Indexing

 Searching

Web Crawling

 A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, worms, Web spiders, Web robots, or (especially in the FOAF, "Friend of a Friend", community) Web scutters.

 This process is called Web crawling or spidering . Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.

Crawlers can also be used for automating maintenance tasks on a Web site, such as checking links or validating HTML code. Also, crawlers can be used to gather specific types of information from Web pages, such as harvesting e-mail addresses (usually for spam).

A Web crawler is one type of bot, or software agent. In general, it starts with a list of URLs to visit, called the seeds. As the crawler visits these URLs, it identifies all the hyperlinks in the page and adds them to the list of URLs to visit, called the crawl frontier. URLs from the frontier are recursively visited according to a set of policies.
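The seed list and crawl frontier described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (TOY_WEB) instead of real HTTP fetches, the visiting order is plain breadth-first, and none of the selection, re-visit, or politeness policies are modeled.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": [],
}

def crawl(seeds, fetch_links, max_pages=100):
    """Visit pages breadth-first, starting from the seed URLs."""
    frontier = deque(seeds)   # the crawl frontier: URLs waiting to be visited
    seen = set(seeds)         # URLs already queued, so nothing is fetched twice
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in fetch_links(url):   # hyperlinks identified on the page
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

order = crawl(["http://a.example/"], lambda url: TOY_WEB.get(url, []))
print(order)
```

A real crawler would replace the lambda with a function that downloads the page and extracts its links, and would order the frontier according to its policies instead of first-in, first-out.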

Internet bots, also known as web robots, WWW robots or simply bots, are software applications that run automated tasks over the Internet.

There are important characteristics of the Web that make crawling it very difficult:

 its large volume,

 its fast rate of change, and

 dynamic page generation.

The behavior of a Web crawler is the outcome of a combination of policies:

 a selection policy that states which pages to download,

 a re-visit policy that states when to check for changes to the pages,

 a politeness policy that states how to avoid overloading Web sites, and

 a parallelization policy that states how to coordinate distributed Web crawlers.

Indexing

 Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. Index design incorporates interdisciplinary concepts from linguistics, cognitive psychology, mathematics, informatics, physics and computer science. An alternate name for the process in the context of search engines designed to find web pages on the Internet is Web indexing.

 Popular engines focus on the full-text indexing of online, natural language documents.

 Media types such as video and audio and graphics are also searchable.

 Meta search engines reuse the indices of other services and do not store a local index, whereas cache-based search engines permanently store the index along with the corpus.

Search

 A web search query is a query that a user enters into a web search engine to satisfy his or her information needs. Web search queries are distinctive in that they are unstructured and often ambiguous; they vary greatly from standard query languages, which are governed by strict syntax rules.

GOOGLE

 "Googol" is the mathematical term for a 1 followed by 100 zeros. The term was coined by Milton Sirotta, nephew of American mathematician Edward Kasner, and was popularized in the book "Mathematics and the Imagination" by Kasner and James Newman.

PURPOSE

 Google's purpose is to organize the world's information and make it universally accessible and useful.

HISTORY

Type: Public

Founded: Menlo Park, California (September 7, 1998)

Headquarters: Mountain View, California, USA

Key people:
Eric E. Schmidt, CEO/Director
Sergey Brin, Co-Founder, Technology President
Larry Page, Co-Founder, Products President
George Reyes, CFO

Industry: Internet, Computer software

Revenue: US$16.593 billion ▲56% (2007)

Net income: US$4.203 billion ▲25% (2007)

Employees: 16,805 (December 31, 2007)

Slogan: Don't be evil

Website: http://www.google.com/

 Founded by Larry Page and Sergey Brin

 September 7, 1998

 Begun as a research project

 They hypothesized that a search engine that analyzed the relationships between websites would produce better ranking of results than existing techniques, which ranked results according to the number of times the search term appeared on a page

 Nicknamed – ‘Backrub’

 Originally, the search engine used the Stanford University website with the domain google.stanford.edu.

 The domain google.com was registered on September 15, 1997, and the company was incorporated as Google Inc. on September 7, 1998, at a friend's garage in Menlo Park, California.

 Originally the name was to be Googol.com, but that name had already been registered by another site.

Introduction

Google's founders Larry Page and Sergey Brin developed a new approach to online search that took root in a Stanford University dorm room and quickly spread to information seekers around the globe. Named for the mathematical term "googol," Google operates websites at many international domains, with the most trafficked being Google.com. Google is widely recognized as the world's best search engine because it is fast, accurate and easy to use. The company also serves corporate clients, including advertisers, content publishers and site managers, with cost-effective advertising and a wide range of revenue-generating search services.

Google History

 According to Google lore, company founders Larry Page and Sergey Brin were not terribly fond of each other when they first met as Stanford University graduate students in computer science in 1995. Larry was a 24-year-old University of Michigan alumnus on a weekend visit; Sergey, 23, was among a group of students assigned to show him around. By January of 1996, Larry and Sergey had begun collaboration on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website. A year later, their unique approach to link analysis was earning BackRub a growing reputation among those who had seen it. In September 1998, Google Inc. opened its doors in Menlo Park, California. Already Google.com, still in beta, was answering 10,000 search queries each day. The press began to take notice of the upstart website with the relevant search results, and articles extolling Google appeared in USA TODAY and Le Monde. That December, PC Magazine named Google one of its Top 100 Web Sites and Search Engines for 1998. Google was moving up in the world.

Objectives and Goals

 To push more development and understanding into the academic realm.

 To build a system that reasonable numbers of people can actually use. Usage is important to Google because its designers think some of the most interesting research will involve leveraging the vast amount of usage data that is available from modern web systems.

 To build an architecture that can support novel research activities on large-scale web data.

 To set up an environment where other researchers can come in quickly, process large chunks of the web, and produce interesting results that have been very difficult to produce otherwise.

 To set up a Spacelab-like environment where researchers or even students can propose and do interesting experiments on Google's large-scale web data.

Features Overview

 The Google Toolbar enables you to conduct a Google search from anywhere on the web.

 Advertisers use the Google AdWords program to promote their products and services on the web with targeted advertising, and Google believes AdWords is the largest program of its kind.

 Web site managers use the Google AdSense program to deliver ads relevant to the content on their sites, improving their ability to generate revenue and enhancing the experience for their users.

Technology Overview

 PageRank Technology: PageRank reflects Google's view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that Google believes are important receive a higher PageRank and are more likely to appear at the top of the search results.

PageRank treats a link from one page to another as a vote for the linked page. It also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.
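The idea that a vote counts for more when it comes from an important page can be illustrated with a small power-iteration sketch. It is not Google's production algorithm: the four-page link graph (toy_links), the damping factor of 0.85, the fixed number of iterations, and the handling of pages without outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by repeated redistribution of rank along links.

    `links` maps every page to the list of pages it links to; every link
    target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}        # start with equal rank
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Each outgoing link passes on an equal share of this page's rank.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page C collects links from three pages and ends up ranked highest, while D, which nothing links to, ends up lowest.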

 Hypertext-Matching Analysis: Google's search engine also analyzes page content. However, instead of simply scanning for page-based text (which can be manipulated by site publishers through meta-tags), Google's technology analyzes the full content of a page and factors in fonts, subdivisions and the precise location of each word. Google also analyzes the content of neighboring web pages to ensure the results returned are the most relevant to a user's query.

Anchor Text

 The text of links is treated in a special way in the Google search engine. Most search engines associate the text of a link with the page that the link is on. Google, in addition, associates it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves. Second, anchors may exist for documents which cannot be indexed by a text-based search engine, such as images, programs, and databases. This makes it possible to return web pages which have not actually been crawled.
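A minimal sketch of the anchor-text idea follows: the words of each link are indexed under the page the link points to, so a target such as an image can be found even though its own content was never parsed. The link records and URLs are hypothetical, and real anchor handling also involves weighting and ranking, which are omitted here.

```python
from collections import defaultdict

# Hypothetical link records: (page the link is on, link target, anchor text).
links = [
    ("news.example/story", "photos.example/cat.jpg", "picture of a tabby cat"),
    ("blog.example/post", "photos.example/cat.jpg", "cute cat photo"),
    ("blog.example/post", "tools.example/app.exe", "download the editor"),
]

def build_anchor_index(link_records):
    """Map each anchor word to the set of link targets it describes."""
    index = defaultdict(set)
    for _source, target, anchor in link_records:
        for word in anchor.lower().split():
            index[word].add(target)   # credit the word to the target page
    return index

anchor_index = build_anchor_index(links)
cat_hits = sorted(anchor_index["cat"])
print(cat_hits)
```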

System Anatomy

 BigFiles

BigFiles are virtual files spanning multiple file systems and are addressable by 64 bit integers. The allocation among multiple file systems is handled automatically. The BigFiles package also handles allocation and deallocation of file descriptors, since the operating systems do not provide enough for Google's needs. BigFiles also support rudimentary compression options.

 Repository

The repository contains the full HTML of every web page. Each page is compressed using zlib (see RFC1950). The choice of compression technique is a tradeoff between speed and compression ratio. Google's designers chose zlib’s speed over a significant improvement in compression offered by bzip. The compression rate of bzip was approximately 4 to 1 on the repository as compared to zlib’s 3 to 1 compression. The repository requires no other data structures to be used in order to access it. This helps with data consistency and makes development much easier; they can rebuild all the other data structures from only the repository and a file which lists crawler errors.
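The speed versus ratio tradeoff can be tried out with Python's standard zlib and bz2 modules. The sample page below is synthetic and highly repetitive, so the ratios it prints say nothing about the 3 to 1 and 4 to 1 figures quoted above; the sketch only shows how a page would be compressed for storage and recovered unchanged.

```python
import bz2
import zlib

# Synthetic stand-in for a crawled HTML page (an assumption for the example).
page = (
    "<html><body>"
    + "<p>search engines index the web</p>" * 200
    + "</body></html>"
).encode("utf-8")

zlib_blob = zlib.compress(page)   # zlib format, as described in RFC 1950
bz2_blob = bz2.compress(page)     # bzip2, slower but often more compact

# Both formats are lossless: the stored page can be restored byte for byte.
restored = zlib.decompress(zlib_blob)

print("original bytes:", len(page))
print("zlib bytes:", len(zlib_blob))
print("bz2 bytes:", len(bz2_blob))
```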

 Document Index

The document index keeps information about each document. It is a fixed width ISAM (Indexed Sequential Access Method) index, ordered by docID. The information stored in each entry includes the current document status, a pointer into the repository, a document checksum, and various statistics. If the document has been crawled, it also contains a pointer into a variable width file called docinfo which contains its URL and title. Otherwise the pointer points into the URLlist which contains just the URL. This design decision was driven by the desire to have a reasonably compact data structure, and the ability to fetch a record in one disk seek during a search.

 Lexicon

The lexicon has several different forms. One important change from earlier systems is that the lexicon can fit in memory for a reasonable price. In the current implementation Google can keep the lexicon in memory on a machine with 256 MB of main memory. The current lexicon contains 14 million words (though some rare words were not added to the lexicon). It is implemented in two parts: a list of the words (concatenated together but separated by nulls) and a hash table of pointers.
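The two-part layout of the lexicon can be sketched as follows. The interpretation of "pointer" as a byte offset into the word list is an assumption, as are the four sample words; a Python dict stands in for the hash table.

```python
words = ["google", "index", "search", "web"]   # hypothetical sample vocabulary

# Part 1: all words concatenated together, each terminated by a null byte.
word_list = b"".join(word.encode("ascii") + b"\0" for word in words)

# Part 2: a hash table mapping each word to its offset ("pointer") in word_list.
pointers = {}
offset = 0
for word in words:
    pointers[word] = offset
    offset += len(word) + 1   # +1 for the null separator

def word_at(pointer):
    """Read the null-terminated word that starts at the given offset."""
    end = word_list.index(b"\0", pointer)
    return word_list[pointer:end].decode("ascii")

print(pointers["search"], word_at(pointers["search"]))
```

Storing the words in one contiguous block keeps the per-word memory overhead small, which matters when the goal is to hold millions of words in main memory.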

 Hit List

A hit list corresponds to a list of occurrences of a particular word in a particular document including position, font, and capitalization information. Hit lists account for most of the space used in both the forward and the inverted indices. Because of this, it is important to represent them as efficiently as possible. Google's designers chose a hand optimized compact encoding since it required far less space than the simple encoding and far less bit manipulation than Huffman coding.
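A compact hit encoding can be illustrated by packing the three fields named above into a single 16-bit integer. The field widths used here (1 bit for capitalization, 3 bits for font size, 12 bits for position) are assumptions for the sketch, and the function names are hypothetical.

```python
def pack_hit(capitalized, font_size, position):
    """Pack one hit into 16 bits: [1 bit caps][3 bits font][12 bits position]."""
    if not 0 <= font_size < 8:
        raise ValueError("font_size must fit in 3 bits")
    if not 0 <= position < 4096:
        raise ValueError("position must fit in 12 bits")
    return (int(capitalized) << 15) | (font_size << 12) | position

def unpack_hit(hit):
    """Recover (capitalized, font_size, position) from a packed hit."""
    return bool(hit >> 15), (hit >> 12) & 0b111, hit & 0xFFF

packed = pack_hit(True, 3, 42)
print(packed, unpack_hit(packed))
```

Two bytes per occurrence is far smaller than storing three separate integers, and decoding needs only a few shifts and masks.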

 Forward Index

The forward index is actually already partially sorted. It is stored in a number of barrels (Google used 64). Each barrel holds a range of wordID’s. If a document contains words that fall into a particular barrel, the docID is recorded into the barrel, followed by a list of wordID’s with hitlists which correspond to those words. This scheme requires slightly more storage because of duplicated docIDs but the difference is very small for a reasonable number of buckets and saves considerable time and coding complexity in the final indexing phase done by the sorter.

 Inverted Index

The inverted index consists of the same barrels as the forward index, except that they have been processed by the sorter. For every valid wordID, the lexicon contains a pointer into the barrel that wordID falls into. It points to a doclist of docID’s together with their corresponding hit lists. This doclist represents all the occurrences of that word in all documents.
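The relationship between the forward and inverted indices can be sketched with plain dictionaries. Barrels, wordID ranges, and on-disk layout are left out; the docIDs, wordIDs, and hit positions are made up for the example.

```python
from collections import defaultdict

# Hypothetical forward index: docID -> list of (wordID, hit list of positions).
forward_index = {
    1: [(10, [0, 5]), (20, [1])],
    2: [(20, [3]), (30, [0])],
    3: [(10, [2])],
}

def invert(forward):
    """Regroup the forward index by wordID to obtain doclists."""
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for word_id, hits in forward[doc_id]:
            # Each doclist entry pairs a docID with that word's hit list.
            inverted[word_id].append((doc_id, hits))
    return dict(inverted)

inverted_index = invert(forward_index)
print(inverted_index[10])
```

Looking up a wordID in the inverted index now yields every document that contains the word, which is the access pattern a query needs.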

 Crawling The Web

In order to scale to hundreds of millions of web pages, Google has a fast distributed crawling system. A single URLserver serves lists of URLs to a number of crawlers (typically about 3). Both the URLserver and the crawlers are implemented in Python.

Each crawler keeps roughly 300 connections open at once. This is necessary to retrieve web pages at a fast enough pace. At peak speeds, the system can crawl over 100 web pages per second using four crawlers.

Each crawler maintains its own DNS cache so it does not need to do a DNS lookup before crawling each document. Each of the hundreds of connections can be in a number of different states: looking up DNS, connecting to host, sending request, and receiving response. These factors make the crawler a complex component of the system. It uses asynchronous IO to manage events, and a number of queues to move page fetches from state to state.

 Indexing The Web

- Parsing

Any parser which is designed to run on the entire Web must handle a huge array of possible errors. These range from typos in HTML tags to kilobytes of zeros in the middle of a tag, non-ASCII characters, HTML tags nested hundreds deep, and a great variety of other errors that challenge anyone’s imagination to come up with equally creative ones.

- Indexing Documents into Barrels

After each document is parsed, it is encoded into a number of barrels. Every word is converted into a wordID by using an in-memory hash table -- the lexicon. New additions to the lexicon hash table are logged to a file. Once the words are converted into wordID’s, their occurrences in the current document are translated into hit lists and are written into the forward barrels.

- Sorting

In order to generate the inverted index, the sorter takes each of the forward barrels and sorts it by wordID to produce an inverted barrel for title and anchor hits and a full text inverted barrel. This process happens one barrel at a time, thus requiring little temporary storage.

• Searching

The goal of searching is to provide quality search results efficiently. Many of the large commercial search engines seemed to have made great progress in terms of efficiency. Therefore, Google's designers focused more on quality of search in their research, although they believe their solutions are scalable to commercial volumes with a bit more effort. The Google query evaluation process is shown in Figure 4.

 To put a limit on response time, once a certain number (currently 40,000) of matching documents are found, the searcher automatically goes to step 8 in Figure 4. This means that it is possible that sub-optimal results would be returned. They are currently investigating other ways to solve this problem. In the past, they sorted the hits according to PageRank, which seemed to improve the situation.

• The Ranking System

 Google designed its ranking function so that no particular factor can have too much influence. Consider, for example, a single-word query. In order to rank a document with a single-word query, Google looks at that document’s hit list for that word. Google considers each hit to be one of several different types (title, anchor, URL, plain text large font, plain text small font, ...), each of which has its own type-weight.

 The type-weights make up a vector indexed by type. Google counts the number of hits of each type in the hit list. Then every count is converted into a count-weight. Count-weights increase linearly with counts at first but quickly taper off so that more than a certain count will not help. The dot product of the vector of count-weights with the vector of type-weights gives an IR score for the document. Finally, the IR score is combined with PageRank to give a final rank to the document.
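The single-word ranking steps above can be sketched as a small dot product. Everything numeric here is an assumption: the type-weight values, the cap of 8 hits, the use of a simple min() as the tapering function, and the linear way the IR score is combined with PageRank (the text does not say how the two are combined).

```python
from collections import Counter

# Hypothetical type-weights, one per hit type.
TYPE_WEIGHTS = {
    "title": 10.0,
    "anchor": 8.0,
    "url": 6.0,
    "large_font": 3.0,
    "small_font": 1.0,
}
COUNT_CAP = 8   # assumed count beyond which more hits no longer help

def count_weight(count, cap=COUNT_CAP):
    """Grow linearly with the count, then stay flat once the cap is reached."""
    return float(min(count, cap))

def ir_score(hit_types):
    """Dot product of the count-weight vector with the type-weight vector."""
    counts = Counter(hit_types)
    return sum(TYPE_WEIGHTS[t] * count_weight(counts[t]) for t in TYPE_WEIGHTS)

def final_rank(hit_types, pagerank, pagerank_weight=100.0):
    """Assumed combination: IR score plus a scaled PageRank."""
    return ir_score(hit_types) + pagerank_weight * pagerank

doc_hits = ["title", "small_font", "small_font"]
print(ir_score(doc_hits), final_rank(doc_hits, pagerank=0.02))
```

Because of the cap, a page that repeats a word fifty times in small text scores no better on that hit type than one that uses it eight times, which is the "no particular factor can have too much influence" property described above.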

• Result and Performance

The results are clustered by server. This helps considerably when sifting through result sets.

Google relied on anchor text to determine this was a good answer to the query.

All of the results are reasonably high quality pages and, at last check, none were broken links. This is largely because they all have high PageRank.

• Google strengths

 The interface is clear and simple.

 Pages load instantly.

 Placement in search results is never sold to anyone.

- No other search engine accesses more of the Internet or delivers more useful information than Google. Google Search is fast with most results coming back to the user in less than one second

• Google Weaknesses

 Some people love the results they get at Google; others are often disappointed. To a large extent, both the pluses and the minuses derive from Google's ranking system, which (as the folks at Google explain at http://www.google.com/technology/) depends largely on the number of links to a particular page, the relevance of the content on those linking pages to the content on the target page, and the quality of the pages doing the linking.

 Suppose you want to know which Web pages outside of your own site link to your pages. At Google, you can do a search for link:samizdat.com, or get the same results by going to the "Advanced" search and using the "page specific search" to find pages that link to a particular page. But the results also include information that you do not want or need.

• Google Security and Product Safety

 As a provider of software, services and monetization for users, advertisers and publishers on the Internet, Google feels a responsibility to protect your privacy and security. They recognize that secure products are instrumental in maintaining the trust you place in them and strive to create innovative products that both serve your needs and operate in your best interest.

Google takes security issues very seriously and will respond swiftly to fix verifiable security issues. Some of their products are complex and take time to update. When properly notified of legitimate issues, they will do their best to acknowledge your emailed report, assign resources to investigate the issue, and fix potential problems as quickly as possible.

Conclusion

 Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. Google's technology uses the collective intelligence of the web to determine a page's importance. There is no human involvement or manipulation of results, which is why users have come to trust Google as a source of objective information untainted by paid placement.

FEATURES

 Search - 4 elements - speed, accuracy, objectivity and ease of use

 Google examines billions of web pages to find the most relevant pages for any query and typically returns those results in less than half a second.

 Google Gadgets - Sidebar plugins

 Toolbar - to use Google search without visiting the Google homepage.

 PageRank - reflects Google's view of the importance of web pages.

 Hypertext Matching Analysis - analyzes page content to ensure the results returned are the most relevant to a user's query.

 Google also pioneered the first wireless search technology for on-the-fly translation of HTML to formats optimized for WAP, i-mode, J-SKY, and EZWeb.

GOOGLE SERVICES

 Google Desktop - made it easier for people to find and share information on their own computers

 Google Chat - connected people through Gmail and Talk, the first service to integrate email and instant messaging within a web browser

 Google Page Creator - made it even easier for anybody to design and create web pages quickly and simply

 Google Earth - This technology enables users to fly through space, zooming into specific locations they choose, and seeing the real world in sharp focus

GOOGLE’S FUNCTION

 The life span of a Google query normally lasts less than half a second, yet involves a number of different steps that must be completed before results can be delivered to a person seeking information.

1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.

2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.

3. The search results are returned to the user in a fraction of a second.
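The three steps of a query's life span can be mimicked with two dictionaries standing in for the index servers and the doc servers. The data, the function names, the all-words matching rule, and the "first 40 characters" snippet are all assumptions made for the sketch.

```python
# Stand-in for the index servers: word -> docIDs of pages containing it.
INDEX = {"search": [1, 2], "engine": [1, 3], "crawler": [2]}

# Stand-in for the doc servers: docID -> stored document text.
DOCS = {
    1: "A search engine indexes the web.",
    2: "A crawler feeds the search index.",
    3: "An engine can also power a car.",
}

def index_lookup(query):
    """Step 1: find the docIDs whose pages contain every query word."""
    postings = [set(INDEX.get(word, ())) for word in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

def make_snippet(text, length=40):
    """Step 2 helper: a naive snippet, just the start of the document."""
    return text[:length]

def handle_query(query):
    """Steps 1 to 3: look up matches, fetch documents, return results."""
    doc_ids = index_lookup(query)
    return [(doc_id, make_snippet(DOCS[doc_id])) for doc_id in doc_ids]

results = handle_query("search engine")
print(results)
```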

ADVANTAGES

 Google is the biggest search engine database in the world.

PageRank™ often finds useful pages. It is one of the defaults that cannot be turned off in Google and is not for sale. It works on a unique combination of factors, some of which are:

 Popularity - based on the number of links to a page and the importance of the pages that link

 Importance - traffic, quality of links

 Word proximity and occurrence in results

 Google has many useful ways to limit searches

 Google offers special "fuzzy" searches that are useful to search synonyms, find definitions, find similar/related pages, and more

 The shortcuts & special Google databases can enhance certain types of research

 Google Books and Google Scholar have great potential for university-level research using the web.

SHORTCOMINGS

 Lots of "stop words" which you have to precede with a + to search or search in quotes.

 Full Boolean logic is not supported; only OR, - for "not," and AND (implied as default) are available.

 Despite its default AND, Google sometimes returns pages that do not contain all of your terms. Google shows you these results because they are "important" (rank high) in Google. The only way to know whether your terms are in a page or why the page was provided is to look at Google's cached copy.

Google Search Tips:

 You can search for a phrase by using quotations ["like this"] or with a minus sign between words [like-this].

 You can search by a date range by using two dots between the years [2004..2007].

 When searching with a question mark [?] at the end of your phrase, you will see sponsored Google Answer links, as well as definitions if available.

 Google searches are not case sensitive.

 By default Google will return results which include all of your search terms.

 Google automatically searches for variations of your term, with variants of the term shown in yellow highlight.

 Google lets you enter up to 32 words per search query.

Yahoo

Introduction

 Headquartered in Sunnyvale, California.

 One of the leading Internet service providers in Internet business around the world.

 Currently about 500 million users globally visit the site each month.

 More than 20 branches around the world, in more than 20 different languages.

 40 types of popular awards received since it was incorporated.

History

 Founders David Filo and Jerry Yang were PhD candidates in Electrical Engineering at Stanford University.

 A hobby of tracking personal interests turned into a business when 100,000 users accessed it.

 Incorporated in 1995.

 Became a publicly owned company in 1996.

 First went public on NASDAQ in April 1996.

 February 2000: Yahoo! announces fourth stock split.

 April 2003: New Yahoo! Search introduced.

 Today it has become a competitive Internet business.

Development

 Organized into categories and subcategories.

 Categories range from Arts and Humanities to Society and Culture (and everything in between).

 Focuses on classifying the data.

 Inspires people to make a positive impact on their communities.

 Invests in human effort.

 Human editors provide the brain power and intuition needed to classify the web's many offerings.

Development cont..

 Through this human effort, Yahoo has become the de facto Dewey Decimal System for categorizing web sites.

 It probably adds more sites to the guide than it has in the past.

 Yahoo automatically sends queries to its partner AltaVista, should it fail to find a match within its own listings.

 As Yahoo is not a search engine, it cannot offer the same instant indexing service, but it competes with search engines.

Objective

 To maintain market share!

Goal

 “The users are finding what they need," said Srinija Srinivasan, Yahoo's Ontological Yahoo, or Director of Surfing.

 "Our primary goal is to satisfy the users, not the listers," Srinivasan said.

Strengths

1. Buying Binge: excellence in using user data, over 133 million (2004)

2. Deep Relations: knows searchers better and personalizes by their profile

3. Advertising: advertisers prefer splashy, animated ads; relationships bolstered with major auto makers and entertainment giants

4. Commercial skew: 65% of result pages are commerce related vs. 27% for Google

Weaknesses

 No clear attempt to either target those users with advertising or extend any real value-added services.

 They have made acquisitions such as MyBlogLog and Flickr, but to what end?

 Publishers do not have a chance to leverage Yahoo’s size and revenue potential the way that Google has empowered its users through the contextual ad system.

 The February 2004 launch, reiterating its support for the business model of “paid inclusion,” stirred up controversy.

Interaction

 Yahoo automatically sends queries to its partner AltaVista, should it fail to find a match within its own listings.

 What This Privacy Policy Covers

 How Yahoo! treats personal information that Yahoo! collects and receives

 Yahoo! participates in the Safe Harbor program developed by the U.S. Department of Commerce and the European Union

 In general

 Yahoo! uses information to customize the advertising and content you see, fulfill your requests for products and services, improve its services, contact you, conduct research, and provide anonymous reporting for internal and external clients.

 Children

 When a child under age 13 attempts to register with Yahoo!, a parent is asked to create a Yahoo! Family Account to obtain parental permission.

 Yahoo! does not ask a child under age 13 for more personal information, as a condition of participation, than is reasonably necessary to participate in a given activity or promotion.

 Information Sharing and Disclosure

 Yahoo! does not rent, sell, or share personal information with other people or non-affiliated companies except to provide products or services you've requested, when we have your permission, or under the following circumstances:

 The About system is covered by one or more of the following patents:

 U.S. Patent No. 5,918,010

 U.S. Patent No. 6,081,788

 U.S. Patent No. 6,157,926

 U.S. Patent No. 6,195,681

 U.S. Patent No. 6,226,648

 U.S. Patent No. 6,336,132

 Australian Patent No. 729,891

 Other Patents Pending.

Yahoo Search Tips:

• By default, Yahoo returns results that include all of your search terms.

• To exclude words, use a minus sign: [cat -tabby] would show all results about cats with no mention of tabby.

• Yahoo search results also show related searches, which are based on other searches by users with similar terms.

• To search for a map, use map [location].

• To search for dictionary definitions, use "define": [define harddrive].

• To search a single domain, use site: [site:webopedia.com DVD] would search Webopedia for the term DVD.
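The tips above can be sketched as a small helper that assembles a query string. The function name build_query and its parameters are hypothetical and introduced only for illustration; the operators themselves (a leading minus sign, site: and define) are the ones listed in the tips.

```python
def build_query(terms, exclude=(), site=None, define=False):
    """Assemble a search string from the operators described in the tips.

    Hypothetical helper for illustration; it is not part of any Yahoo API.
    """
    parts = []
    if define:
        parts.append("define")                    # dictionary lookup
    if site:
        parts.append(f"site:{site}")              # restrict to one domain
    parts.extend(terms)                           # all terms are required by default
    parts.extend(f"-{word}" for word in exclude)  # minus sign excludes a word
    return " ".join(parts)


print(build_query(["cat"], exclude=["tabby"]))     # cat -tabby
print(build_query(["DVD"], site="webopedia.com"))  # site:webopedia.com DVD
print(build_query(["harddrive"], define=True))     # define harddrive
```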

Bing

 Bing is a new search engine from Microsoft that was launched on May 28, 2009.

 Microsoft calls it a "Decision Engine," because it's designed to return search results in a format that organizes answers to address your needs.

 When you search on Bing, in addition to providing relevant search results, the search engine also shows a list of related searches on the left-hand side of the search engine results page ( SERP ).

 You can also access a quick link to see recent search history. Bing uses technology from a company called Powerset, which Microsoft acquired

BING







Bing launched with several features that are unique in the search market.

For example, when you mouse-over a Bing result a small pop-up provides additional information for that result, including a contact e-mail address if available.

The main search box features suggestions as you type, and Bing's travel search is touted as being the best on the net. Bing is expected to replace Microsoft Live Search.



 Bing Search Tips:

You can search for feeds using feeds: before the query

 To search Bing without a background image use http://www.bing.com/?rb=0

 To turn the background image back on, use http://www.bing.com/?rb=1

 To change the number of search results returned per page, click "Extras" (on top-right of page) and select "Preferences". Under Web Settings / Results you can choose 10, 15, 30 or 50 results

Semantic Web

 The Semantic Web is a web of data. There is lots of data we all use every day, and it is not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?

 Why not? Because we don't have a web of data.

Because data is controlled by applications, and each application keeps it to itself.

Semantic Web

 The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, whereas the original Web mainly concentrated on the interchange of documents. It is also about a language for recording how the data relates to real-world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.

Semantic Web

 The word semantic stands for the meaning of.

 The semantic of something is the meaning of something.

 The Semantic Web = a Web with a meaning.

Semantic Web

 The Semantic Web is a web that is able to describe things in a way that computers can understand.

 The Beatles was a popular band from Liverpool.

 John Lennon was a member of the Beatles.

 "Hey Jude" was recorded by the Beatles.

 Sentences like the ones above can be understood by people. But how can they be understood by computers?

Semantic Web

 Statements are built with syntax rules. The syntax of a language defines the rules for building the language statements. But how can syntax become semantic?

 This is what the Semantic Web is all about: describing things in a way that computer applications can understand.

 The Semantic Web is not about links between web pages.

Semantic Web

 The Semantic Web describes the relationships between things (like A is a part of B and Y is a member of Z) and the properties of things (like size, weight, age, and price).
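One way to make such relationships readable by a program is to store each statement as a (subject, predicate, object) triple. The sketch below encodes the Beatles sentences from the earlier slide this way. The predicate names (is_a, from, member_of, recorded_by) and the query function are assumptions made up for this illustration; real Semantic Web data would use a standard such as RDF with URIs instead of plain strings.

```python
# Each statement is a (subject, predicate, object) triple.
# Predicate names are invented for this example.
triples = [
    ("The Beatles", "is_a", "band"),
    ("The Beatles", "from", "Liverpool"),
    ("John Lennon", "member_of", "The Beatles"),
    ("Hey Jude", "recorded_by", "The Beatles"),
]


def query(subject=None, predicate=None, obj=None):
    """Return every triple that matches the fields that are not None."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# "Who was a member of the Beatles?"
members = [s for (s, _, _) in query(predicate="member_of", obj="The Beatles")]
print(members)  # ['John Lennon']
```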

Hypermedia

•The idea behind hypermedia has been around since the 1940s, although the term itself was coined later.

•It refers to information linked together in an easily accessible way.

•The Internet thrives on hypermedia and allows videos to be linked to graphic buttons or text and other content found to be accessible simply with a mouse click.

•Hypermedia is more a method for accessing available information, which is the end result

Hypermedia

 An example of hypermedia is hypertext links. When an Internet user enters a search term in Google or Yahoo and clicks the search button to find results, the information is presented as hypertext links with a bit of text describing the link. This helps the web surfer decide if these links are relevant to them and if they are worth viewing. If the first link is something that would be useful based on the blurb provided, clicking on the hypermedia (in this case a hypertext link) will take the web surfer to relevant information regarding their search.

Hypermedia

 A blurb is a brief piece of writing used in the advertising of a creative work. The classic example of a blurb is the quote smeared across the cover of a bestselling novel which reads something like “absolutely thrilling.” Blurbs are designed to drum up interest in the creative work, hopefully thereby increasing sales, and the hunt for blurbs is a perennial quest for many artists, especially for people who are just starting out in their field.

Hypermedia

 Hypermedia is used as a logical extension of the term hypertext in which graphics, audio, video, plain text and hyperlinks intertwine to create a generally non-linear medium of information. This contrasts with the broader term multimedia, which may be used to describe non-interactive linear presentations as well as hypermedia. It is also related to the field of electronic literature. The term was first used in a 1965 article by Ted Nelson.

Hypermedia

 The World Wide Web is a classic example of hypermedia, whereas a non-interactive cinema presentation is an example of standard multimedia due to the absence of hyperlinks.

 The first hypermedia work was, arguably, the Aspen Movie Map. Atkinson's HyperCard popularized hypermedia writing, while a variety of literary and nonfiction hypertext works demonstrated the promise of links.

Hypermedia

 Most modern hypermedia is delivered via electronic pages from a variety of systems including Media players, web browsers, and stand-alone applications. Audio hypermedia is emerging with voice command devices and voice browsing.

Hypermedia

 Hypermedia may be developed in a number of ways. Any programming tool can be used to write programs that link data from internal variables and nodes for external data files. Multimedia development software such as Adobe Flash, Adobe Director, Macromedia Authorware, and MatchWare Mediator may be used to create stand-alone hypermedia applications, with emphasis on entertainment content. Some database software such as Visual FoxPro and FileMaker Developer may be used to develop stand-alone hypermedia applications, with emphasis on educational and business content management.

Process of writing and reading using non-linear hypermedia.

Process of writing and reading using traditional linear media

Hypermedia and Human Memory

 Human memory is associative. We associate pieces of information with other information and create complex knowledge structures. We often remember information via association. That is, a person starts with an idea which reminds them of a related idea or a concept, which in turn triggers another idea. The order in which a human associates an idea with another idea depends on the context in which the person wants information. That is, a person can start with a common idea and end up associating it with completely different sequences of ideas on different occasions.

Hypermedia and Human Memory

 When writing, an author converts his knowledge which exists as a complex knowledge structure into an external representation. Physical media such as printed material and video tapes only allow us to represent information in an essentially linear manner. Thus the author has to go through a linearisation process to convert his knowledge to a linear representation. This is not natural. So the author will provide additional information, such as a table of contents and an index, to help the reader understand the overall information organisation.


Hypermedia and Human Memory

 Hypermedia, using computer supported links, allows us to partially mimic writing and reading processes as they take place inside our brain.

We can create non-linear information structures by associating chunks of information in different ways using links. Further, we can use a combination of media consisting of text, images, video, sound and animation to enrich the representation of information.


Hypermedia and Human Memory

 With hypermedia, it is not necessary for an author to linearise their knowledge when writing. The reader can also have access to some of the information structures the author had when writing the information. This will help readers to create their own representation of knowledge and to integrate it into existing knowledge structures.


Hypermedia and Human Memory

 In addition to being able to access information through association, hypermedia applications are strengthened by a number of additional aspects. These include an ability to incorporate various media, interactivity, vast data sources, distributed data sources, and powerful search engines. These make hypermedia a very powerful tool to create, store, access and manipulate information.

Hypermedia Linking

 Hypermedia systems (indeed, information in general) contain various types of relationships between elements of information. Examples of typical relationships include similarity in meaning or context (Vannevar Bush relates to Hypermedia), similarity in logical sequence (Chapter 3 follows Chapter 2) or temporal sequence (Video 4 starts 5 seconds after Video 3), and containment (Chapter 4 contains Section 4.2).

Hypermedia Systems





Hypermedia allows these relationships to be instantiated as links which connect the various information elements, so that these links can be used to navigate within the information space. We can develop different taxonomies of links, in order to discuss and analyse how they are best utilised.

One possible taxonomy is based on the mechanics of the links. We can look at the number of sources and destinations for links (single-source single-destination, multiple-source single-destination, etc.), the directionality of links (unidirectional, bidirectional), and the anchoring mechanism (generic links, dynamic links, etc.).
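The mechanical taxonomy can be made concrete with a small data structure. The sketch below is only an illustration: the Link class, its field names and the describe function are hypothetical, and the anchoring mechanism is left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link described purely by its mechanics (hypothetical structure)."""
    sources: tuple
    destinations: tuple
    bidirectional: bool = False


def describe(link):
    """Classify a link by source count, destination count and direction."""
    src = "single-source" if len(link.sources) == 1 else "multiple-source"
    dst = ("single-destination" if len(link.destinations) == 1
           else "multiple-destination")
    direction = "bidirectional" if link.bidirectional else "unidirectional"
    return f"{src} {dst}, {direction}"


print(describe(Link(("Chapter 2",), ("Chapter 3",))))
print(describe(Link(("Page A", "Page B"), ("Glossary",), bidirectional=True)))
```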

Hypermedia Systems

 A more useful link taxonomy is based on the type of information relationships being represented. In particular we can divide relationships (and hence links) into those based on the organisation of the information space (structural links) and those related to the content of the information space (associative and referential links).

Structural Links:

 The information contained within the hypermedia application is typically organised in some suitable fashion.

This organisation is typically represented using structural links. We can group structural links together to create different types of application structures. If we look, for example, at a typical book, then this has both a linear structure (from the beginning of the book linearly to the end of the book) and usually a hierarchical structure (the book contains chapters, the chapters contain sections, the sections contain …). Typically in a hypermedia application we try to create and utilise appropriate structures.

Structural Links:

 These structures are important in that they provide a form for the information space, and hence allow the user to develop an understanding of the scale of the information space, and their location within this space.

This is very important in helping the user navigate within the information space. Structural relationships do not however imply any semantic relationship between linked information. For example, a chapter in a book which follows another is structurally related, but may not contain any directly related information. This is the role of associative links.

Associative Links:

 An associative link is an instantiation of a semantic relationship between information elements. In other words, completely independently of the specific structure of the information, we have links based on the meaning of different information components. The most common example, which most people would be familiar with, is cross-referencing within books ("for more information on X refer to Y"). It is these relationships, or rather the links which represent them, which provide the essence of hypermedia, and in many respects can be considered to be its defining characteristic.

Referential Links:

 A third type of link which is often defined (and is related to associative links) is a referential link. Rather than representing an association between two related concepts, a referential link provides a link between an item of information and an elaboration or explanation of that information. A simple example would be a link from a word to a definition of that word. One simple way of conceptualizing the difference between associative and referential links is that the items linked by an associative link can exist independently, but are conceptually related. However, the item at one end of a referential link exists because of the existence of the other item.
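The three link types can be illustrated by tagging each link with its kind and filtering on that tag during navigation. This is a minimal sketch; the LinkKind enumeration, the sample links and the targets function are invented for the example and reuse the relationships mentioned on the earlier slides.

```python
from enum import Enum


class LinkKind(Enum):
    STRUCTURAL = "structural"    # organisation of the information space
    ASSOCIATIVE = "associative"  # semantic relationship between items
    REFERENTIAL = "referential"  # item -> its elaboration or definition


# (source, destination, kind); sample data for illustration only
links = [
    ("Chapter 4", "Section 4.2", LinkKind.STRUCTURAL),
    ("Chapter 2", "Chapter 3", LinkKind.STRUCTURAL),
    ("Hypermedia", "Vannevar Bush", LinkKind.ASSOCIATIVE),
    ("anchor", "Definition of anchor", LinkKind.REFERENTIAL),
]


def targets(source, kind):
    """Destinations reachable from source through links of one kind."""
    return [dst for (src, dst, k) in links if src == source and k is kind]


print(targets("Chapter 4", LinkKind.STRUCTURAL))    # ['Section 4.2']
print(targets("Hypermedia", LinkKind.ASSOCIATIVE))  # ['Vannevar Bush']
```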

Hypermedia Model

Run-time layer: presentation of the hypertext, user interaction, dynamics

|

(Presentation specifications)

|

Storage layer: database containing network of nodes and links

|

(Anchoring)

|

Within component layer: the contents/structure of nodes.

Designing a Hypermedia

 Important questions in designing the hypermedia are:

 Converting linear text to hypertext

 Text format conversions

 Dividing the text into nodes

 Link structures, automatic generation of links

 Are nodes in a database, or are they separate files on the file system?

 Client-server or standalone?

Designing a Hypermedia

 Text indexing is a well-known problem area, and results from it can be used to study automatic generation of links. In principle, a document can be analysed semantically (with the help of AI), statistically or lexically (by computing the occurrences of words). The problem in semantic analysis is that natural language is not easy for a computer to understand. In lexical analysis, problems include the conflation of words and the recognition of phrases (for example, the Finnish forms matriisi, matriisin and matriisilla should be conflated, but jälki and jälkeen should not). Solutions:

 Conflation algorithm

 Stemming algorithm

 Stopword-list
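The three solutions listed above can be sketched together in a few lines. This is a toy illustration under stated assumptions: the stopword list and suffix list are tiny and made up for the example, the suffix stripping is far cruder than a real stemming algorithm, and suggest_links is a hypothetical function that proposes a link whenever two nodes share enough conflated terms.

```python
import re

STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "s")  # toy list; real stemmers have many more rules


def stem(word):
    """Conflate word forms by stripping one known suffix (very naive)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lower-case, drop stopwords, and stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {stem(w) for w in words if w not in STOPWORDS}


def suggest_links(nodes, min_shared=2):
    """Propose a link between every pair of nodes sharing enough stems."""
    terms = {name: index_terms(text) for name, text in nodes.items()}
    names = sorted(terms)
    return [
        (a, b, sorted(terms[a] & terms[b]))
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if len(terms[a] & terms[b]) >= min_shared
    ]


nodes = {
    "n1": "Links connect nodes in the hypertext",
    "n2": "A node is linked to other nodes",
    "n3": "Matrices are multiplied row by column",
}
print(suggest_links(nodes))  # [('n1', 'n2', ['link', 'node'])]
```

Without the conflation step, "links" and "linked" would count as different words and the two related nodes would share only one term.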

Hypermedia Applications

 The Crossroads: The Crossroads ACM Student Magazine.

 SIGLINK Home Page: ACM Special Interest Group on Hypermedia WWW server.

 Apple Computer: The Virtual Campus: Apple Computer Higher Education Home Page and The Virtual Campus project.

 Amsterdam Tourist Guide: An example of WWW tourism. A similar presentation can be found at least for Paris.

 WWW Virtual Library

 Mathematical index search (CSC)

 HyperKalevala project 1993-1995

Hypermedia Systems

 Intermedia

 A well-known hypermedia system is Intermedia, developed at Brown University's Institute for Research in Information and Scholarship (IRIS) between 1985 and 1990 (see for example [Haan, ACM Comm. Jan 1992]).

Intermedia is a multiuser hypermedia framework where hypermedia functionality is handled at the system level.

Intermedia presents the user with a graphical file system browser and a set of applications that can handle text, graphics, timelines, animations and videodisc data.

Intermedia

 There is also a browser for link information, a set of linguistic tools and the ability to create and traverse links. Link information is isolated from the documents and is saved in a separate database.

The start and end positions of a link are called anchors.
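The idea of keeping link information outside the documents can be sketched as follows. This does not reproduce Intermedia's actual data model; the document names, the dictionary layout and the function names are assumptions chosen for illustration. An anchor is represented here as a document name plus a character range.

```python
# Documents hold content only; they contain no link markup.
documents = {
    "essay.txt": "Intermedia was developed at Brown University.",
    "timeline.txt": "1985: project begins. 1990: project ends.",
}

# Separate link database: each link joins two anchors, and an
# anchor is (document, start offset, end offset). Sample data only.
link_db = [
    {"from": ("essay.txt", 0, 10), "to": ("timeline.txt", 0, 21)},
]


def anchor_text(anchor):
    """Resolve an anchor to the text span it marks."""
    doc, start, end = anchor
    return documents[doc][start:end]


def links_from(doc):
    """All links whose start anchor lies in the given document."""
    return [
        (anchor_text(link["from"]), anchor_text(link["to"]))
        for link in link_db
        if link["from"][0] == doc
    ]


print(links_from("essay.txt"))  # [('Intermedia', '1985: project begins.')]
```

Because the documents are never modified, several users can keep different sets of links over the same material.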

World Wide Web

 The World Wide Web (WWW) is a global hypermedia system on the Internet. It can be described as a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents [Hug93]. It was originally developed at CERN for transferring research and ideas effectively throughout the organization [Hug93].

World Wide Web

 Through WWW it is possible to deliver hypertext, graphics, animation and sound between different computer environments.

To use WWW the user needs a browser, for example NCSA Mosaic, and a set of viewers that are used to display complex graphics, animation and sound. NCSA Mosaic is currently available on X Windows, Windows and Macintosh.

NCSA Mosaic and Netscape

 The browser itself can read hypertext documents that are marked up with HyperText Markup Language (HTML). HTML is based on Standard Generalized Markup Language (SGML), and contains all formatting and link information as ASCII text.

HTML documents can reside on different computers on the Internet, and a document is referenced by a URL (Uniform Resource Locator).

A URL is of the form http://computer.org.country/doc.html where computer.org.country is the name of the computer and doc.html is the path to the document.
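The parts of the example URL can be separated with Python's standard urllib.parse module. This is only a small illustration; the URL is the placeholder from the slide, not a real address.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://computer.org.country/doc.html")

print(parts.scheme)  # http                  (protocol used to fetch the document)
print(parts.netloc)  # computer.org.country  (name of the computer)
print(parts.path)    # /doc.html             (path to the document)
```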


 In order to create a node for WWW, an HTTP (Hypertext Transfer Protocol) server application is needed. A link in a WWW document is always expressed as a URL.

Links can be references to files on FTP servers, Gophers, HTTP servers or Usenet newsgroups.
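Since the link information is plain text inside the HTML, it can be pulled out with a few lines of code. The sketch below uses Python's standard html.parser and urllib.parse modules; the LinkCollector class name and the sample page with its addresses are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Sample page; the hosts and paths are placeholders.
page = """
<a href="http://computer.org.country/doc.html">a document</a>
<a href="ftp://computer.org.country/pub/file.txt">a file</a>
<a href="gopher://computer.org.country/">a Gopher menu</a>
<a href="news:comp.infosystems.www">a newsgroup</a>
"""

collector = LinkCollector()
collector.feed(page)
schemes = [urlsplit(url).scheme for url in collector.links]
print(schemes)  # ['http', 'ftp', 'gopher', 'news']
```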


 Netscape is a popular WWW browser developed by Netscape Communications Corp. Netscape 1.1 supports some HTML 3.0 features (tables) and has an interesting API that makes it possible to develop

Arena

 Arena is an experimental WWW browser developed at CERN. It supports HTML 3.0 and thus is able to display mathematical formulas and tables.

MathBrowser

 Recently, the Mathsoft company announced MathBrowser, a WWW browser that can display HTML and MathCAD documents. MathBrowser has a computational engine and an interface similar to those of MathCAD, allowing the student to edit MathCAD documents through the Internet.

MathBrowser is used to distribute a collection of Schaum's Outline series in electronic form.

HyperCard

 HyperCard is hypermedia authoring software for Macintosh computers. It is based on a card metaphor. A HyperCard application is called a stack or a collection of stacks. Each stack consists of cards, and only one card is visible in a stack at a time. A card is displayed in a fixed-size window.

Hypertext links can be programmed by creating buttons and writing a HyperTalk script for the button.
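The stack, card and button structure can be modelled in a few lines. The sketch below is written in Python rather than HyperTalk and is not HyperCard's actual implementation; the Stack class, its click method and the card names are assumptions for illustration, with click standing in for the script attached to a button.

```python
class Stack:
    """A stack of named cards; each card maps button labels to target cards."""

    def __init__(self, cards):
        self.cards = cards
        self.current = next(iter(cards))  # only one card is visible at a time

    def click(self, button):
        """Run the button's 'script': go to the card the button points to."""
        self.current = self.cards[self.current][button]
        return self.current


stack = Stack({
    "Home": {"Index": "Index", "Next": "Chapter 1"},
    "Index": {"Home": "Home"},
    "Chapter 1": {"Home": "Home"},
})
stack.click("Next")
print(stack.current)  # Chapter 1
```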

LinksWare

 LinksWare is commercial hypermedia authoring software for Macintosh that can create hypertext links between text files created with different word processors.

LinksWare uses a set of translators to convert files to its own format (Claris XTND system). This can make the opening of a file very slow.

LinksWare

 LinksWare can open files that contain mathematical text, but files may be formatted differently than in the original document; in particular, formulae do not appear to have proper line heights. In addition, it cannot create links to other applications. However, it can create links to AppleScript command files that can open an application and execute commands for that application.

Hyper-G

 Hyper-G is the name of a hypermedia project currently under development at the IICM. Like other hypermedia undertakings, Hyper-G will offer facilities to access a diversity of databases with very heterogeneous information (from textual data, to vector graphics and digitized pictures, courseware and software, digitized speech and sound, synthesized music and speech, and digitized movie clips). Like other hypermedia systems, it will allow browsing, searching, hyperlinking, and annotation.

Future Directions of Hypermedia

 There is a trend for hypertext features to appear in ordinary applications such as word processors and spreadsheets. This is called hypertext functionality within an application. Good examples of this are Microsoft Internet Assistant, MathBrowser and MatSyma. Eventually, this will lead to system software containing support for hypertext features, nodes, links and browsing.